@moerm: Your algorithm uses Euclid's formula, which (1) does not exhaustively enumerate all non-primitive Pythagorean triples (for example, it'll skip 9^2+12^2=15^2) and (2) does not enumerate them in the same order as the original algorithm. To get that right, you have to jump through a few additional hoops.
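To make the gap concrete, here is a minimal Python sketch (function names are my own, not from the linked D code): Euclid's formula alone yields primitive triples such as (3, 4, 5), so (9, 12, 15) = 3 × (3, 4, 5) only appears once you add an extra loop scaling each primitive by every multiplier k.

```python
from math import gcd

def primitive_triples(limit):
    """Euclid's formula: for m > n >= 1 with m, n coprime and of
    opposite parity, (m^2 - n^2, 2mn, m^2 + n^2) is a primitive triple.
    Collect all such triples with hypotenuse c <= limit."""
    out = []
    m = 2
    while m * m + 1 <= limit:
        for n in range(1, m):
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
                if c <= limit:
                    out.append((min(a, b), max(a, b), c))
        m += 1
    return out

def all_triples(limit):
    """Every Pythagorean triple is k times a primitive one; without
    this k-loop, e.g. (9, 12, 15) = 3 * (3, 4, 5) would be skipped."""
    out = []
    for (a, b, c) in primitive_triples(limit):
        k = 1
        while k * c <= limit:
            out.append((k * a, k * b, k * c))
            k += 1
    return sorted(out)
```

This still says nothing about emitting the triples in the same order as the original triple-nested loop, which is where the extra hoops in the D code come in.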
I threw [this D code](https://run.dlang.io/gist/rbehrends/1911c4c30143b015a15032e69808000c?compiler=ldc&args=-O2) together a couple of days ago as a proof of concept. It is a bit more complex, but it matches the output of the original implementation exactly, and it should be fairly trivial to port to Nim for anybody who is interested. It will still beat the pants off the original, simply because that's what happens when you compare an O(n log n) algorithm to an O(n^3) one.

@cblake: Yes, the original blog post is about the cost of abstractions (or rather, about being able to avoid it). However, once you're dealing with doubly or triply nested loops, algorithmic optimization begins to dominate compiler optimization: the constant factor a compiler can shave off quickly becomes secondary next to O(n^k) time complexity with k >= 2. (I could write Ruby or Python code that would beat the C++ implementation for fairly modest values of _n_.) Moreover, the composability of iterators/ranges is not adequate for the most interesting algorithmic optimizations.