Yeah. Depending on what he's doing, same-file dynamic estimation might also work. Good point, @jlp765.
On my system, the 5.4 MB of `/usr/share/nim/**.nim` gets counted in about 4 milliseconds. That is over 1.3 GB/sec, probably faster than all but the most powerhouse NVMe/disk-array IO. This is why I suspect @alfrednewman might be re-calculating things instead of saving the answer, either in RAM or as files. I'm sure a pre-pass calculating the number of lines can avoid certain complexities. However, once you start doing assembly hijinks that are not even portable across a given CPU family (e.g., SSE, AVX2, AVX512, ...), performance becomes very deployment-sensitive. Meanwhile, eliminating the entire pre-pass by merging it with the per-line allocations/whatever costs complexity, too, but yields portable performance gains. If the pre-pass really can't be eliminated, then have at it, asm-wise, I guess. I just suspect it's off-track.