So, one other thing that is _probably_ obvious but bears mentioning just in case it didn't occur to @alfrednewman - the number of addressable/seekable bytes in a file is usually maintained by any modern filesystem on any modern OS as cheaply accessed metadata. Hence, if what you really need is not an exact count but a "reasonable bound" that could be violated once in a while, then you may be able to _really_ accelerate your processing.
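For instance (a trivial illustration using nothing beyond the stdlib `os` module; "input.txt" is just a placeholder path), asking the OS for that byte count is a single metadata lookup with no reads of the file body at all:

```nim
import os

# One stat-style metadata call; the file's contents are never touched.
echo getFileSize("input.txt")
```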
As one example text corpus that we all have access to, the current Nim git repo's `lib/nim/**.nim` files have an average line length of 33 bytes and a standard deviation of about 22 bytes, as assessed by this little program:

```nim
import os, memfiles

when isMainModule:
  var inp = memfiles.open(paramStr(1))  # file to measure, given on the command line
  var counter = 0
  var sumSq = 0
  for slc in memSlices(inp):
    inc(counter)
    sumSq += slc.size * slc.size
  echo "num lines: ", counter
  let mean = float(inp.size) / float(counter)
  echo "mean length: ", mean
  let meanSq = float(sumSq) / float(counter)
  echo "var(length): ", meanSq - mean*mean
  inp.close
```

You could probably reasonably bound the number of bytes per line by, say, (average + 4*stdev), which in this case is about 33 + 22*4 =~ 121 bytes... maybe round up to 128. Then you could do something like:

```nim
import os
var reasonableUpperBoundOnLineCount = int(float(getFileInfo(myPath).size) / float(128))
```

If you use that bound to allocate something then you are unlikely to over-allocate memory by more than about 4X, which isn't usually considered "that bad" in this kind of problem space. Depending on what you are doing you can tune that parameter, and you might need to be prepared in your code to "spill" past a very, very rare 4-standard-deviation tail event. This optimization will beat the pants off even an AVX512 deal that iterates over all the file bytes, at least for this aspect of the calculation. It basically eliminates a whole pass over the input data in a case that is made common by construction.

Since you have seemed pretty dead set on an exact calculation in other posts, a small elaboration upon the "embedded assumptions" in this optimization may be warranted. All that is really relied upon is that some initial sample of files can predict the distribution of line lengths "well enough" to estimate a threshold (that "128" divisor) whose spill-overs are tunably rare, rare enough to not cause much slowdown in whatever ultimate calculation you are actually doing, which you have not been clear about.

Another idea along these lines, if, say, the files are processed over and over again, is to avoid re-computing all those `memchr()`s by writing a little proc/program to maintain an off-to-the-side file of byte indexes to the beginnings of lines. The idea here would be that you have two sets of files: your actual input files and, paired with each one, a file like "foo.idx" containing just a bunch of binary ints in the native format of the CPU that are either byte offsets or line lengths, effectively caching the answers of the `memchr` calls. If you had such index files then, when you want to know how many lines a file has, you can `getFileInfo` on the ".idx" file and know immediately (its size divided by the width of one int is the line count). You could be careful and check modification times on the .idx and the original data file and that sort of thing, too. Why, you could even add a "smarter" `memSlices` that checked for such a file and skipped almost all its work, and an API call `numSlices` that skipped all the work if the ".idx" file is up-to-date according to time stamps. There is a rough sketch of that idea at the end of this post.

Basically, except for actually "just computing file statistics", it seems highly unlikely to me that you should really be optimizing the heck out of newline counting in and of itself beyond what `memSlices`/`memchr()` already do.
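Here is a minimal sketch of that off-to-the-side index idea, just to make it concrete. Everything in it is made up for illustration: the ".idx" suffix and the procs `buildLineIndex`/`numLinesCached` are not existing `memfiles` APIs, and it assumes Unix "\n" line endings and native int64 offsets:

```nim
import os, times, memfiles

proc buildLineIndex(path: string): string =
  ## Writes native int64 byte offsets of line starts to a side file
  ## ("<path>.idx") and returns that side file's name.
  result = path & ".idx"
  var inp = memfiles.open(path)
  defer: inp.close
  var idx = open(result, fmWrite)
  defer: idx.close
  var offset = 0'i64
  for slc in memSlices(inp):
    # Record where this line starts, then advance past it and its newline.
    discard idx.writeBuffer(addr offset, sizeof(offset))
    offset += int64(slc.size) + 1   # +1 for "\n"; assumes Unix line endings

proc numLinesCached(path: string): int =
  ## Reports the line count from the index file's size alone,
  ## (re)building the index only when it is missing or stale.
  let ipath = path & ".idx"
  if not fileExists(ipath) or
     getLastModificationTime(ipath) < getLastModificationTime(path):
    discard buildLineIndex(path)
  result = int(getFileInfo(ipath).size) div sizeof(int64)

when isMainModule:
  echo numLinesCached(paramStr(1))
```

A "smarter" `memSlices` could work the same way: map both files and yield slices straight from the cached offsets instead of re-running `memchr` over the data.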