Well, I don't really have a blog. So, this is what you get. ;-) Someone else 
can, though. Ideally, just give me credit by linking back to here. Or if you 
can make any of that `cligen/mslice.*split*` faster, then a PR is welcome.

As a slight update, storing `big.vcf` in `/tmp` (a tmpfs aka `/dev/shm` RAM 
filesystem) and doing profile-guided optimization with gcc (via a little 
`nim-pgo vsn2 ./vsn2` script I have around), I get about a 15% improvement, down 
to 0.85 s. Linux `perf` tells me about 58% of that time is spent in 
`__memchr_avx2`, which is already hand-tuned assembly.

That is probably about as fast as you can get (at least on my Skylake 
generation CPU) in any language. You might be able to eke out another 10..20% 
or more if you hand-rolled everything in assembly and did just the right 
prefetches/branch-predictor gaming. Or you might not. Parsing at 1.85 GB/s 
isn't so bad. For reference, on that machine single-threaded RAM bandwidth is 
about 32 GB/s, and as mentioned Zstd can spit out that compressed file about 3x 
faster { though that is multi-threaded over 4+ cores, so it's actually 
slightly slower on a single-core basis }.

Anyway, I doubt there is any real Nim problem here. A better use of time might 
have been to just wait for @markebbert to respond to the very first line of the 
very first response.
