For what it's worth, and for completeness in case Windows portability even matters here (as @markebbert mentioned, these science things are often one-time deals), this works but is 6x slower (405 sec, i.e. 6 min 45 sec) than the `popen`/`mSlices` variant:

```nim
import strutils, osproc, streams, cligen/mslice

proc main() =
  # Portable: let osproc spawn the decompressor and read its stdout Stream.
  let p = startProcess("gzip -dc < big.vcf.gz", options={poEvalCommand})
  let outp = p.outputStream
  var line = newStringOfCap(4096).TaintedString
  while outp.readLine(line):
    if line.startsWith('#'):      # skip VCF header lines
      continue
    var i = 0
    let msLine = toMSlice(line)
    for col in msLine.mSlices('\t'):
      if i >= 9:                  # only columns from index 9 onward
        for fmt in col.mSlices(':'):
          # do something with $fmt
          break
      i.inc

main()
```
That `streams` code needs some better line-buffering love, though {or `osproc` could use `File` instead of `Stream`}. `system/io.nim:readLine(File, ..)` used to be a similarly slow, almost identical implementation. The clear speed winners so far are either the `mopen` variant on the decompressed file, if you have the space/RAM, or, if you run on Unix, the `lines(popen())`-`mSlices` variant (re-encoded with `pzstd` if you need to process the same file many times).

{ If `nimble install` doesn't work for you, in a pinch you could always `git clone https://github.com/c-blake/cligen`, copy `cligen/mslice.nim` into the same dir as your program, and adjust the `import` to its unqualified name. I get that Araq doesn't want to rely upon libc `memchr` being fast or to support different compile-time/run-time versions, but 4X slower is a pretty big hit. That's why I tossed `mslice` into `cligen`, so others might benefit. I'm not even sure `mSlices` is as fast as possible and, as I mentioned, the various overheads clearly depend on string/substring lengths. }
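For anyone landing on this post without the earlier one in front of them, here is a rough sketch of what the Unix-only `lines(popen())`-`mSlices` variant looks like. It is a sketch under assumptions rather than the exact code that was timed: it assumes `popen`/`pclose` from Nim's `posix` module, the same `big.vcf.gz` file name as above, and `cligen/mslice` being importable. The `nil` check and the `discard $fmt` placeholder are additions for illustration only.

```nim
import strutils, posix, cligen/mslice

proc main() =
  # posix.popen hands back a plain File, so the buffered lines() iterator
  # for File can be used instead of Stream.readLine.
  let f = popen("gzip -dc < big.vcf.gz", "r")
  if f == nil:                    # illustrative error handling (assumption)
    quit("popen failed")
  for line in lines(f):
    if line.startsWith('#'):      # skip VCF header lines
      continue
    var i = 0
    for col in toMSlice(line).mSlices('\t'):
      if i >= 9:                  # only columns from index 9 onward
        for fmt in col.mSlices(':'):
          discard $fmt            # placeholder: do something with $fmt
          break
      i.inc
  discard pclose(f)               # reap the gzip child process

main()
```

The only structural difference from the `osproc` version is where the bytes come from: a C `FILE*` with libc buffering rather than a `Stream`, which is presumably where most of the gap comes from.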