To flesh out @federico3's mention: using the `openArray[char]` variants newly 
added to `std/parseutils` in nim-devel (thanks @ElegantBeef!), you could do 
this (which is _almost_ but not quite as fast as @Vindaar's version using my 
modules, and faster than the Rust on my test box):
    
    
    import std/parseutils, std/memfiles as mf # parse CSV v3
    
    # View a raw MemSlice as `openArray[char]` for the parseutils calls.
    template toOpenArrayChar(s: MemSlice): untyped =
      toOpenArray(cast[ptr UncheckedArray[char]](s.data), 0, s.size - 1)
    
    var total = 0
    let f = mf.open("./nim.csv")
    for line in memSlices(f):            # split the file on '\n'
      # Wrap the line in a throwaway MemFile so memSlices can re-split it.
      let r = MemFile(mem: line.data, size: line.size)
      for c in r.memSlices(','):         # split the line on ','
        var x: int
        if parseInt(c.toOpenArrayChar, x) == c.size:  # whole field parsed?
          total += x
    
    if total != 2999999 * 3000000 div 2: echo "mismatch"
    
    

Possibly useful if you are allergic to dependencies beyond the stdlib, want a 
pedagogical introduction to CSV handling before teaching packaging/build 
issues, or for other reasons, BUT the shenanigans with the `MemFile` type are 
poor style.

If you are in the kind of caffeine-fueled rage optimization (CFRO) mood 
sometimes common in systems programming, 
[c2tsvs](https://github.com/c-blake/nio/blob/main/utils/c2tsvs.nim) using 
`parsecsv`, or the faster 
[c2tsv](https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim), are 
interesting designs to consider. You basically coerce escaped-quoted CSV into 
a split-parseable stream and at the same time get to go dual-core with `popen` 
(even [on 
Windows](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/popen-wpopen),
 I think). Your pipe bandwidth just needs to beat the parsing. But as per the 
primary theme of the NIO package hosting those tools, or of @Vindaar's 
[HDF5](https://github.com/Vindaar/nimhdf5) bindings, or others, you should 
bulk convert from text to binary and then stay there. These tools are best 
understood as optimizing that bulk conversion.
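The dual-core `popen`-style pairing could be sketched like this in Nim via `std/osproc` (a hedged sketch: the `./c2tsv` path and `nim.csv` input are assumed, and you would build the converter from the linked repo):

```nim
import std/[osproc, streams, strutils, parseutils]

# Run the converter as a child process; it turns escaped-quoted CSV
# into plain TSV while this process splits and sums, so the two
# stages overlap on two cores.
let p = startProcess("./c2tsv < nim.csv", options = {poEvalCommand})
var total = 0
for line in p.outputStream.lines:
  for field in line.split('\t'):
    var x: int
    if parseInt(field, x) == field.len:  # whole field must parse
      total += x
p.close
echo total
```

The point of the design is exactly the "pipe bandwidth beats parsing" condition above: the consumer only ever does trivial splitting, while the quote-unescaping work lives in the child.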
