Re: Nim vs. Python & Groovy (string splitting): Why is string splitting so slow in Nim?

jyapayne Wed, 21 Aug 2019 14:45:13 -0700

@markebbert, yes, you are right! That should be:
    
    
    if data[last] == '\l':
      buffer.add data[pos+1 ..< pos+bufSize]
    else:
      buffer.add data[pos ..< pos+bufSize]
    pos += bufSize
    
    


This accounts for the buffer increase. The `lines` iterator now looks like 
this (I also changed the buffer size to avoid reallocations):
    
    
    iterator lines*(s: GzFileStream, cap = 100000000): TaintedString {.tags: [ReadIOEffect].} =
      ## An iterator across GzFileStream that takes a buffer capacity
      var
        bufSize = 24576
        data = TaintedString(newStringOfCap(cap))
        buffer = TaintedString(newStringOfCap(bufSize))
      
      var
        pos = 0
        last = 0
      
      while not s.atEnd():
        var success = readStr(s, cap, data)
        if not success: break
        
        pos = 0
        last = 0
        
        while pos < data.len:
          let m = c_memchr(addr data.string[pos], '\L'.ord, bufSize)
          
          if m != nil:
            # \l found: Could be our own or the one by fgets, in any case, we're done
            last = cast[ByteAddress](m) - cast[ByteAddress](addr data.string[0])
            
            if last > 0 or data.string[last] == '\l':
              if buffer.len > 0:
                yield buffer & data[pos ..< last]
                buffer.setLen(0)
              else:
                yield data[pos ..< last]
              
              pos = last + 1
            else:
              break
          else:
            if data[last] == '\l':
              buffer.add data[pos+1 ..< pos+bufSize]
            else:
              buffer.add data[pos ..< pos+bufSize]
            pos += bufSize
        
        if last < data.len:
          # there is still data left over and we
          # need to keep it for the next iteration
          if data[last] == '\l':
            buffer.add data[last+1 ..< data.len]
          else:
            buffer.add data[last ..< data.len]
    
    

I do recommend you go with one of @cblake's memslice solutions, though. They 
will be much faster.
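
For reference, here is a rough sketch of what a memslice-style loop can look like with the standard `memfiles` module. This is not @cblake's actual code, just an illustration of the general idea: `memSlices` yields a pointer and a size for each line instead of allocating a string. The helper name `countLinesAndFields`, the tab separator, and the demo file name are all made up for this example. It also assumes an uncompressed file on disk, since a memory map will not work directly on a gzip stream.

```nim
import memfiles, os

proc countLinesAndFields(path: string, sep = '\t'): tuple[lines, fields: int] =
  ## Hypothetical helper: walks a memory-mapped file line by line and counts
  ## separator-delimited fields without allocating a string per line.
  var mf = memfiles.open(path)        # read-only mapping by default
  defer: mf.close()
  for slice in memSlices(mf):         # splits on '\l' and trims a trailing '\r'
    inc result.lines
    # View the slice bytes in place; nothing is copied here.
    let p = cast[ptr UncheckedArray[char]](slice.data)
    var n = 1
    for i in 0 ..< slice.size:
      if p[i] == sep: inc n
    inc result.fields, n

when isMainModule:
  # Small self-contained demo on a temporary file (assumed name).
  let path = getTempDir() / "memslice_demo.tsv"
  writeFile(path, "a\tb\tc\nd\te\n")
  echo countLinesAndFields(path)
  removeFile(path)
```

Most of the speedup comes from skipping the per-line string allocation, so it only pays off if the per-line work can be done on the raw bytes. Calling `$slice` on every line brings the allocations back.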
