It should be offset ..< size.

Basically, parallelChunks creates N chunks and gives you the chunkStart and 
chunkLength of each one. You can then spawn N serial procs, each taking a 
chunkStart and a chunkLength as arguments.

So what you did is good, except the sync should be outside the loop:
    
    
    parallelChunks(1, linePtr.len(), chunkOffset, chunkSize):
        spawn worker(f, linePtr.at(chunkOffset), chunkSize)
    sync()
    
    

and you just need to compile with --threads:on

For reference, I am using similar templates in Laser (but with an OpenMP 
backend, not Nim's threadpool):

  * Chunking template: 
[https://github.com/numforge/laser/blob/d1e6ae61/laser/openmp.nim#L240-L318](https://github.com/numforge/laser/blob/d1e6ae61/laser/openmp.nim#L240-L318)
  * Usage:
    * Parallel memcpy: 
[https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L42-L66](https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L42-L66)
    * Parallel zero-ing: 
[https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L130-L154](https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L130-L154)



Here is a minimal working example that sets each chunk of a seq to a random 
value, in parallel.
    
    
    import threadpool, random, cpuinfo
    
    template parallelChunks(start, stop: int,
                            chunkOffset, chunkSize: untyped{ident},
                            body: untyped): untyped =
      let
        numIters = (stop - start)
        numChunks = countProcessors()
        baseChunkSize = numIters div numChunks
        remainder = numIters mod numChunks
      
      var chunkOffset {.inject.}, chunkSize {.inject.}: Natural
      
      for threadID in 0 ..< numChunks:
        if threadID < remainder:
          chunkOffset = start + (baseChunkSize + 1) * threadID
          chunkSize = baseChunkSize + 1
        else:
          chunkOffset = start + baseChunkSize * threadID + remainder
          chunkSize = baseChunkSize
        
        block: body
    
    
    proc setValue(chunk: ptr int, size: int, value: int) =
      # Allow array indexing on pointers
      let chunk = cast[ptr UncheckedArray[int]](chunk)
      for i in 0 ..< size:
        chunk[i] = value
    
    proc main() =
      
      # We will randomly set the values of a sequence of size 100
      var s = newSeq[int](100)
      echo "Before: ", s
      
      # Seed a rng
      randomize(1234)
      
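      # `stop` is exclusive, so pass s.len to cover the whole seq
      # (the output shown below came from s.len - 1, hence its trailing 0)
      parallelChunks(0, s.len, chunkOffset, chunkSize):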
        # Take a pointer to the offset of the seq
        let chunkStart = s[chunkOffset].addr
        
        # Get a random value for each thread/chunk of the seq
        let val = rand(100)
        
        # Parallel set the values
        spawn setValue(chunkStart, chunkSize, val)
      
      # Wait for all threads to finish
      sync()
      
      # Check the result
      echo "After: ", s
    
    main()
    
    

Output on my machine (36 cores): 
    
    
    Before: @[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    After: @[0, 0, 0, 37, 37, 37, 98, 98, 98, 18, 18, 18, 83, 83, 83, 15, 15,
      15, 91, 91, 91, 48, 48, 48, 86, 86, 86, 10, 10, 10, 58, 58, 58, 78, 78, 78, 44,
      44, 44, 3, 3, 3, 22, 22, 22, 17, 17, 17, 83, 83, 83, 42, 42, 42, 97, 97, 97,
      24, 24, 24, 20, 20, 20, 55, 55, 55, 6, 6, 6, 98, 98, 98, 71, 71, 71, 66, 66,
      66, 72, 72, 72, 12, 12, 88, 88, 83, 83, 55, 55, 74, 74, 40, 40, 21, 21, 46, 46,
      95, 95, 0]
    
    

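You can see that every thread gets either 3 or 2 items. (The trailing 0 in the 
output is there because that run used s.len - 1 as stop, which is exclusive, so 
only the first 99 items were distributed.) A naive chunking would instead give:

100/36 ≈ 2.78; truncating to 2 would give 2 items to each of the first 35 
threads and 100 - 2*35 = 30 items to the last one.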
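
To make the comparison concrete, here is a small serial sketch that only computes the chunk sizes for 100 items over 36 chunks, without spawning anything. The helper names `balancedSizes` and `naiveSizes` are made up for this illustration; the balanced split follows the same div/mod logic as the parallelChunks template, and the naive split is just one plausible reading of "using 2".

```nim
# Hypothetical helpers for illustration only; they are not part of
# the example above or of Laser.

proc balancedSizes(numItems, numChunks: int): seq[int] =
  ## Same split as the parallelChunks template: the first `remainder`
  ## chunks get one extra item.
  let
    base = numItems div numChunks
    remainder = numItems mod numChunks
  for id in 0 ..< numChunks:
    result.add(if id < remainder: base + 1 else: base)

proc naiveSizes(numItems, numChunks: int): seq[int] =
  ## Truncating split: every chunk gets `numItems div numChunks` items
  ## and the last chunk absorbs everything that is left.
  let base = numItems div numChunks
  for id in 0 ..< numChunks - 1:
    result.add base
  result.add numItems - base * (numChunks - 1)

when isMainModule:
  let
    balanced = balancedSizes(100, 36)
    naive = naiveSizes(100, 36)
  echo "balanced: min=", min(balanced), " max=", max(balanced)
  echo "naive:    min=", min(naive), " max=", max(naive)
```

With 100 items and 36 chunks the balanced split never differs by more than one item between chunks, while the naive one leaves the last chunk with 30 items, so that thread would finish long after the others.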
