It should be offset ..< size.
Basically, the parallelChunks template creates N chunks and gives you a
chunkStart and a chunkLength for each one. You can then spawn N serial procs
that each take a chunkStart and a chunkLength as arguments.
So what you did is good, except that the sync should be outside the loop:

    parallelChunks(1, linePtr.len(), chunkOffset, chunkSize):
      spawn worker(f, linePtr.at(chunkOffset), chunkSize)
    sync()

You also just need to compile with --threads:on.
For reference, I use similar templates in Laser (but with an OpenMP backend
rather than Nim's threadpool):
* Chunking template: [https://github.com/numforge/laser/blob/d1e6ae61/laser/openmp.nim#L240-L318](https://github.com/numforge/laser/blob/d1e6ae61/laser/openmp.nim#L240-L318)
* Usage:
  * Parallel memcpy: [https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L42-L66](https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L42-L66)
  * Parallel zero-ing: [https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L130-L154](https://github.com/numforge/laser/blob/d1e6ae61/laser/tensor/initialization.nim#L130-L154)
Here is a minimal working example that sets each chunk of a seq, in parallel,
to a random value.

    import threadpool, random, cpuinfo

    template parallelChunks(start, stop: int,
                            chunkOffset, chunkSize: untyped{ident},
                            body: untyped): untyped =
      # Split the range start ..< stop (stop is exclusive) into one chunk
      # per processor; the first `remainder` chunks get one extra item.
      let
        numIters = (stop - start)
        numChunks = countProcessors()
        baseChunkSize = numIters div numChunks
        remainder = numIters mod numChunks

      var chunkOffset {.inject.}, chunkSize {.inject.}: Natural

      for threadID in 0 ..< numChunks:
        if threadID < remainder:
          chunkOffset = start + (baseChunkSize + 1) * threadID
          chunkSize = baseChunkSize + 1
        else:
          chunkOffset = start + baseChunkSize * threadID + remainder
          chunkSize = baseChunkSize

        block: body

    proc setValue(chunk: ptr int, size: int, value: int) =
      # Allow array indexing on pointers
      let chunk = cast[ptr UncheckedArray[int]](chunk)
      for i in 0 ..< size:
        chunk[i] = value

    proc main() =
      # We will randomly set the values of a seq of size 100
      var s = newSeq[int](100)
      echo "Before: ", s

      # Seed the RNG
      randomize(1234)

      # `stop` is exclusive, so pass s.len (not s.len - 1) to cover the whole seq
      parallelChunks(0, s.len, chunkOffset, chunkSize):
        # Take a pointer to the offset of the seq
        let chunkStart = s[chunkOffset].addr
        # Get a random value for each thread/chunk of the seq
        let val = rand(100)
        # Set the values in parallel
        spawn setValue(chunkStart, chunkSize, val)

      # Wait for all threads to finish
      sync()

      # Check the result
      echo "After: ", s

    main()

Output on my machine (36 cores):
Before: @[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
After: @[0, 0, 0, 37, 37, 37, 98, 98, 98, 18, 18, 18, 83, 83, 83, 15, 15,
15, 91, 91, 91, 48, 48, 48, 86, 86, 86, 10, 10, 10, 58, 58, 58, 78, 78, 78, 44,
44, 44, 3, 3, 3, 22, 22, 22, 17, 17, 17, 83, 83, 83, 42, 42, 42, 97, 97, 97,
24, 24, 24, 20, 20, 20, 55, 55, 55, 6, 6, 6, 98, 98, 98, 71, 71, 71, 66, 66,
66, 72, 72, 72, 12, 12, 88, 88, 83, 83, 55, 55, 74, 74, 40, 40, 21, 21, 46, 46,
95, 95, 0]
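You can see that every thread gets either 3 or 2 items. The final element is
still 0 because this run passed s.len - 1 as stop, which is exclusive, so only
the first 99 elements were split (27 chunks of 3 and 9 chunks of 2); pass
s.len to cover the whole seq.
Naive chunking would instead give 100/36 = 2.77: using a chunk size of 2 would
assign 2 items to each of the first 35 threads and 100 - 2*35 = 30 items to
the last one.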
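
To make that comparison concrete, here is a small serial sketch (no threads needed) that computes the chunk sizes both ways. The helper names balancedSizes and naiveSizes are made up for this illustration, and the 36 chunks are hard-coded to match my machine instead of calling countProcessors(). The balanced version uses the same div/mod split as the parallelChunks template above.

```nim
# Serial sketch: compare balanced and naive chunk sizes.
# `balancedSizes` and `naiveSizes` are hypothetical helpers for illustration.

proc balancedSizes(numIters, numChunks: int): seq[int] =
  # Same split as parallelChunks: the first `remainder` chunks get one extra item.
  let
    base = numIters div numChunks
    remainder = numIters mod numChunks
  for id in 0 ..< numChunks:
    let size = if id < remainder: base + 1 else: base
    result.add size

proc naiveSizes(numIters, numChunks: int): seq[int] =
  # Every chunk gets the floor; the last chunk absorbs everything left over.
  let base = numIters div numChunks
  for id in 0 ..< numChunks:
    let size =
      if id == numChunks - 1: numIters - base * (numChunks - 1)
      else: base
    result.add size

when isMainModule:
  let balanced = balancedSizes(100, 36)
  let naive = naiveSizes(100, 36)
  echo "balanced: max ", max(balanced), ", min ", min(balanced)  # balanced: max 3, min 2
  echo "naive:    max ", max(naive), ", min ", min(naive)        # naive:    max 30, min 2
```

With the balanced split, no chunk is more than one item larger than any other, so no single thread ends up doing most of the work.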