Note that with a shared table, accesses will need to be serialized to ensure 
soundness and data consistency:

  * This means that threads will compete for cache access, which can lead to 
cache thrashing/cache ping-pong/false sharing/cache invalidation. If the task 
per thread is short (a few microseconds or less, ~10000 cycles) and your table 
is small, you will likely see a large slowdown, anywhere from 2x to 10x, as 
threads compete for cache.
  * Nim's shared tables (`sharedtables`) are really bad and should be removed 
from the standard library. They are only useful for proofs of concept; 
underneath, a shared table is just a table allocated on the shared heap plus a 
lock. Unfortunately, writing fast and correct concurrent data structures is an 
extremely **hard** problem.
  * Processing strings in parallel brings memory-management woes: strings are 
heap allocated, and reclaiming memory that was allocated by another thread is 
about 5x slower at best than reclaiming memory from your own thread.
  * Furthermore, if all the created strings end up owned by the thread that 
holds the shared table, you are in a producer/consumer scenario: many threads 
deplete their extra memory caches while one thread ends up hoarding lots of 
extra memory as it collects the strings, so the caches on both sides end up 
useless. Unfortunately only a few memory allocators deal with that effectively 
(Microsoft's mimalloc and snmalloc), and it's the worst-case behaviour for the 
default allocators on Linux, macOS and Windows.
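
To make the "table + lock" point concrete, here is a minimal sketch of what serialized access looks like. It is not the `sharedtables` implementation: a fixed array of counters stands in for the table so that no GC'ed memory is shared, and the names `worker`, `countWithLock`, `hits` and `NumThreads` are made up for the example. On older Nim versions it assumes compiling with `--threads:on`.

```nim
import std/locks

const NumThreads = 4

var
  lock: Lock
  hits: array[4, int]   # stand-in for the shared table; plain values only

proc worker(n: int) {.thread.} =
  for i in 0 ..< n:
    # Every update takes the same lock, so the threads run this part
    # one at a time and the cache line holding `hits` bounces between cores.
    withLock lock:
      inc hits[i mod hits.len]

proc countWithLock(perThread: int): int =
  ## Runs NumThreads workers and returns the total number of updates recorded.
  for h in hits.mitems: h = 0
  initLock(lock)
  var threads: array[NumThreads, Thread[int]]
  for t in threads.mitems:
    createThread(t, worker, perThread)
  joinThreads(threads)
  deinitLock(lock)
  for h in hits: result += h

when isMainModule:
  echo countWithLock(10_000)
```

The result is correct, but the work inside the lock is so short that the threads spend most of their time waiting for the lock and for the cache line, which is the slowdown described above.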



Lastly, this part is suspect in terms of extra allocations, as mentioned by 
@leorize: `paramStr(1).open().readAll().splitLines().ytz()`
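
As a rough sketch of how to avoid those extra allocations, the file can be read line by line with the `lines` iterator instead of loading it whole with `readAll()` and then building a `seq` with `splitLines()`. I don't know what `ytz` does, so the `CountTable` below is only a hypothetical stand-in for it, and `countLines` is a name invented for the example.

```nim
import std/[os, tables]

proc countLines(path: string): CountTable[string] =
  ## Counts how often each distinct line occurs, one line at a time,
  ## without holding the whole file or a seq of all lines in memory.
  result = initCountTable[string]()
  for line in lines(path):
    result.inc(line)

when isMainModule:
  if paramCount() >= 1:
    let counts = countLines(paramStr(1))
    echo counts.len, " distinct lines"
```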

So I suggest you use @cblake's code, single-threaded.
