OK... going back to the hypothesis in the OP:

> The plateau is seemingly defined by the number of cores or, more correctly,
> by the number of supported threads.
This suggests that the benchmark is CPU-bound, which is supported by your more recent observation of "100% load for a single one".

You also mentioned running macOS with two threads per core, which implies Intel's Hyper-Threading. Depending on the workload, two CPU-bound processes sharing a hyperthreaded core see a speedup of 0-30%, whereas running them on separate cores can give a speedup of up to 100%. (Back when I searched for large primes, HT gave a 25% speed boost.)

So with 6 cores and 2 hardware threads per core, I would expect a maximum parallel speedup of 6 * (1x + 0.30x) = 7.8x, and your test is only giving about half that.

-y
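For anyone who wants to plug in their own numbers, here is a minimal sketch of the estimate above. It assumes ideal scaling for purely CPU-bound work (no memory-bandwidth, I/O, or locking bottlenecks), and the function name `expected_speedup` is just illustrative:

```python
def expected_speedup(cores: int, ht_gain: float) -> float:
    """Rough upper bound on speedup vs. one thread for CPU-bound work.

    Assumption: each physical core contributes 1x, and its second
    hyperthread adds only ht_gain (e.g. 0.0 to 0.30) on top of that.
    """
    return cores * (1.0 + ht_gain)


# Print the estimate for 6 cores across the typical HT gain range.
for gain in (0.0, 0.25, 0.30):
    print(f"HT gain {gain:.0%}: {expected_speedup(6, gain):.2f}x")
```

With 6 cores this gives 6.00x, 7.50x and 7.80x for HT gains of 0%, 25% and 30%. A measured speedup well below even the 6x no-HT figure points to something other than raw core count limiting the benchmark.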
