Davide,
I don't know. But I do know that processor/thread binding (if that is what you mean by "pin") is what I meant. :)

But a q&d implementation does not seem to make much difference, other than for 8 and 16 threads, where it helps a bit.

Running some more, I noticed that there are plenty of other overheads, and the 'avg. time' doesn't get anywhere near stable until the number of iterations is in the 1000s (I used 100 before).

  iterations   16 threads   32 threads   PyPy-2.1
         100       127.43       146.57       9.63
         200        77.59        86.37       7.80
         500        46.92        49.12       6.82
        1000        36.51        33.80       6.29
        2000        32.18        28.69       6.40

The numbers are closer together, and HT now helps (note that the "slowdown" for 2000 iterations for 2.1 is not significant; I should run this multiple times and average, but this is just for fun).

It is obvious, though, that overheads are larger for STM atm, and therefore stay important for longer. The differences at larger numbers of iterations are much smaller for smaller numbers of threads (and zero for 1 thread). Intuitively that makes sense. It also says that 16 threads can give an 11x speedup if there's enough work to do.

Best regards,
Wim
--
wlavrij...@lbl.gov -- +1 (510) 486 6411 -- www.lavrijsen.net
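
For anyone who wants to see how quickly the gap narrows, the timings quoted above can be turned into ratios against the PyPy-2.1 column. This is only a minimal sketch: it assumes all three columns report the same 'avg. time' quantity in the same unit, and that the 16- and 32-thread columns are the STM runs. The names `timings` and `ratios` are made up for illustration.

```python
# Timings copied from the message above:
# iterations -> (16 threads, 32 threads, PyPy-2.1).
# Assumption: all three values are the same 'avg. time' metric in the same unit.
timings = {
    100: (127.43, 146.57, 9.63),
    200: (77.59, 86.37, 7.80),
    500: (46.92, 49.12, 6.82),
    1000: (36.51, 33.80, 6.29),
    2000: (32.18, 28.69, 6.40),
}


def ratios(table):
    """Return iterations -> (16-thread / PyPy-2.1, 32-thread / PyPy-2.1)."""
    return {n: (t16 / base, t32 / base) for n, (t16, t32, base) in table.items()}


if __name__ == "__main__":
    for n, (r16, r32) in sorted(ratios(timings).items()):
        print(f"{n:>5} iterations: 16 threads {r16:5.2f}x, "
              f"32 threads {r32:5.2f}x of PyPy-2.1")
```

With these numbers the 16-thread ratio drops from roughly 13x at 100 iterations to roughly 5x at 2000, and the 32-thread ratio falls below the 16-thread one from 1000 iterations on.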