Davide,
I don't know. But processor/thread binding (if that is what you mean by
"pin") is what I meant. :) A quick-and-dirty implementation does not seem to
make much difference, though, other than for 8 and 16 threads, where it helps
a bit.
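
For reference, a minimal sketch of what I mean by binding, not the code I
actually ran. It assumes a Linux box and a Python that has
os.sched_setaffinity (CPython 3.3+); pin_current_thread and run_pinned are
made-up helper names, and the round-robin assignment of threads to CPUs is
just one possible choice.

```python
import os
import threading


def pin_current_thread(cpu):
    """Try to bind the calling thread to one CPU; return True on success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # assumption: only Linux-like platforms expose this call
    try:
        # On Linux, pid 0 refers to the calling thread.
        os.sched_setaffinity(0, {cpu})
        return True
    except OSError:
        return False


def run_pinned(n_threads, work):
    """Run work(i) in n_threads threads, each bound round-robin to a CPU."""
    if hasattr(os, "sched_getaffinity"):
        available = sorted(os.sched_getaffinity(0))
    else:
        available = [0]
    results = [None] * n_threads

    def worker(i):
        # Binding is best effort: the work runs even if pinning fails.
        pin_current_thread(available[i % len(available)])
        results[i] = work(i)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With more threads than CPUs, several threads end up sharing a CPU, so this
only makes sense up to the number of hardware threads.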
Running some more, I noticed that there are plenty of other overheads and the
'avg. time' doesn't get anywhere near stable until the number of iterations is
in the 1000s (I used 100 before).

iterations   16 threads   32 threads   PyPy-2.1
       100       127.43       146.57       9.63
       200        77.59        86.37       7.80
       500        46.92        49.12       6.82
      1000        36.51        33.80       6.29
      2000        32.18        28.69       6.40

The numbers are closer together, and hyper-threading (HT) now helps (note that
the "slowdown" for PyPy-2.1 at 2000 iterations is not significant; I should
run this multiple times and average, but this is just for fun). It is obvious,
though, that overheads are larger for STM at the moment, and they therefore
stay important for longer.
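
Running it multiple times and averaging could look roughly like the sketch
below. This is not the benchmark script I used: avg_time_per_iteration is a
hypothetical helper, the workload is a stand-in, and the default of 5 repeats
is an arbitrary assumption.

```python
import statistics
import time


def avg_time_per_iteration(work, iterations, repeats=5):
    """Return (mean, spread) of the average time per iteration, in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(iterations):
            work()
        # Fixed costs inside the timed region are spread over all iterations.
        samples.append((time.perf_counter() - start) / iterations)
    # The sample standard deviation needs at least two repeats.
    spread = statistics.stdev(samples) if repeats > 1 else 0.0
    return statistics.mean(samples), spread


if __name__ == "__main__":
    # Stand-in workload, only here to make the sketch runnable.
    for n in (100, 200, 500, 1000, 2000):
        mean, spread = avg_time_per_iteration(lambda: sum(range(1000)), n)
        print("%5d iterations: %.3e s +/- %.1e" % (n, mean, spread))
```

A difference between two iteration counts that is smaller than the spread is
probably just noise.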
The differences at larger numbers of iterations are much smaller for smaller
numbers of threads (and zero for 1 thread). Intuitively that makes sense. It
also says that 16 threads can give an 11x speedup if there's enough work to do.
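
To put the 11x in perspective, here is the usual speedup and efficiency
arithmetic as a small sketch. The function names are my own, and the
definitions are the common textbook ones (baseline time over parallel time,
and speedup over thread count), which I am assuming apply here.

```python
def speedup(t_baseline, t_parallel):
    """Ratio of baseline time to parallel time; above 1 means faster."""
    return t_baseline / t_parallel


def parallel_efficiency(speedup_factor, n_threads):
    """Fraction of ideal linear scaling that was actually reached."""
    return speedup_factor / n_threads


if __name__ == "__main__":
    # An 11x speedup on 16 threads is about 69% of ideal linear scaling.
    print(parallel_efficiency(11.0, 16))
```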
Best regards,
Wim
--
[email protected] -- +1 (510) 486 6411 -- www.lavrijsen.net