On 16/05/2013 15:50, David Chase wrote:
:
Parallel performance is a little harder to reason about on big x86 boxes (both
Intel and AMD), so I am leaving the threshold high. Dave Dice thought this
might be an artifact of cores being put into a power-saving mode and being slow
to wake (the particular benchmark I wrote would have been pessimal for this,
since it alternated between serial and parallel phases). The eventual speedups
were often impressive (6x-12x) but it was unclear how many hardware threads
(out of the 32-64 available) I was using to obtain this. Yes, I need to plug
this into JMH for fine-tuning. I'm using the system fork-join pool because
that initially seemed like the good-citizen thing to do (balance CRC/Adler
needs against those of anyone else who might be doing work) but I am starting
to wonder if it would make more sense to establish a small private pool with a
bounded number of threads, so that I don't need to worry about being a good
citizen so much. It occurs to me, late in the game, that using big-ish units
of work is another, different way to be a bad citizen. (I would prefer to get
this checked in if it represents a net improvement, and then work on the tuning
afterwards.)
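
For illustration, a minimal sketch of the "small private pool with a bounded number of threads" idea. The class name, the cap of 4 threads, and the per-chunk decomposition are assumptions made for the example, not part of the proposed patch; the sketch only shows that the work is confined to a pool separate from the system one, and it does not combine the per-chunk values into a single checksum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

final class PrivatePoolSketch {
    // Hypothetical cap; the thread does not settle on a number.
    static final int MAX_THREADS = 4;

    // Private pool: its workers are separate from the system fork-join pool,
    // so checksum work cannot occupy more than MAX_THREADS threads.
    static final ForkJoinPool POOL = new ForkJoinPool(MAX_THREADS);

    /** Returns the CRC-32 of each chunkSize-byte chunk of data, computed in POOL. */
    static long[] chunkCrcs(final byte[] data, int chunkSize) {
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            final int start = off;
            final int len = Math.min(chunkSize, data.length - off);
            tasks.add(() -> {
                CRC32 crc = new CRC32();
                crc.update(data, start, len);
                return crc.getValue();
            });
        }
        try {
            // invokeAll blocks until every chunk task has completed.
            List<Future<Long>> futures = POOL.invokeAll(tasks);
            long[] out = new long[futures.size()];
            for (int i = 0; i < out.length; i++) {
                out[i] = futures.get(i).get();
            }
            return out;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The chunk size passed in is the "unit of work" knob mentioned above: larger chunks mean fewer tasks but longer stretches during which a worker is unavailable to anyone else.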
The current proposal doesn't change the API, but I wonder whether you have
considered adding parallelUpdate methods to complement the serial
methods?
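
To make the question concrete, here is a rough sketch of what a parallel entry point could look like, using Adler-32 because two partial Adler-32 values can be merged with simple modular arithmetic. Everything named here (the class, parallelAdler32, combine, the 64 KB threshold) is hypothetical and not taken from the patch. It is a static helper that computes a checksum from scratch rather than a true update of a running java.util.zip.Adler32, since that class does not expose a way to set its state.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.zip.Adler32;

final class ParallelAdler32Sketch {
    static final int MOD = 65521;          // Adler-32 modulus
    static final int THRESHOLD = 1 << 16;  // hypothetical serial cutoff, untuned

    /** Adler-32 of b[off, off+len), split recursively above THRESHOLD. */
    private static final class Task extends RecursiveTask<Long> {
        private final byte[] b;
        private final int off;
        private final int len;

        Task(byte[] b, int off, int len) {
            this.b = b;
            this.off = off;
            this.len = len;
        }

        @Override
        protected Long compute() {
            if (len <= THRESHOLD) {
                Adler32 a = new Adler32();
                a.update(b, off, len);
                return a.getValue();
            }
            int half = len / 2;
            Task left = new Task(b, off, half);
            Task right = new Task(b, off + half, len - half);
            left.fork();                     // left half runs asynchronously
            long r = right.compute();        // right half runs in this thread
            long l = left.join();
            return combine(l, r, len - half);
        }
    }

    /** Merges the Adler-32 of block 1 with that of an adjacent block 2 of length len2. */
    static long combine(long adler1, long adler2, long len2) {
        long a1 = adler1 & 0xffff, b1 = (adler1 >>> 16) & 0xffff;
        long a2 = adler2 & 0xffff, b2 = (adler2 >>> 16) & 0xffff;
        // A = A1 + A2 - 1; B = B1 + B2 + len2 * (A1 - 1), all mod 65521.
        long a = (a1 + a2 + MOD - 1) % MOD;
        long b = (b1 + b2 + (len2 % MOD) * ((a1 + MOD - 1) % MOD)) % MOD;
        return (b << 16) | a;
    }

    /** Hypothetical parallel counterpart to a serial Adler32.update + getValue. */
    static long parallelAdler32(byte[] b, int off, int len) {
        return ForkJoinPool.commonPool().invoke(new Task(b, off, len));
    }
}
```

A separate method like this would let callers opt in to parallelism explicitly, rather than having update switch behaviour at a size threshold behind their backs.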
-Alan.