On 16/05/2013 15:50, David Chase wrote:
:

Parallel performance is a little harder to reason about on big x86 boxes (both 
Intel and AMD), so I am leaving the threshold high.  Dave Dice thought this 
might be an artifact of cores being put into a power-saving mode and being slow 
to wake (the particular benchmark I wrote would have been pessimal for this, 
since it alternated between serial and parallel phases).  The eventual speedups 
were often impressive (6x-12x) but it was unclear how many hardware threads 
(out of the 32-64 available) I was using to obtain this.  Yes, I need to plug 
this into JMH for fine-tuning.  I'm using the system fork-join pool because 
that initially seemed like the good-citizen thing to do (balance CRC/Adler 
needs against those of anyone else who might be doing work) but I am starting 
to wonder if it would make more sense to establish a small private pool with a 
bounded number of threads, so that I don't need to worry about being a good 
citizen so much.  It occurs to me, late in the game, that using big-ish units 
of work is another, different way to be a bad citizen.  (I would prefer to get 
this checked in if it represents a net improvement, and then work on the tuning 
afterwards.)
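The two knobs in the paragraph above (which pool the work runs in, and how big each unit of work is) can be made concrete with a small sketch. This is not the patch under review: it uses Adler-32 rather than CRC32, because Adler-32 values of adjacent segments can be combined with a few lines of arithmetic (the same identity zlib's adler32_combine uses). The class name ParallelAdler32, the threshold values and the pool size of 4 are all hypothetical.

```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.zip.Adler32;

// Hypothetical sketch: parallel Adler-32 via fork/join, with the pool passed in.
public class ParallelAdler32 {
    private static final long MOD = 65521; // largest prime below 2^16

    // Combines adler1 (checksum of A) and adler2 (checksum of B, len2 bytes long)
    // into the checksum of A followed by B.
    static long combine(long adler1, long adler2, long len2) {
        long rem = len2 % MOD;
        long a1 = adler1 & 0xffff, b1 = adler1 >>> 16;
        long a2 = adler2 & 0xffff, b2 = adler2 >>> 16;
        long a = (a1 + a2 + MOD - 1) % MOD;
        long b = (b1 + b2 + rem * ((a1 + MOD - 1) % MOD)) % MOD;
        return (b << 16) | a;
    }

    // Halves the range until it is at most `threshold` bytes, checksums the
    // leaves serially, then combines the partial results in order.
    static final class Task extends RecursiveTask<Long> {
        private final byte[] buf;
        private final int off, len, threshold;

        Task(byte[] buf, int off, int len, int threshold) {
            this.buf = buf; this.off = off; this.len = len; this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (len <= threshold) {
                Adler32 leaf = new Adler32();
                leaf.update(buf, off, len);
                return leaf.getValue();
            }
            int half = len >>> 1;
            Task left = new Task(buf, off, half, threshold);
            Task right = new Task(buf, off + half, len - half, threshold);
            left.fork();              // left half may be stolen by another worker
            long r = right.compute(); // right half runs on the current thread
            long l = left.join();
            return combine(l, r, len - half);
        }
    }

    // ForkJoinPool.commonPool() shares workers with everything else in the JVM;
    // a private new ForkJoinPool(n) caps this work at n worker threads.
    static long checksum(ForkJoinPool pool, byte[] buf, int threshold) {
        if (threshold < 1) throw new IllegalArgumentException("threshold must be >= 1");
        return pool.invoke(new Task(buf, 0, buf.length, threshold));
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 22];
        new Random(42).nextBytes(data);
        Adler32 serial = new Adler32();
        serial.update(data, 0, data.length);

        ForkJoinPool bounded = new ForkJoinPool(4); // hypothetical small private pool
        try {
            long viaPrivate = checksum(bounded, data, 256 * 1024);
            long viaCommon = checksum(ForkJoinPool.commonPool(), data, 256 * 1024);
            System.out.println(viaPrivate == serial.getValue()
                    && viaCommon == serial.getValue());
        } finally {
            bounded.shutdown();
        }
    }
}
```

A larger threshold means fewer, longer-running tasks, which is the "big-ish units of work" trade-off: less splitting overhead, but each task occupies a shared worker for longer.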

The current proposal doesn't change the API at this time, but I wonder whether you have considered adding parallelUpdate methods to complement the serial update methods?
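To illustrate what such an API might look like, here is a hypothetical sketch: java.util.zip has no parallelUpdate method, and the class name, the 64 KB chunk size and the choice of Adler-32 are assumptions. It keeps the existing java.util.zip.Checksum contract, so serial update calls and parallelUpdate calls can be mixed on one object and produce the same value as a purely serial run. Parallel streams execute on the common fork-join pool, so this version takes the "good citizen" route discussed above.

```java
import java.util.stream.IntStream;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical API sketch only: not part of java.util.zip.
public class ParallelUpdatingAdler32 implements Checksum {
    private static final long MOD = 65521;
    private static final int CHUNK = 64 * 1024; // hypothetical unit of parallel work
    private long value = 1;                     // Adler-32 of the empty input

    // Checksum of A followed by B, given checksums of A and B and B's length.
    private static long combine(long adler1, long adler2, long len2) {
        long rem = len2 % MOD;
        long a1 = adler1 & 0xffff, b1 = adler1 >>> 16;
        long a2 = adler2 & 0xffff, b2 = adler2 >>> 16;
        long a = (a1 + a2 + MOD - 1) % MOD;
        long b = (b1 + b2 + rem * ((a1 + MOD - 1) % MOD)) % MOD;
        return (b << 16) | a;
    }

    private static long adlerOf(byte[] b, int off, int len) {
        Adler32 a = new Adler32();
        a.update(b, off, len);
        return a.getValue();
    }

    @Override
    public void update(int b) {
        Adler32 a = new Adler32();
        a.update(b); // uses the low eight bits of b
        value = combine(value, a.getValue(), 1);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        value = combine(value, adlerOf(b, off, len), len);
    }

    // Same result as update(b, off, len): chunks are checksummed in parallel,
    // then folded into the running value strictly left to right.
    public void parallelUpdate(byte[] b, int off, int len) {
        if (off < 0 || len < 0 || off > b.length - len) {
            throw new ArrayIndexOutOfBoundsException();
        }
        int chunks = len / CHUNK + (len % CHUNK == 0 ? 0 : 1);
        long[] parts = IntStream.range(0, chunks).parallel()
                .mapToLong(i -> adlerOf(b, off + i * CHUNK, Math.min(CHUNK, len - i * CHUNK)))
                .toArray(); // toArray preserves encounter order
        for (int i = 0; i < chunks; i++) {
            value = combine(value, parts[i], Math.min(CHUNK, len - i * CHUNK));
        }
    }

    @Override
    public long getValue() {
        return value;
    }

    @Override
    public void reset() {
        value = 1;
    }
}
```

Because the result is defined to match update, the choice between serial and parallel becomes a performance hint rather than a semantic one, which leaves room to tune the threshold later without changing behaviour.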

-Alan.

