On 16/05/2013 15:50, David Chase wrote:
:

Parallel performance is a little harder to reason about on big x86 boxes (both 
Intel and AMD), so I am leaving the threshold high.  Dave Dice thought this 
might be an artifact of cores being put into a power-saving mode and being slow 
to wake (the particular benchmark I wrote would have been pessimal for this, 
since it alternated between serial and parallel phases).  The eventual speedups 
were often impressive (6x-12x) but it was unclear how many hardware threads 
(out of the 32-64 available) I was using to obtain this.  Yes, I need to plug 
this into JMH for fine-tuning.  I'm using the system fork-join pool because 
that initially seemed like the good-citizen thing to do (balance CRC/Adler 
needs against those of anyone else who might be doing work) but I am starting 
to wonder if it would make more sense to establish a small private pool with a 
bounded number of threads, so that I don't need to worry about being a good 
citizen so much.  It occurs to me, late in the game, that using big-ish units 
of work is another, different way to be a bad citizen.  (I would prefer to get 
this checked in if it represents a net improvement, and then work on the tuning 
afterwards.)

I'm sure Doug or Brian or David Holmes will have opinions on this point, but I would think using the common pool is right. If parallel sort, CRC32, and other specific usages each created their own thread pool, then I could imagine a lot of thread pools hanging around and competing. Plus there are cases like EE where no parallelism might be the right answer, and one wouldn't want to have to configure each usage.
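
To make the two options concrete, here is a rough sketch of the same fork-join task run on the common pool and on a small private pool. It is not the CRC32/Adler32 code under review: the class name PoolChoiceSketch, the ByteSum task, and the THRESHOLD value are all made up for illustration, and the work is a plain byte sum rather than a real checksum.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical illustration only; not the CRC32/Adler32 change being discussed.
final class PoolChoiceSketch {
    // Assumed cut-off below which a range is summed serially; a real value
    // would have to come from measurement.
    static final int THRESHOLD = 1 << 16;

    // Sums the unsigned values of data[lo, hi), splitting large ranges in half.
    static final class ByteSum extends RecursiveTask<Long> {
        private final byte[] data;
        private final int lo;
        private final int hi;

        ByteSum(byte[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i] & 0xFF;
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            ByteSum left = new ByteSum(data, lo, mid);
            ByteSum right = new ByteSum(data, mid, hi);
            left.fork();                          // run the left half asynchronously
            return right.compute() + left.join(); // do the right half in this thread
        }
    }

    // Option 1: share the JVM-wide common pool with every other user of it.
    static long sumOnCommonPool(byte[] data) {
        return ForkJoinPool.commonPool().invoke(new ByteSum(data, 0, data.length));
    }

    // Option 2: a private pool with a bounded number of worker threads.
    // Creating one per call, as here, is only for brevity.
    static long sumOnPrivatePool(byte[] data, int maxThreads) {
        ForkJoinPool pool = new ForkJoinPool(maxThreads);
        try {
            return pool.invoke(new ByteSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        Arrays.fill(data, (byte) 1);
        System.out.println(sumOnCommonPool(data));
        System.out.println(sumOnPrivatePool(data, 4));
    }
}
```

With the first form there is a single knob for the whole JVM, the java.util.concurrent.ForkJoinPool.common.parallelism system property, which is what I have in mind for the EE-style case. With the second form each usage would need its own setting.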

In any case, this looks like really good work. One thing that might be worth checking is startup/warm-up. I have a vague memory of this being a concern in the past with Adler32; Sherman might remember the details.

-Alan.


