Peter Firmstone wrote:
> Actually, one problem I have is that I don't have access to the
> resources required to performance test some of these implementations
> as they would be stressed in a massive cluster.
>
> The other assumption I make is that something that performs well today
> might not tomorrow, because of the multi-core revolution we have on our
> hands. This may turn out to be a flawed assumption.

That is why I believe we need analysis as well as measurement.

Unfortunately, my experience of very large systems is limited. The
largest system I've studied in detail had up to 106 single-threaded
processors. Most of my practical experience is with 64 or fewer
single-threaded processors.
However, I believe the kind of reasoning that lets tests on smaller
systems predict what will happen on systems with a few dozen processors
may also be applicable to even bigger ones.
One technique is to use the whole of a small system to drive something
that will only be one component in a real application. That is what I am
doing with TaskManager - driving a single TaskManager instance as hard
as I can. Currently, I'm using a single thread to generate the traffic,
but if necessary I may use more than one thread.
Under those conditions, we can find out things like how the time in
synchronized code relates to the length of the queue. That should allow
some projection of the transaction rates the implementation will support
on large systems.
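
As a rough illustration of that kind of measurement, here is a minimal
Java sketch. It does not use the real TaskManager: ToyQueue, its
scanning add() method, and meanAddNanos() are hypothetical stand-ins,
chosen only so that the time spent holding the lock grows with the
number of queued entries. A single thread drives one instance, as
described above.

```java
import java.util.ArrayList;
import java.util.List;

public class QueueLengthProbe {
    /** Hypothetical stand-in for the component under test; NOT the real TaskManager. */
    static final class ToyQueue {
        private final List<Integer> tasks = new ArrayList<>();

        /** Synchronized add that scans every queued entry, so its cost grows with queue length. */
        synchronized int add(int task) {
            int duplicates = 0;
            for (int queued : tasks) {
                if (queued == task) {
                    duplicates++;
                }
            }
            tasks.add(task);
            return duplicates;
        }

        synchronized int size() {
            return tasks.size();
        }
    }

    /**
     * Pre-fills a queue to queueLength entries, then returns the mean time in
     * nanoseconds of the next `samples` add() calls, all issued from one thread.
     * The queue keeps growing while sampling, so keep samples small relative
     * to queueLength if a near-constant length matters.
     */
    static double meanAddNanos(int queueLength, int samples) {
        ToyQueue queue = new ToyQueue();
        for (int i = 0; i < queueLength; i++) {
            queue.add(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            queue.add(queueLength + i);
        }
        return (System.nanoTime() - start) / (double) samples;
    }

    public static void main(String[] args) {
        for (int length : new int[] {10, 100, 1_000, 10_000}) {
            meanAddNanos(length, 1_000); // warm-up pass so the JIT has compiled the hot path
            System.out.printf("queue length %6d: %.0f ns per add%n",
                    length, meanAddNanos(length, 1_000));
        }
    }
}
```

Because only one thread at a time can be inside the synchronized
method, the measured time per call puts a ceiling on throughput: if
each call holds the lock for t nanoseconds, one instance cannot
complete more than about 1e9 / t calls per second, however many
processors are available. That ceiling is what can be projected to
larger systems.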
Going unsynchronized for a short piece of potentially parallel work may
cost more in extra inter-processor cache misses than it gains. In
general, the biggest performance gain from running code in parallel comes
from going from one thread at a time to two, which is something we can
test on a relatively small system.
I don't automatically assume atomics are really efficient. I've seen the
logic analyzer traces from a compare-and-swap fight among 64 processors
all trying to atomically increment the same integer by one at about the
same time. Not a pretty sight. Nothing got blocked, but progress was
very slow. A lot depends on whether the atomic is directly supported in
hardware, or is built on top of compare-and-swap.
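
A small Java sketch can show the same effect in software terms. It is
an illustration only, not a reproduction of the logic analyzer
experiment: the class name CasContention, the run() method, and the
thread and iteration counts are all assumptions. Each thread
increments one shared AtomicInteger through an explicit
compare-and-swap loop and counts how often its attempt fails because
another thread got there first.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class CasContention {
    /** Outcome of one run: the final counter value and the number of failed CAS attempts. */
    static final class Result {
        final int total;
        final long failedAttempts;

        Result(int total, long failedAttempts) {
            this.total = total;
            this.failedAttempts = failedAttempts;
        }
    }

    /** Starts `threads` workers that each add one to a shared counter `incrementsPerThread` times. */
    static Result run(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        AtomicLong failures = new AtomicLong();
        CountDownLatch startGate = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    startGate.await(); // release all workers at about the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long localFailures = 0;
                for (int i = 0; i < incrementsPerThread; i++) {
                    while (true) {
                        int seen = counter.get();
                        if (counter.compareAndSet(seen, seen + 1)) {
                            break; // our update won
                        }
                        localFailures++; // another thread changed the value first; retry
                    }
                }
                failures.addAndGet(localFailures);
            });
            workers[t].start();
        }
        startGate.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new Result(counter.get(), failures.get());
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            Result r = run(threads, 1_000_000);
            System.out.printf("%d threads: total=%d, failed CAS attempts=%d%n",
                    threads, r.total, r.failedAttempts);
        }
    }
}
```

The final total is always threads times incrementsPerThread, so no
update is lost and no thread is ever blocked. The number of failed
attempts, however, depends on the machine and on timing, and it is
that wasted work that makes progress slow under heavy contention. The
explicit loop models an atomic built on top of compare-and-swap; an
increment that the hardware supports directly would not need the retry
loop at all.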
Patricia