Peter Firmstone wrote:
Actually, a problem I have is that I don't have access to the resources required to performance test some of these implementations as they would be stressed in a massive cluster situation.

The other assumption I make is that something that performs well today might not tomorrow, due to the multi-core revolution we have on our hands. This may turn out to be a flawed assumption.

That is why I believe we need analysis, as well as measurement.

Unfortunately, my experience of very large systems is limited. The largest system I've studied in detail had up to 106 single-threaded processors. Most of my practical experience is with 64 or fewer single-threaded processors.

However, I believe the sort of thinking that lets testing on smaller systems project what will happen on systems with a few dozen processors may also be applicable to even bigger ones.

One technique is to use the whole of a small system to drive something that will only be one component in a real application. That is what I am doing with TaskManager - driving a single TaskManager instance as hard as I can. Currently, I'm using a single thread to generate the traffic, but if necessary I may use more than one thread.

Under those conditions, we can find out things like how the time in synchronized code relates to the length of the queue. That should allow some projection of the transaction rates the implementation will support on large systems.
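A minimal sketch of that kind of measurement, under stated assumptions: SyncTimingSketch is a hypothetical stand-in, not River's TaskManager, and its add() does a linear scan under the lock purely so that the cost depends on queue length. A single driver thread adds tasks with no consumer, so the queue grows steadily, and the time spent inside the synchronized method is recorded per power-of-two bucket of queue length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the component under test; NOT the real TaskManager.
class SyncTimingSketch {
    private final List<Integer> queue = new ArrayList<>();
    // bucket of queue length at entry -> {calls, total nanoseconds in the timed region}
    private final TreeMap<Integer, long[]> stats = new TreeMap<>();

    // Groups lengths into 0, 1, 2-3, 4-7, 8-15, ... so the report stays short.
    private static int bucket(int length) {
        return Integer.highestOneBit(length);
    }

    // Assumed workload: a linear scan of the queue while holding the lock.
    synchronized void add(int task) {
        long start = System.nanoTime();
        int length = queue.size();
        if (!queue.contains(task)) {      // O(length) work under the lock
            queue.add(task);
        }
        long elapsed = System.nanoTime() - start;
        long[] s = stats.computeIfAbsent(bucket(length), k -> new long[2]);
        s[0]++;
        s[1] += elapsed;
    }

    synchronized int queueLength() { return queue.size(); }

    synchronized int bucketCount() { return stats.size(); }

    synchronized long totalCalls() {
        long calls = 0;
        for (long[] s : stats.values()) calls += s[0];
        return calls;
    }

    synchronized void report() {
        for (Map.Entry<Integer, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            System.out.printf("length>=%d calls=%d mean_ns=%d%n",
                    e.getKey(), s[0], s[1] / s[0]);
        }
    }

    // Single driver thread, as in the setup described above.
    static SyncTimingSketch drive(int tasks) {
        SyncTimingSketch target = new SyncTimingSketch();
        for (int i = 0; i < tasks; i++) target.add(i);
        return target;
    }

    public static void main(String[] args) {
        drive(20_000).report();
    }
}
```

The absolute numbers from a run like this mean little on their own. What matters for projection is how the mean time per call changes as the queue length bucket grows.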

Going unsynchronized for a short piece of potentially parallel work may cost more in extra inter-processor cache misses than it gains. In general, the biggest performance gain for running code in parallel comes from going from one thread at a time to two, something we can test on a relatively small system.
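One way to illustrate the one-to-two point, assuming Amdahl's law as the model (my choice for this sketch, and it ignores the cache-miss costs mentioned above): with a fixed parallel fraction p, the speedup on n threads is 1 / ((1 - p) + p / n), and each additional thread adds less than the one before. The value p = 0.9 is an arbitrary assumption.

```java
// Sketch of diminishing returns under Amdahl's law; p is an assumed value.
class AmdahlSketch {
    // Speedup on n threads when a fraction p of the work can run in parallel.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Extra speedup gained by going from n threads to n + 1.
    static double marginalGain(double p, int n) {
        return speedup(p, n + 1) - speedup(p, n);
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed parallel fraction
        for (int n = 1; n <= 7; n++) {
            System.out.printf("%d -> %d threads: +%.3f%n",
                    n, n + 1, marginalGain(p, n));
        }
    }
}
```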

I don't automatically assume atomics are really efficient. I've seen the logic analyzer traces from a compare-and-swap fight among 64 processors all trying to atomically increment the same integer by one at about the same time. Not a pretty sight. Nothing got blocked, but progress was very slow. A lot depends on whether the atomic is directly supported in hardware, or is built on top of compare-and-swap.
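A small sketch of that kind of fight, for anyone who wants to reproduce it in miniature. It assumes an explicit compare-and-swap retry loop on a java.util.concurrent AtomicInteger; the class name and thread counts are illustrative. It counts failed CAS attempts, which is the wasted work: no thread ever blocks, but every failure is a round trip that made no progress.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class CasContentionSketch {
    // Each of `threads` threads increments one shared counter `perThread` times
    // using a compare-and-swap retry loop.
    // Returns {final counter value, total failed CAS attempts}.
    static long[] run(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        AtomicLong failures = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long localFailures = 0;
                for (int i = 0; i < perThread; i++) {
                    while (true) {
                        int seen = counter.get();
                        if (counter.compareAndSet(seen, seen + 1)) break;
                        localFailures++; // another thread won; retry
                    }
                }
                failures.addAndGet(localFailures);
            });
        }
        for (Thread w : workers) w.start();
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new long[] { counter.get(), failures.get() };
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 2, 4, 8}) {
            long[] r = run(threads, 1_000_000);
            System.out.printf("threads=%d count=%d failedCAS=%d%n",
                    threads, r[0], r[1]);
        }
    }
}
```

The final count is always threads times perThread, so correctness is never in question. The failed-attempt count varies from run to run and from machine to machine, which is why I would measure it rather than assume it.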

Patricia



