Re: TaskManager progress

Peter Firmstone Tue, 27 Jul 2010 14:27:17 -0700

Food for thought?

http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO?from=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot%2Fto+%28%28Title%29Slashdot+%28rdf%29%29


Patricia Shanahan wrote:

Gregg Wonderly wrote:
Patricia Shanahan wrote:
On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
...
When I write code of this nature, attempting to remove all contention, I
try to list every "step" that changes the "view" of the world, and think
about how that "view" can be made atomic by using explicit ordering of
statements rather than synchronized{} blocks.  ...
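One way to read "explicit ordering of statements" is the copy-then-publish idiom sketched below. This is only an illustration under stated assumptions, not code from the TaskManager: the class PublishedView and its methods are hypothetical, and the sketch assumes a single writer thread. The new state is built privately and made visible by one volatile write as the last step, so a reader sees either the old view or the complete new one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; not taken from the TaskManager source.
final class PublishedView {
    // A volatile write happens-before every later read of the same field,
    // so readers see either the old list or the fully built new one.
    private volatile List<String> view = Collections.emptyList();

    // Assumes a single writer thread; concurrent writers could lose updates.
    void add(String item) {
        List<String> next = new ArrayList<>(view);  // step 1: copy privately
        next.add(item);                             // step 2: modify the copy
        view = Collections.unmodifiableList(next);  // step 3: publish last
    }

    // Lock-free read of an immutable snapshot.
    List<String> snapshot() {
        return view;
    }
}
```

The cost is a full copy on every update, so this only suits data that is read far more often than it is written.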
I would like to discuss how to approach performance improvement, and especially scaling improvement. We seem to have different philosophies, and I'm interested in understanding other people's approaches to programming.
I try to first find the really big wins, which are almost always data structure and algorithm changes. That should result in code that is efficient in terms of total CPU time and memory. During that part of the process, I prefer to keep the concurrency design as simple as possible, which in Java often means using synchronization at a coarse level, such as synchronization on a TaskManager instance.
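As a minimal sketch of what coarse-level synchronization can look like, assuming a hypothetical CoarseTaskQueue class (the real TaskManager has a different API), every method below locks the whole instance. Only one thread at a time can touch the queue, which makes the class easy to reason about while the data structures are still changing.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for illustration; not the real TaskManager.
final class CoarseTaskQueue {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Each method is synchronized on this instance, so all access
    // to the queue is serialized through a single monitor.
    synchronized void add(Runnable task) {
        pending.add(task);
    }

    // Returns the oldest pending task, or null if the queue is empty.
    synchronized Runnable poll() {
        return pending.poll();
    }

    synchronized int size() {
        return pending.size();
    }
}
```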
I don't think we are too far different. The big wins are the ones to go for. What I've learned over the years, debugging Java performance issues in Jini applications, and elsewhere, is that "synchronized", while the most "correct" choice in many cases, is also the "slowest" form of concurrency control.
Yes, I think it is merely a matter of emphasis. One problem I've seen is a tendency to accept the times without questioning whether they need to be as long as they are, and focus too much too soon on concurrency, before making it fast, so I tend to resist that by keeping the concurrency simple until the data structures and algorithms are right.
In a "service" application in particular, it is often the case that the total CPU time needed to perform the actual work is only a small fraction of the time that "synchronized" injects as latency in the execution path. I think you understand this issue, but I want to illustrate it to make sure.
File I/O and network I/O latency can create similar circumstances. If you look at this with some numbers (I put ms but any magnitude will show the same behavior) such as the following:
2ms to enter server (Security/Permission stuff)
1ms CPU to arrive at synchronized
3ms through synchronized lock
1ms to return result
Then if there are 30 such threads running through the server, eventually, all of them will be standing in line at the synchronized (monitor entry) because they can get through all the other stuff in short order. As the total number of threads increases, the "synchronized" section time must be multiplied by the number of threads, and so with 10 threads, it becomes 30ms of time, because each thread must wait its turn. Thus, 30ms becomes the minimum latency through that part of the system, instead of the 3ms there would be with 1 thread.
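The arithmetic above can be written down as a small model. This is a rough sketch, assuming every thread arrives at the lock at the same moment and each holds it for the full 3ms; the class LockQueueModel and its method names are made up for illustration.

```java
// Hypothetical model of the numbers above; not a measurement.
final class LockQueueModel {
    static final long ENTER_MS = 2;   // enter server (Security/Permission)
    static final long CPU_MS = 1;     // CPU to arrive at synchronized
    static final long LOCK_MS = 3;    // time through the synchronized lock
    static final long RETURN_MS = 1;  // return result

    // Time for the last of `threads` simultaneous arrivals to get
    // through a lock that each thread holds for `holdMs`.
    static long lockLatencyMs(int threads, long holdMs) {
        return (long) threads * holdMs;
    }

    // End-to-end time for that last thread, adding the unlocked steps.
    static long requestMs(int threads) {
        return ENTER_MS + CPU_MS + lockLatencyMs(threads, LOCK_MS) + RETURN_MS;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 10, 30}) {
            System.out.println(threads + " threads: lock "
                    + lockLatencyMs(threads, LOCK_MS) + "ms, request "
                    + requestMs(threads) + "ms");
        }
    }
}
```

With these figures the lock accounts for 3ms of a 7ms request with 1 thread, and 30ms of a 34ms request with 10 threads.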
Suppose a better choice of algorithm and data structures could reduce the time with the synchronized lock from 3ms to 3us. Wouldn't it be better to do that first? With 10 threads, eliminating synchronization may, at best, produce a 10x performance improvement. Picking the right data structures and algorithms can often do better than that.
An ArrayList is not an efficient representation of a queue, especially under load conditions that may cause a significant backlog.
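To illustrate the point, here is a small sketch, assuming the queue is drained from the front in FIFO order (how the TaskManager actually uses its list is not shown in this thread). ArrayList.remove(0) shifts every remaining element one slot to the left, so draining a backlog of n items costs on the order of n*n element moves, while ArrayDeque.poll() takes the head in constant time. The class QueueDrainDemo is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical demo class; names are for illustration only.
final class QueueDrainDemo {
    // FIFO drain from an ArrayList: each remove(0) shifts the rest left.
    static long drainList(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        long sum = 0;
        while (!list.isEmpty()) sum += list.remove(0);
        return sum;
    }

    // FIFO drain from an ArrayDeque: poll() removes the head in O(1).
    static long drainDeque(int n) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < n; i++) queue.add(i);
        long sum = 0;
        Integer head;
        while ((head = queue.poll()) != null) sum += head;
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000; // size of the simulated backlog
        long t0 = System.nanoTime();
        drainList(n);
        long t1 = System.nanoTime();
        drainDeque(n);
        long t2 = System.nanoTime();
        System.out.println("ArrayList:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("ArrayDeque: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The timings are a crude single run with no JIT warm-up, so treat them as an indication of the trend rather than a benchmark.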
I feel that doing some optimizations, including a lot of the concurrency changes, can create complication and commitment that becomes a barrier to algorithm and data structure changes. On the other hand, picking good data structures and algorithms does nothing to bar improving concurrency later.
So, eliminating synchronized as a "global" encounter is what I always consider first.
I, on the other hand, prefer to minimize the work being done first, and then decide whether there is enough left for synchronization to be a potential bottleneck.
Note that there is a finite limit to the number of hardware threads that can be supported efficiently, at least with current technology, while maintaining the Java memory model. Really large clusters run as multiple separate SMP computers, each with one or more JVMs of its own, with message passing rather than shared memory communication between the SMP computers.
Patricia
