Food for thought?

http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO?from=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot%2Fto+%28%28Title%29Slashdot+%28rdf%29%29

Patricia Shanahan wrote:
Gregg Wonderly wrote:
Patricia Shanahan wrote:
On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
...
When I write code of this nature, attempting to remove all contention, I
try
to list every "step" that changes the "view" of the world, and think about how that "view" can be made atomic by using explicit ordering of statements
rather than synchronized{} blocks.  ...

I would like to discuss how to approach performance improvement, and especially scaling improvement. We seem to have different philosophies, and I'm interested in understanding other people's approaches to programming.

I try to first find the really big wins, which are almost always data structure and algorithm changes. That should result in code that is efficient in terms of total CPU time and memory. During that part of the process, I prefer to keep the concurrency design as simple as possible, which in Java often means using synchronization at a coarse level, such as synchronization on a TaskManager instance.

I don't think we are too far different. The big wins are the ones to go for. What I've learned over the years, debugging Java performance issues in Jini applications, and elsewhere, is that "synchronized", while the most "correct" choice in many cases, is also the "slowest" form of concurrency control.

Yes, I think it is merely a matter of emphasis. One problem I've seen is a tendency to accept the times without questioning whether they need to be as long as they are, and focus too much too soon on concurrency, before making it fast, so I tend to resist that by keeping the concurrency simple until the data structures and algorithms are right.


In a "service" application in particular, it can, in many cases, be the case that the total CPU time needed to perform the actual work, is a smaller fraction of the time that "synchronized" injects as latency in the execution path. I think you understand this issue, but I want to illustrate it to make sure.

File I/O and network I/O latency can create similar circumstances. If you look at this with some numbers (I put ms but any magnitude will show the same behavior) such as the following:

2ms to enter server (Security/Permission stuff)
1ms CPU to arrive at synchronized
3ms through synchronized lock
1ms to return result

Then if there are 30 such threads running through the server, eventually, all of them will be standing in line at the synchronized (monitor entry) because they can get through all the other stuff in short order. As the total number of threads increases, the "synchronized" section time must be multiplied by the number of threads, and so with 10 threads, it becomes 30ms of time, because each thread must wait it's turn. Thus, 30ms becomes the minimum latency through that part of the system, instead of the 3ms there would be with 1 thread.

Suppose a better choice of algorithm and data structures could reduce the time with the synchronized lock from 3ms to 3us. Wouldn't it be better to do that first? With 10 threads, eliminating synchronization may, at best, produce a 10x performance improvement. Picking the right data structures and algorithms can often do better than that.

An ArrayList is not an efficient representation of a queue, especially under load conditions that may cause a significant backlog.

I feel that doing some optimizations, including a lot of the concurrency changes, can create complication and commitment that becomes a barrier to algorithm and data structure changes. On the other hand, picking good data structures and algorithms does nothing to bar improving concurrency later.

So, eliminating synchronized as a "global" encounter is what I always consider first.

I, on the other hand, prefer to minimize the work being done first, and then decide whether there is enough left for synchronization to be a potential bottleneck.

Note that there is a finite limit to the number of hardware threads that can be supported efficiently, at least with current technology, while maintaining the Java memory model. Really large clusters run as multiple separate SMP computers, each with one or more JVMs of its own, with message passing rather than shared memory communication between the SMP computers.

Patricia



Reply via email to