Patricia Shanahan wrote:
On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
...
When I write code of this nature, attempting to remove all contention, I try to list every "step" that changes the "view" of the world, and think about how that "view" can be made atomic by using explicit ordering of statements rather than synchronized{} blocks.  ...
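(An illustrative sketch of the quoted technique, not Gregg's actual code: one way to "order statements" so a new view is published atomically is to build a replacement off to the side and publish it with a single volatile write. The class and field names here are hypothetical, and it assumes a single writer.)

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ViewPublisher {
    // volatile guarantees that once a reader sees the new reference,
    // it also sees the fully built map (happens-before on the write).
    private volatile Map<String, Integer> view =
        Collections.emptyMap();

    // The writer builds a complete replacement, then publishes it with
    // one reference assignment -- the only step readers can observe.
    // NOTE: this copy-on-write update assumes a single writer thread;
    // concurrent writers would still need their own coordination.
    public void update(String key, int value) {
        Map<String, Integer> next = new HashMap<String, Integer>(view);
        next.put(key, value);
        view = next; // atomic publication point
    }

    // Readers never block; they see either the old view or the new one,
    // but never a half-built map.
    public Integer lookup(String key) {
        return view.get(key);
    }
}
```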

I would like to discuss how to approach performance improvement, and especially scaling improvement. We seem to have different philosophies, and I'm interested in understanding other people's approaches to programming.

I try to first find the really big wins, which are almost always data structure and algorithm changes. That should result in code that is efficient in terms of total CPU time and memory. During that part of the process, I prefer to keep the concurrency design as simple as possible, which in Java often means using synchronization at a coarse level, such as synchronization on a TaskManager instance.

I don't think we are too far apart. The big wins are the ones to go for. What I've learned over the years, debugging Java performance issues in Jini applications and elsewhere, is that "synchronized", while the most "correct" choice in many cases, is also the "slowest" form of concurrency control.

In a "service" application in particular, the total CPU time needed to perform the actual work is often a small fraction of the latency that "synchronized" injects into the execution path. I think you understand this issue, but I want to illustrate it to make sure.

File I/O and network I/O latency can create similar circumstances. Consider some numbers (I use ms, but any magnitude shows the same behavior) such as the following:

2ms to enter server (Security/Permission stuff)
1ms CPU to arrive at synchronized
3ms through synchronized lock
1ms to return result

Then if there are 10 such threads running through the server, eventually all of them will be standing in line at the synchronized block (monitor entry), because they can get through all the other stuff in short order. As the total number of threads increases, the "synchronized" section time must be multiplied by the number of threads, so with 10 threads it becomes 30ms, because each thread must wait its turn. Thus, 30ms becomes the minimum latency through that part of the system, instead of the 3ms there would be with 1 thread.
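(The queuing arithmetic above can be written down as a tiny model. The numbers are the assumed figures from the text, not measurements, and the class is purely illustrative: the parallel parts count once, but the serialized critical section multiplies by the number of threads waiting.)

```java
public class ContentionModel {
    static final int ENTER_MS  = 2; // security/permission stuff
    static final int ARRIVE_MS = 1; // CPU to arrive at synchronized
    static final int LOCKED_MS = 3; // time inside the synchronized block
    static final int RETURN_MS = 1; // returning the result

    // Worst-case latency for the last of n threads that all arrive at
    // the monitor together: only the locked section serializes.
    static int worstCaseLatencyMs(int n) {
        return ENTER_MS + ARRIVE_MS + n * LOCKED_MS + RETURN_MS;
    }
}
```

With n = 1 the lock contributes 3ms; with n = 10 it contributes 30ms, dominating everything else in the path.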

So, eliminating synchronized as a "global" encounter is what I always consider first.

In my earlier email, I talked about using ConcurrentHashMap instead of HashSet specifically because of this issue. ConcurrentHashMap distributes locks amongst groups of key values, so there is nowhere near the amount of "standing in line" that there would be if you used HashSet with synchronized access to it.
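(A minimal sketch of that substitution, with hypothetical names: since Java 6, Collections.newSetFromMap can wrap a ConcurrentHashMap to give a Set with striped locking, so threads touching different keys rarely queue behind one another the way they would on one synchronized HashSet.)

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RegisteredKeys {
    // Set view backed by ConcurrentHashMap: writes to different lock
    // stripes proceed in parallel, reads do not block at all.
    private final Set<String> keys =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public boolean register(String key) {
        return keys.add(key); // false if already present
    }

    public boolean isRegistered(String key) {
        return keys.contains(key);
    }
}
```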

Once that is done, I review the performance. If it is fast and scalable I stop there. If that is not the case, I look for the bottlenecks, and consider whether parallelism, or some other strategy, will best improve them. Any increase in concurrency complication has to be justified by a demonstrated improvement in performance.

My big picture objective is to find the simplest implementation that meets the performance requirements (or cannot reasonably be made significantly faster, if the requirement is just "make it fast"). I value simplicity in concurrency design over simplicity in data structures or algorithms for two reasons:

1. Making the code more parallel does nothing to reduce the total resources it uses. Better algorithms, on the other hand, can significantly reduce total resources.

2. Reasoning about data structures and algorithms is generally easier than reasoning about concurrency.

It sounds as though you are advocating almost the opposite approach - aim for maximum concurrency from the start, without analysis or measurement to see what it gains, or even having a baseline implementation for comparison. Is that accurate? If so, could you explain the thinking and objectives behind your approach? Or maybe I'm misunderstanding, and you can clarify a bit?

I think we are thinking alike, but just have some slightly different order of attention to the details.

Because I've been affected by so many concurrency issues, it is one of the top things that I look at. I do consider, first, how common the execution path is before spending inordinate amounts of time there.

I've found ConcurrentHashMap to be a good solution to a number of things, because it does so well at removing concurrency issues.

Thanks,

Patricia


Gregg
