This specifically seems to relate to the notion of contention. Single threads
using singly referenced resources can achieve much better throughput than pools
of threads traversing contended resources. But it doesn't always scale. On
newer JVMs and 64-bit OSes, single threads can go quite far now. But, we didn't
use to have that luxury. In distributed networking systems, there has to be a
way for all threads of execution to make progress, or you get into
distributed/circular-wait problems. Local-only threads of execution are easier
to prove to be "making progress" than "distributed events" are in capped (max
number of threads) thread pool scenarios. We need to watch out for DoS attacks
on open-ended thread pools, but also watch out for DoS attacks that happen
because a closed thread pool exhausts its thread count.
Gregg Wonderly
Peter Firmstone wrote:
Food for thought?
http://developers.slashdot.org/story/10/07/27/1925209/Java-IO-Faster-Than-NIO?from=rss&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot%2Fto+%28%28Title%29Slashdot+%28rdf%29%29
Patricia Shanahan wrote:
Gregg Wonderly wrote:
Patricia Shanahan wrote:
On 7/21/2010 12:58 PM, Gregg Wonderly wrote:
...
When I write code of this nature, attempting to remove all contention,
I try to list every "step" that changes the "view" of the world, and
think about how that "view" can be made atomic by using explicit
ordering of statements rather than synchronized{} blocks. ...
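One way to read the technique described above is "build the entire new state off to the side, then make it visible with a single write." A minimal editorial sketch follows; it is not code from this thread, the class names are invented, and it assumes a single writer thread (the volatile write provides the happens-before edge that makes the new "view" appear atomically to readers):

```java
// An immutable snapshot of the "view" of the world.
final class WorldView {
    final int count;
    final long lastUpdate;
    WorldView(int count, long lastUpdate) {
        this.count = count;
        this.lastUpdate = lastUpdate;
    }
}

public class ViewPublisher {
    // Readers only ever see a fully constructed WorldView, because the
    // volatile write below is the single point of publication.
    private volatile WorldView view = new WorldView(0, 0L);

    // Single-writer update: construct the whole new state first, then
    // publish it atomically with one volatile write -- no synchronized.
    void update(int newCount) {
        view = new WorldView(newCount, System.currentTimeMillis());
    }

    WorldView read() {
        return view; // lock-free atomic snapshot
    }
}
```

With multiple writers this would need a compare-and-set loop or a lock around `update`; the point here is only the ordering-based publication of an atomic view.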
I would like to discuss how to approach performance improvement, and
especially scaling improvement. We seem to have different
philosophies, and I'm interested in understanding other people's
approaches to programming.
I try to first find the really big wins, which are almost always
data structure and algorithm changes. That should result in code
that is efficient in terms of total CPU time and memory. During that
part of the process, I prefer to keep the concurrency design as
simple as possible, which in Java often means using synchronization
at a coarse level, such as synchronization on a TaskManager instance.
I don't think we are too far different. The big wins are the ones to
go for. What I've learned over the years, debugging Java performance
issues in Jini applications, and elsewhere, is that "synchronized",
while the most "correct" choice in many cases, is also the "slowest"
form of concurrency control.
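As a concrete illustration of the contrast being drawn here (my sketch, not code from the thread): the same counter contract can be met with a monitor or with a CAS-based atomic, and under contention the atomic version avoids blocking threads at a monitor entirely.

```java
import java.util.concurrent.atomic.AtomicLong;

// Monitor-based: every increment serializes through the object's lock,
// and contending threads park/unpark at the monitor.
class SynchronizedCounter {
    private long value;
    synchronized long increment() { return ++value; }
    synchronized long get() { return value; }
}

// CAS-based: incrementAndGet retries a compare-and-set internally,
// so threads never block, only occasionally retry.
class CasCounter {
    private final AtomicLong value = new AtomicLong();
    long increment() { return value.incrementAndGet(); }
    long get() { return value.get(); }
}
```

Both are correct; the difference is the cost each imposes on the other threads while one thread is in the critical section.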
Yes, I think it is merely a matter of emphasis. One problem I've seen
is a tendency to accept the times without questioning whether they
need to be as long as they are, and focus too much too soon on
concurrency, before making it fast, so I tend to resist that by
keeping the concurrency simple until the data structures and
algorithms are right.
In a "service" application in particular, it is often the case that the
total CPU time needed to perform the actual work is a small fraction of
the latency that "synchronized" injects into the execution path. I
think you understand this issue, but I want to illustrate it to make
sure.
File I/O and network I/O latency can create similar circumstances.
If you look at this with some numbers (I put ms but any magnitude
will show the same behavior) such as the following:
2ms to enter server (Security/Permission stuff)
1ms CPU to arrive at synchronized
3ms through synchronized lock
1ms to return result
Then if there are 10 such threads running through the server,
eventually all of them will be standing in line at the synchronized
block (monitor entry), because they can get through all the other stuff
in short order. As the total number of threads increases, the
"synchronized" section time must be multiplied by the number of
threads, and so with 10 threads it becomes 30ms of time, because
each thread must wait its turn. Thus, 30ms becomes the minimum latency
through that part of the system, instead of the 3ms there would be
with 1 thread.
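The arithmetic above reduces to a one-line model: when the critical section is fully contended, the last thread in line waits for every thread ahead of it, so the worst-case latency through the section is the section time multiplied by the number of queued threads.

```java
// Back-of-envelope model of the numbers above: with a lockedSectionMs
// critical section and queuedThreads threads lined up at the monitor,
// the last thread's latency through that section is the product.
public class LockQueueModel {
    static long minLatencyMs(long lockedSectionMs, int queuedThreads) {
        return lockedSectionMs * queuedThreads;
    }
}
```

With the figures in the example, `minLatencyMs(3, 10)` gives the 30ms floor, versus 3ms for a single thread.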
Suppose a better choice of algorithm and data structures could reduce
the time with the synchronized lock from 3ms to 3us. Wouldn't it be
better to do that first? With 10 threads, eliminating synchronization
may, at best, produce a 10x performance improvement. Picking the right
data structures and algorithms can often do better than that.
An ArrayList is not an efficient representation of a queue, especially
under load conditions that may cause a significant backlog.
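To make the ArrayList point concrete (my sketch, not code from the thread): removing from the head of an ArrayList shifts every remaining element, so draining a backlog of k tasks costs O(k^2) element moves, while an ArrayDeque dequeues from the head in O(1), draining the same backlog in O(k).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class QueueChoice {
    // Enqueue 0..4 into both structures, then dequeue one element from
    // each and return the pair, to show the equivalent operations.
    static int[] demo() {
        List<Integer> listQueue = new ArrayList<>();
        Deque<Integer> deque = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) {
            listQueue.add(i);
            deque.addLast(i);
        }
        int fromList = listQueue.remove(0);  // O(n): shifts the 4 survivors
        int fromDeque = deque.pollFirst();   // O(1): just advances the head
        return new int[] { fromList, fromDeque };
    }
}
```

Both dequeue the same element; only the cost per operation differs, and the difference grows with the backlog.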
I feel that doing some optimizations, including a lot of the
concurrency changes, can create complication and commitment that
becomes a barrier to algorithm and data structure changes. On the
other hand, picking good data structures and algorithms does nothing
to bar improving concurrency later.
So, eliminating synchronized as a "global" encounter is what I always
consider first.
I, on the other hand, prefer to minimize the work being done first,
and then decide whether there is enough left for synchronization to be
a potential bottleneck.
Note that there is a finite limit to the number of hardware threads
that can be supported efficiently, at least with current technology,
while maintaining the Java memory model. Really large clusters run as
multiple separate SMP computers, each with one or more JVMs of its
own, with message passing rather than shared memory communication
between the SMP computers.
Patricia