On Mar 24, 2008, at 2:44 PM, Dmitri Trembovetski wrote:

 Hi Clemens.

Clemens Eisserer wrote:
Hello,
  Since most applications do render from one thread (either the
  Event Queue like Swing apps, or some kind of dedicated rendering
  thread like games), the lock is indeed very fast, given
  biased locking and such.

  I would suggest not trying to optimize things - especially tricky
  ones which involve locking - until you have
  identified with some kind of tool that there's a problem.
I did some benchmarking to find out the best design for my new
pipeline, and these are the results I got:
10 million solid 1x1 rects, VolatileImage, server compiler, Core 2 Duo 2 GHz,
Intel 945GM, Linux:
200ms no locking, no native call
650ms locking only
850ms native call, no locking
1350ms as currently implemented in X11Renderer


BTW, Clemens, when reporting microbenchmark scores, it would be a big help if you could use J2DBench to generate such numbers. It takes care of running enough iterations to produce a statistically useful number, and J2DAnalyzer helps visualize the numbers in a consistent format (to compare relative numbers such as these).

 Did you mean OGLRenderer? The X11Renderer doesn't use the single
 thread rendering model and thus doesn't need a render queue.

 Note that on X11 the render queue lock doubles as the lock against
 all X11 access - for both AWT and 2D. We must lock around it because
 we all use the same display, and X11 is not multi-threaded (at
 least in the way we use it).
This means that the lock is likely to be promoted to a heavyweight lock,
 which is why it is expensive.


That may have been the case in JDK 5, where we used the "synchronized" keyword to manage synchronization of access to X11 in X11Renderer and other AWT classes. But in JDK 6 you'll recall that we reimplemented this synchronization to use ReentrantLock instead, most importantly because it offers better performance under contention (as is often the case with the "AWT lock"). (Yes, "built-in" synchronization has largely caught up since then, due to biased locking and other optimizations, but ReentrantLock is still a nice lightweight solution.)
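
As a minimal sketch of the pattern described above (one ReentrantLock guarding every access to a shared, non-thread-safe resource), something like the following could be used. The class DisplayLockSketch and its methods are hypothetical illustrations, not the actual JDK code; the counter merely stands in for the native X11 call.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a display shared by all threads; not JDK code.
final class DisplayLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long requests; // guarded by lock

    // Every access to the shared display takes the same lock.
    void fillRect() {
        lock.lock();
        try {
            requests++; // placeholder for the real native call
        } finally {
            lock.unlock(); // always release, even if the call throws
        }
    }

    long requests() {
        lock.lock();
        try {
            return requests;
        } finally {
            lock.unlock();
        }
    }

    // Runs several threads against one display and returns the request count.
    static long runDemo(int threads, int callsPerThread) {
        DisplayLockSketch display = new DisplayLockSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    display.fillRect();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return display.requests();
    }
}
```

The try/finally around the guarded section matters more with ReentrantLock than with "synchronized", since the lock is not released automatically when the block exits.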

For more on ReentrantLock, this article from Brian Goetz is still the best summary that I've ever come across:
http://www.ibm.com/developerworks/java/library/j-jtp10264/

Oh, and hooray, I just came across the bug report that I wrote up when moving to ReentrantLock in JDK 6, which has lots of details on the matter:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6317330

Thanks,
Chris


 So the problem with having separate render buffers per thread is that
 at some point you will have to synchronize on SunToolkit.awtLock()
 anyway.

I rendered from only a single thread (though not the EDT). In this
simple pipeline-overhead test the locking itself is almost as
expensive as the "real" work (= the native call), and far more expensive
than an "empty" JNI call.
However, this was on a dual-core machine; on my single-core amd64
machine locking has much less influence. As far as I know, biased
locking is only implemented for monitors.
Xorg ran on my second core and kept it only about 40% busy with locking,
versus about 80% without locking.
However I have to admit there are probably much more important things
to do than playing with things like that ;)

 You probably can explore ways to improve the current design,
 which only allows a single rendering queue. For example,
 we had discussed the possibility of extending the STR design
 to allow a rendering thread per destination. But again,
 on Unix it will bump against the need to sync around X11 access.

 You can also play with having a render buffer per thread as
 you suggest, but your rendering thread will have to sync for
 reading from each render buffer - presumably on the same lock
 as the thread used to put stuff into that buffer.
 All doable, but risky, and the benefits are hard to assess before
 you have a working implementation. Just commenting out
 locks gives the wrong impression, since the resulting code
 becomes incorrect and thus the benchmark results can't be
 trusted.
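
To make the per-thread buffer idea concrete, here is a rough sketch under stated assumptions: PerThreadBufferSketch, Buffer, enqueue() and drainAll() are all hypothetical names, and operations are modeled as plain strings. It only illustrates the point made above, that the draining thread has to take the same lock the producing thread used to fill each buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "one render buffer per thread"; not JDK code.
final class PerThreadBufferSketch {
    // Each buffer is guarded by its own monitor.
    static final class Buffer {
        private final List<String> ops = new ArrayList<>();

        synchronized void put(String op) {
            ops.add(op);
        }

        // The reader takes the same monitor the producer used in put().
        synchronized List<String> drain() {
            List<String> out = new ArrayList<>(ops);
            ops.clear();
            return out;
        }
    }

    private final List<Buffer> buffers = new ArrayList<>(); // guarded by this
    private final ThreadLocal<Buffer> local = ThreadLocal.withInitial(this::register);

    // Called once per producer thread, on its first enqueue().
    private synchronized Buffer register() {
        Buffer b = new Buffer();
        buffers.add(b);
        return b;
    }

    // Producer side: only contends on the calling thread's own buffer.
    void enqueue(String op) {
        local.get().put(op);
    }

    // Rendering-thread side: must still lock every buffer in turn.
    synchronized List<String> drainAll() {
        List<String> all = new ArrayList<>();
        for (Buffer b : buffers) {
            all.addAll(b.drain());
        }
        return all;
    }
}
```

Note that this does not remove locking from the producer's path; it only replaces one shared lock with a mostly uncontended per-thread one, and on X11 the drained operations would still have to be executed under the toolkit lock.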

 Anyway, I would suggest that you look at optimizing
 this later.

  If it appears null during a sync() call, no harm is done (the
  sync is just ignored - which is fine given that the render queue
  hasn't been created yet, so there's nothing to sync), so this is
  not a problem.
But what does happen if it has already been created, but the thread
calling sync() just does not see the updated "theInstance" value?
Could there be any problem when sync()-calls are left out?

 If the thread calling sync() sees theInstance as null, this means
 that it could not have anything to sync. If there's no queue,
 it could not have put anything into that queue prior to
 calling sync(). The sync() can be safely ignored.
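
 The reasoning can be illustrated with a small sketch. It assumes, hypothetically, that the queue is created lazily behind a synchronized getInstance() and that sync() reads the field without locking; RenderQueueSketch, enqueue() and flushNow() are illustrative names, not the real pipeline classes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical lazily created render queue; not the actual pipeline code.
final class RenderQueueSketch {
    private static RenderQueueSketch theInstance; // created on first use

    private final Queue<Runnable> pending = new ArrayDeque<>();

    private RenderQueueSketch() {}

    // A thread can only enqueue work after passing through this method,
    // so any thread that has queued something has seen a non-null instance.
    static synchronized RenderQueueSketch getInstance() {
        if (theInstance == null) {
            theInstance = new RenderQueueSketch();
        }
        return theInstance;
    }

    synchronized void enqueue(Runnable op) {
        pending.add(op);
    }

    // Runs all pending operations and returns how many were executed.
    synchronized int flushNow() {
        int executed = 0;
        Runnable op;
        while ((op = pending.poll()) != null) {
            op.run();
            executed++;
        }
        return executed;
    }

    // Unsynchronized read: a thread that sees null here has never obtained
    // the queue itself, so it has nothing queued and the call is a no-op.
    static int sync() {
        RenderQueueSketch rq = theInstance;
        return (rq == null) ? 0 : rq.flushNow();
    }
}
```

 In this sketch a stale null can only be observed by a thread that never called getInstance(), which is exactly the thread with nothing to flush.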

 Thanks,
   Dmitri
