Hello Chris, Hello Dmitri,

1.) Thanks for mentioning J2DBench, I'll have a look at it.

2.)
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6317330
Thanks for mentioning it, I have already had a look at it.

3.) http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253009
This mentions a deadlock problem that can occur with a separate lock for the RenderQueue. For my X11 pipeline it would be enough to ensure that only one thread accesses xlib at a time; it does not always have to be the queue-flush thread. So if I allowed sync()/flushNow() on any thread, the problem would not exist, right?

4.)
> If the thread calling sync() sees theInstance as null, this means
> that it could not have anything to sync
As far as I understand the JMM, it could happen that thread1 has already called getInstance() (which creates and sets theInstance), but thread2 then calls sync() and still sees null. I don't know whether a lost sync() could be a problem at all.

5.)
> Anyway, I would suggest that you look at optimizing
> this later
Yes, that would probably be best. I was just a bit worried about which design I should choose.

The JNI overhead itself (35 cycles, Core2Duo) is so small that I am not sure whether the whole Buffered Rendering is a win at all. I benchmarked the switch statement that is used to decode the command stream on my Core2Duo: just calling the switch in a loop already takes 20 cycles (which is quite reasonable, keeping in mind that the generated table jump confuses the CPU pipeline). Add the overhead of stream encoding and inter-thread communication, and I guess it also ends up somewhere between 30 and 50 cycles per j2d primitive.

However, if I could remove most of the locking, which at least on my machine seems to add a lot of overhead, this would justify the additional code. With thread-private buffers, and with all threads allowed to flush the queue themselves instead of relying on the queue-flush thread to do it, this should be possible.

Sorry for the traffic and thanks for your patience,

Best regards,
Clemens
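
P.S. To make the idea from point 5 a bit more concrete, here is a minimal sketch of what I mean by thread-private buffers with every thread allowed to flush. All names in it (ThreadPrivateQueueSketch, enqueue, flushNow, XLIB_LOCK) are made up for illustration and do not come from the existing Java2D sources, and the "native" side is only simulated by a list. The assumption is that a single lock around the xlib calls is sufficient, so encoding a command needs no locking at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only; not the actual RenderQueue implementation.
final class ThreadPrivateQueueSketch {
    // The one lock that guards the (simulated) xlib access.
    private static final ReentrantLock XLIB_LOCK = new ReentrantLock();

    // Stand-in for the native side: records opcodes in execution order.
    private static final List<Integer> executed = new ArrayList<>();

    // Each thread encodes into its own buffer, so enqueue() takes no lock.
    private static final ThreadLocal<List<Integer>> BUFFER =
            ThreadLocal.withInitial(ArrayList::new);

    // Append one command to the calling thread's private buffer.
    static void enqueue(int opcode) {
        BUFFER.get().add(opcode);
    }

    // Any thread may flush its own buffer; only this step is serialized.
    static void flushNow() {
        List<Integer> buf = BUFFER.get();
        if (buf.isEmpty()) {
            return; // nothing to sync for this thread
        }
        XLIB_LOCK.lock();
        try {
            executed.addAll(buf); // here the real code would decode and call xlib
        } finally {
            XLIB_LOCK.unlock();
        }
        buf.clear();
    }

    // Number of commands that have reached the simulated native side.
    static int executedCount() {
        XLIB_LOCK.lock();
        try {
            return executed.size();
        } finally {
            XLIB_LOCK.unlock();
        }
    }
}
```

With this layout the lock is taken once per flush instead of once per primitive, and a thread that never enqueued anything has nothing to wait for. Whether this actually beats the dedicated queue-flush thread would still have to be measured.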
