Re: JEP 132: More-prompt finalization

Peter Levart Sun, 31 May 2015 12:33:26 -0700

Hi,

Thanks for the views and opinions. I'll try to address them inline...


On 05/29/2015 04:18 AM, David Holmes wrote:

Hi Peter,
I guess I'm very concerned about the premise that finalization should scale to millions of objects and be performed highly concurrently. To me that's sending the wrong message about finalization. It also isn't the most effective use of CPU resources - most people would want to do useful work on most CPUs most of the time.
Cheers,
David


@David

OK, fair enough. It shouldn't be necessary to scale finalization to millions of objects and perform it concurrently. Normal programs don't need this. But there is a diagnostic command being developed at this moment that displays the finalization queue. The utility of such a command, as I understand it, is precisely to show when the finalization thread cannot cope and Finalizer(s) accumulate. So there must be some hypothetical programs that (ab)use finalization or are buggy (deadlock), so that the queue builds up. To diagnose this, a diagnostic command is helpful. To fix it, one has to fix the code. But what if the problem is not so much the allocation/death rate of finalizable instances as the heavy code in the finalize() methods of those instances? I agree that such programs have a smell and should be rewritten to use other means of cleanup instead of finalization, such as multiple threads removing WeakReferences from a queue, or something completely different that is not based on Reference(s). But wouldn't it be nice if one could simply set a system property for the maximum number of threads processing Finalizer(s)?
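For readers who want to see what "other means of cleanup" can look like, here is a minimal sketch, assuming a hypothetical `CleanupRegistry` class (not part of the JDK or the prototype). Each registered object gets a WeakReference subclass that carries a cleanup action, and a configurable number of daemon threads drain one shared ReferenceQueue. The `pending` set keeps the reference objects themselves reachable until they are processed.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not a JDK or prototype API: runs a cleanup action after
// its referent becomes weakly reachable, using N threads instead of finalize().
final class CleanupRegistry {

    // A WeakReference that carries the action to run once it is enqueued.
    private static final class CleanupRef extends WeakReference<Object> {
        final Runnable action;

        CleanupRef(Object referent, Runnable action, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.action = action;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the CleanupRef objects strongly reachable until they are processed;
    // an unreachable Reference object may itself be collected and never enqueued.
    private final Set<CleanupRef> pending = ConcurrentHashMap.newKeySet();

    CleanupRegistry(int threads) {
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(this::drain, "cleanup-" + i);
            t.setDaemon(true);
            t.start();
        }
    }

    // The action must not capture the referent, or the referent never becomes unreachable.
    WeakReference<Object> register(Object referent, Runnable action) {
        CleanupRef ref = new CleanupRef(referent, action, queue);
        pending.add(ref);
        return ref;
    }

    int pendingCount() {
        return pending.size();
    }

    private void drain() {
        while (true) {
            try {
                // Blocks until the GC (or an explicit enqueue()) hands over a reference.
                CleanupRef ref = (CleanupRef) queue.remove();
                pending.remove(ref);
                ref.action.run();
            } catch (InterruptedException e) {
                return; // stop this cleanup thread
            } catch (RuntimeException e) {
                // Swallowed so that a failing action does not kill the cleanup thread.
            }
        }
    }
}
```

With `new CleanupRegistry(1)` this is the single-thread asynchronous WeakReference processing scenario; a larger thread count is the "multiple threads removing WeakReferences from a queue" variant.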

I have prepared an improved variant of the prototype that employs a single ReferenceHandler thread and adds a ForkJoinPool which, by default, has a single worker thread that replaces the single finalization thread. So by default, no more threads are used than currently. If one wants, (s)he can increase the concurrency of finalization with a system property.
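The "one worker by default, more via a system property" configuration can be sketched with the public ForkJoinPool API. This is a hypothetical illustration, not the prototype's code; in particular the property name `sketch.finalizer.parallelism` is made up, and the prototype's actual property may differ.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: a pool for finalization tasks whose parallelism defaults
// to 1 (same thread count as today) and can be raised with a system property.
final class FinalizerPoolSketch {

    // Hypothetical property name, chosen for this sketch only.
    static final String PARALLELISM_PROPERTY = "sketch.finalizer.parallelism";

    static int configuredParallelism() {
        // Integer.getInteger returns the default (1) if the property is unset or not a number.
        int p = Integer.getInteger(PARALLELISM_PROPERTY, 1);
        // ForkJoinPool rejects parallelism <= 0 or above its implementation limit (0x7fff).
        return Math.min(Math.max(p, 1), 0x7fff);
    }

    static ForkJoinPool newFinalizerPool() {
        return new ForkJoinPool(
                configuredParallelism(),
                ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                null,  // no custom UncaughtExceptionHandler
                true); // asyncMode: FIFO scheduling for tasks that are never joined
    }
}
```

Each Finalizer taken from the ReferenceQueue would then be submitted with `pool.execute(...)`. Clamping the value keeps a bad property setting from failing pool construction.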

I have also improved the benchmarks, which now focus on the CPU overhead of processing references at more typical rates rather than on maximum throughput. They show that all the changes taken together practically halve the CPU time overhead of finalization processing, so the freed CPU time can be used for more useful work. I have also benchmarked the typical asynchronous WeakReference processing scenario, where one thread removes enqueued WeakReferences from the queue. The results show about a 25% decrease in CPU time overhead.

Why does the prototype reduce the overhead more for finalization than for WeakReference processing? The main improvement in the change is the use of multiple doubly-linked lists for the registration of Finalizer(s) and the use of a lock-less algorithm for those lists. The WeakReference processing benchmark also uses such lists internally to handle registration/deregistration of WeakReferences, so that the impact of this part is minimal and the difference in processing overhead between the original and the changed JDK code is more obvious. (De)registration of Finalizer(s), OTOH, is part of the JDK infrastructure, so the improvement to the registration list(s) also shows in the results. The results of the WeakReference processing benchmark also indicate that reverting to a single finalization thread that just removes Finalizer(s) from the ReferenceQueue could lower the overhead a bit further, but then it would not be possible to leverage the FJ pool to simply configure the parallelism of finalization. If parallel processing of Finalizer(s) is an undesirable feature, I could restore the single finalization thread, and the CPU overhead of finalization would be reduced to about 40% of the current overhead with just the changes to the data structures.
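The striping idea behind "multiple doubly-linked lists" can be illustrated with a hypothetical `StripedRegistry`. Note that this sketch deliberately swaps in a simpler technique: each list is guarded by its own lock, whereas the prototype uses a lock-less algorithm. Registration picks a list at random so that concurrently registering threads rarely contend, and each node remembers its list so that deregistration is an O(1) unlink.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical striped registry: N independent circular doubly-linked lists,
// each with its OWN LOCK (not the prototype's lock-less algorithm).
final class StripedRegistry<T> {

    static final class Node<T> {
        final T value;
        final int stripe; // index of the list this node lives in
        Node<T> prev, next;

        Node(T value, int stripe) {
            this.value = value;
            this.stripe = stripe;
        }
    }

    private static final class Stripe<T> {
        final Node<T> head = new Node<>(null, -1); // sentinel of a circular list
        int size;

        Stripe() {
            head.prev = head;
            head.next = head;
        }
    }

    private final Stripe<T>[] stripes;

    @SuppressWarnings("unchecked")
    StripedRegistry(int nStripes) {
        stripes = (Stripe<T>[]) new Stripe[nStripes];
        for (int i = 0; i < nStripes; i++) {
            stripes[i] = new Stripe<>();
        }
    }

    // O(1) insert after the sentinel of a randomly chosen stripe.
    Node<T> register(T value) {
        int i = ThreadLocalRandom.current().nextInt(stripes.length);
        Stripe<T> s = stripes[i];
        Node<T> n = new Node<>(value, i);
        synchronized (s) {
            n.prev = s.head;
            n.next = s.head.next;
            s.head.next.prev = n;
            s.head.next = n;
            s.size++;
        }
        return n;
    }

    // O(1) unlink; returns false if the node was already removed.
    boolean unregister(Node<T> n) {
        Stripe<T> s = stripes[n.stripe];
        synchronized (s) {
            if (n.next == null) {
                return false; // linked nodes always have a non-null next in a circular list
            }
            n.prev.next = n.next;
            n.next.prev = n.prev;
            n.prev = null;
            n.next = null;
            s.size--;
            return true;
        }
    }

    int size() {
        int total = 0;
        for (Stripe<T> s : stripes) {
            synchronized (s) {
                total += s.size;
            }
        }
        return total;
    }
}
```

Compared with one global list behind one lock, contention drops roughly with the number of stripes; a lock-less list removes the locks entirely at the cost of a more intricate algorithm.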


So, for the curious, here's the improved prototype:

http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/

And here are the improved benchmarks (with some results inline):

http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/

The benchmark results in ThroughputBench.java show the output of the test(s) when run with the Linux "time" command, which shows the elapsed real time and the consumed user and system CPU times. I think this is relevant for measuring CPU overhead.
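The "time" output counts CPU consumed by all threads of the process, which is what captures reference-handling overhead spent in background threads. A rough in-process equivalent is sketched below under stated assumptions: it relies on the HotSpot-specific `com.sun.management.OperatingSystemMXBean.getProcessCpuTime()`, which reports user and system time combined (not split like "time") and returns -1 where unsupported. The `CpuOverhead` class is hypothetical and not part of the posted benchmarks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: elapsed wall-clock time vs. process CPU time for a workload.
final class CpuOverhead {

    // Process CPU time (user + system, all threads) in ns, or -1 if unavailable.
    static long processCpuNanos() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getProcessCpuTime();
        }
        return -1;
    }

    // Runs the workload and returns {elapsedNanos, cpuNanos}; cpuNanos is -1 if unsupported.
    static long[] measure(Runnable workload) {
        long cpu0 = processCpuNanos();
        long t0 = System.nanoTime();
        workload.run();
        long elapsed = System.nanoTime() - t0;
        long cpu1 = processCpuNanos();
        long cpu = (cpu0 < 0 || cpu1 < 0) ? -1 : cpu1 - cpu0;
        return new long[] { elapsed, cpu };
    }
}
```

A CPU/elapsed ratio above 1.0 means more than one core was busy on average, for example the mutator plus background reference processing.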

So my question is: is it or is it not desirable to have a configurable means to parallelize finalization processing? Reducing the CPU overhead of infrastructure code should always be desirable, right?


On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:

Hi Peter,

It is a very interesting proposal but to further David’s comments, the
life-cycle costs of reference objects are horrendous, and the actual process
of finalizing an object is only a fraction of that total cost. Unfortunately
your micro-benchmark only focuses on one aspect of that cost. In other words,
it isn’t very representative of a real concern. In the real world the finalizer
*must* compete with mutator threads, and since F-J is an “all threads on deck”
implementation, it doesn’t play well with others. It creates a “tragedy of the
commons”, that is, situations where everyone behaves rationally with a common
resource but to the detriment of the whole group. In short, parallelizing
(F-Jing) *everything* in an application is simply not a good idea. We do not
live in an infinite compute environment, which means we have to consider the
impact of our actions on the entire group.


@Kirk

I changed the prototype to use only a single FJ thread by default (configurable with a system property). Lowering the CPU overhead of finalizer processing by 50% is also an improvement. I'm still keeping the finalization FJ pool for now because it is more scalable and has less overhead than a solution with multiple threads removing references from the same ReferenceQueue. This happens when the FJ pool is configured with parallelism > 1, or when user code calls Runtime.runFinalization(), which translates to ForkJoinPool.awaitQuiescence(), which lends the calling thread to help the pool execute the tasks.
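The Runtime.runFinalization() mapping described above can be sketched with the public ForkJoinPool API. Per the ForkJoinPool javadoc, when `awaitQuiescence(timeout, unit)` is called from a thread that is not a worker of that pool, the caller waits and/or attempts to assist performing tasks until the pool is quiescent or the timeout elapses. The `runPendingFinalizers` name is hypothetical; this is not the prototype's code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RunFinalizationSketch {

    // Hypothetical stand-in for Runtime.runFinalization(): rather than starting a
    // secondary finalizer thread, the caller helps the pool drain its queued tasks.
    static boolean runPendingFinalizers(ForkJoinPool pool, long timeout, TimeUnit unit) {
        return pool.awaitQuiescence(timeout, unit); // false if the timeout elapsed first
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(1); // one worker, like the default configuration
        Thread caller = Thread.currentThread();
        AtomicInteger done = new AtomicInteger();
        AtomicInteger ranOnCaller = new AtomicInteger();
        for (int i = 0; i < 100_000; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    ranOnCaller.incrementAndGet(); // task executed by the helping caller
                }
                done.incrementAndGet();
            });
        }
        boolean quiescent = runPendingFinalizers(pool, 10, TimeUnit.SECONDS);
        // How many tasks the caller ran depends on timing; it may be zero.
        System.out.println("quiescent=" + quiescent + " done=" + done.get()
                + " ranOnCaller=" + ranOnCaller.get());
        pool.shutdown();
    }
}
```

The design point is that no extra thread is created for the duration of the call; the thread that asked for finalization lends its own CPU time to the pool.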

This was one of the points of my recent article in Java Magazine, which I wrote
to try to counter some of the rhetoric I was hearing at conferences about the
universal benefits of being able to easily parallelize streams in Java 8. Yes, I
agree it’s a great feature but it must be used with discretion. Case in point:
after I finished writing the article, I started running into a couple of early
adopters that had swallowed the parallel message whole, indiscriminately
parallelizing all of their streams. As you can imagine, they were quite
surprised by the results and quickly worked to de-parallelize *all* of the
streams in the application.

Adding some ability to parallelize the handling of reference objects seems like a
good idea if you are collecting large numbers of reference objects (>10,000 per
GC cycle). However, if you are collecting large numbers of reference objects, you’re
most likely doing something else wrong. IME, finalization is extremely useful but
really only for a limited number of use cases, and none of them (to date) have
resulted in the app burning through 1000s of finalizable objects / sec.

It would be interesting to know why you picked on this particular issue.

Well, JEP 132 was filed by Oracle, so I thought I'd try to tackle some of its goals. I think I have at least shown that the VM part of reference handling is mostly not the performance problem (if there is a problem at all), but the Java side could be modernized a bit.

Kind regards,
Kirk


On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:

For what it's worth, I fully agree with David and Kirk around finalization not 
necessarily needing this treatment.

However, I was hoping this would have the effect of improving (non-finalizable) 
reference handling. We've seen serious issues in WeakReference handling and 
have had to write some twisted code to deal with this.


@Moh

Can you elaborate some more on what twists were necessary or what problems you had?

So I guess the question I have to Kirk and David is: do you feel a GC load of 10K 
WeakReferences per cycle is also "doing something else wrong"?

If there is an elegant way to achieve your goal without using WeakReferences, then it might be better not to use them. But it is also true that WeakReferences frequently offer an elegant way to solve a problem. The same goes for finalization, which is sometimes even more elegant.

Sorry if this is going off-topic.


You're spot on topic and thanks for your comment.

Thanks
Moh



Regards, Peter
