Hi Peter,

Interesting email. I think it is a thoughtful contribution with solid responses 
to the concerns and questions raised. I hope it receives the consideration it 
deserves.

Kind regards,
Kirk

On May 31, 2015, at 9:32 PM, Peter Levart <peter.lev...@gmail.com> wrote:

> Hi,
> 
> Thanks for the views and opinions. I'll try to address them inline...
> 
> On 05/29/2015 04:18 AM, David Holmes wrote:
>> Hi Peter, 
>> 
>> I guess I'm very concerned about the premise that finalization should scale 
>> to millions of objects and be performed highly concurrently. To me that's 
>> sending the wrong message about finalization. It also isn't the most 
>> effective use of CPU resources - most people would want to do useful work on 
>> most CPUs most of the time. 
>> 
>> Cheers, 
>> David 
> 
> @David
> 
> Ok, fair enough. It shouldn't be necessary to scale finalization to millions 
> of objects or to perform it concurrently. Normal programs don't need this. 
> But there is a diagnostic command being developed at this moment that 
> displays the finalization queue. The utility of such a command, as I 
> understand it, is precisely to show when the finalization thread cannot cope 
> and Finalizer(s) accumulate. So there must be some programs that (ab)use 
> finalization or are buggy (deadlock) so that the queue builds up. To diagnose 
> this, a diagnostic command is helpful. To fix it, one has to fix the code. 
> But what if the problem is not so much the allocation/death rate of 
> finalizable instances as the heavy finalize() methods of those instances? I 
> agree that such programs have a smell and should be rewritten to use other 
> means of cleanup instead of finalization - for example multiple threads 
> removing WeakReferences from a queue, or something completely different and 
> not based on Reference(s) at all. But wouldn't it be nice if one could simply 
> set a system property for the max. number of threads processing Finalizer(s)?
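> 
> To illustrate what I mean by "multiple threads removing WeakReferences from a 
> queue", here is a minimal sketch; the class and names are invented for the 
> example and are not part of the prototype:
> 
>     import java.lang.ref.Reference;
>     import java.lang.ref.ReferenceQueue;
>     import java.lang.ref.WeakReference;
> 
>     public class QueueDrainers {
> 
>         static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
> 
>         // a weak reference that carries the cleanup action for its referent
>         static class TrackedRef extends WeakReference<Object> {
>             final Runnable cleanup;
>             TrackedRef(Object referent, Runnable cleanup) {
>                 super(referent, QUEUE);
>                 this.cleanup = cleanup;
>             }
>         }
> 
>         // start N daemon threads that block on the queue and run cleanups
>         public static void startDrainers(int n) {
>             for (int i = 0; i < n; i++) {
>                 Thread t = new Thread(() -> {
>                     while (true) {
>                         try {
>                             Reference<?> r = QUEUE.remove(); // blocks
>                             ((TrackedRef) r).cleanup.run();
>                         } catch (InterruptedException e) {
>                             return;
>                         }
>                     }
>                 }, "ref-drainer-" + i);
>                 t.setDaemon(true);
>                 t.start();
>             }
>         }
>     }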
> 
> I have prepared an improved variant of the prototype that employs a single 
> ReferenceHandler thread and adds a ForkJoinPool which, by default, has a 
> single worker thread and replaces the single finalization thread. So by 
> default, no more threads are used than currently. If one wants, (s)he can 
> increase the concurrency of finalization with a system property.
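> 
> In rough outline the setup is no more than the following sketch (simplified; 
> the property name below is just a placeholder, not the one used in the 
> webrev):
> 
>     import java.util.concurrent.ForkJoinPool;
> 
>     class FinalizerPoolSketch {
> 
>         // default parallelism of 1 keeps today's single finalizer thread
>         static final int PARALLELISM =
>             Integer.getInteger("sketch.finalizer.parallelism", 1);
> 
>         static final ForkJoinPool POOL = new ForkJoinPool(PARALLELISM);
> 
>         // called for each Finalizer handed over by the ReferenceHandler;
>         // the real code invokes finalize() via a JavaLangAccess hook, so an
>         // empty stand-in keeps this sketch self-contained
>         static void submit(Object finalizee) {
>             POOL.execute(() -> invokeFinalize(finalizee));
>         }
> 
>         private static void invokeFinalize(Object finalizee) {
>             // stand-in for calling finalizee.finalize()
>         }
>     }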
> 
> I have also improved the benchmarks, which now focus on CPU overhead when 
> processing references at more typical rates, rather than on maximum 
> throughput. They show that all changes taken together practically halve the 
> CPU time overhead of finalization processing, so the freed CPU time can be 
> used for more useful work. I have also benchmarked the typical asynchronous 
> WeakReference processing scenario where one thread removes enqueued 
> WeakReferences from the queue. Results show about a 25% decrease in CPU time 
> overhead.
> 
> Why does the prototype reduce overhead more for finalization than for 
> WeakReference processing? The main improvement in the change is the use of 
> multiple doubly-linked lists for registration of Finalizer(s) and a lock-less 
> algorithm for those lists. The WeakReference processing benchmark also uses 
> such lists internally to handle registration/deregistration of 
> WeakReferences, so the impact of this part is minimal and the difference in 
> processing overhead between the original and changed JDK code is more 
> obvious. (De)registration of Finalizer(s), OTOH, is part of the JDK 
> infrastructure, so the improvement to the registration list(s) also shows in 
> the results. The results of the WeakReference processing benchmark also 
> indicate that reverting to a single finalization thread that just removes 
> Finalizer(s) from the ReferenceQueue could lower the overhead even a bit 
> further, but then it would not be possible to leverage the FJ pool to simply 
> configure the parallelism of finalization. If parallel processing of 
> Finalizer(s) is an undesirable feature, I could restore the single 
> finalization thread, and with just the changes to the data structures the CPU 
> overhead of finalization would be reduced to about 40% of the current 
> overhead.
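> 
> To give a feel for the registration part, here is a heavily simplified sketch 
> of the idea: new entries are CAS-prepended onto one of several list heads so 
> that registering threads rarely contend on the same head. The real prototype 
> uses doubly-linked lists so that an entry can also unlink itself in O(1), 
> which this sketch omits.
> 
>     import java.util.concurrent.atomic.AtomicReferenceArray;
> 
>     class StripedRegistry<T> {
> 
>         static final class Node<T> {
>             final T item;
>             Node<T> next;
>             Node(T item) { this.item = item; }
>         }
> 
>         private final AtomicReferenceArray<Node<T>> heads;
> 
>         StripedRegistry(int stripes) {
>             heads = new AtomicReferenceArray<>(stripes);
>         }
> 
>         void register(T item) {
>             // pick a stripe from the current thread to spread contention
>             int i = (int) (Thread.currentThread().getId() % heads.length());
>             Node<T> node = new Node<>(item);
>             Node<T> head;
>             do {
>                 head = heads.get(i);
>                 node.next = head;
>             } while (!heads.compareAndSet(i, head, node)); // lock-less prepend
>         }
>     }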
> 
> So, for the curious, here's the improved prototype:
> 
>     http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/
> 
> And here are the improved benchmarks (with some results inline):
> 
>     http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/
> 
> 
> The benchmark results in ThroughputBench.java show the output of the test(s) 
> when run under the Linux "time" command, which reports the elapsed real time 
> and the consumed user and system CPU times. I think this is the relevant way 
> to measure CPU overhead.
> 
> So my question is: Is it or is it not desirable to have a configurable means 
> to parallelize the finalization processing? The reduction of CPU overhead of 
> infrastructure code should always be desirable, right?
> 
> On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:
>> Hi Peter,
>> 
>> It is a very interesting proposal, but to further David’s comments, the 
>> life-cycle costs of reference objects are horrendous, and the actual process 
>> of finalizing an object is only a fraction of that total cost. 
>> Unfortunately your micro-benchmark only focuses on one aspect of that cost. 
>> In other words, it isn’t very representative of a real concern. In the real 
>> world the finalizer *must* compete with mutator threads, and since F-J is an 
>> “all threads on deck” implementation, it doesn’t play well with others. It 
>> creates a “tragedy of the commons”, that is, a situation where everyone 
>> behaves rationally with a common resource but to the detriment of the whole 
>> group. In short, parallelizing (F-Jing) *everything* in an application is 
>> simply not a good idea. We do not live in an infinite compute environment, 
>> which means we have to consider the impact of our actions on the entire 
>> group.
> 
> @Kirk
> 
> I changed the prototype to only use a single FJ thread by default 
> (configurable with a system property). Lowering the CPU overhead of finalizer 
> processing by 50% is also an improvement. I'm still keeping the finalization 
> FJ-pool for now because it is more scalable and has less overhead than a 
> solution with multiple threads removing references from the same 
> ReferenceQueue. Multiple threads only come into play when the FJ-pool is 
> configured with parallelism > 1 or when user code calls 
> Runtime.runFinalization(), which translates to ForkJoinPool.awaitQuiescence() 
> and lends the calling thread to help the pool execute the tasks.
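> 
> In other words, roughly the following (a sketch only, not the exact webrev 
> code):
> 
>     import java.util.concurrent.ForkJoinPool;
>     import java.util.concurrent.TimeUnit;
> 
>     class RunFinalizationSketch {
> 
>         static final ForkJoinPool FINALIZER_POOL = new ForkJoinPool(1);
> 
>         // what Runtime.runFinalization() maps to in the prototype:
>         // the calling thread joins in and helps execute pending
>         // finalization tasks until the pool has no more work
>         static void runFinalization() {
>             FINALIZER_POOL.awaitQuiescence(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
>         }
>     }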
> 
>> This was one of the points of my recent article in Java Magazine, which I 
>> wrote to try to counter some of the rhetoric I was hearing at conferences 
>> about the universal benefits of being able to easily parallelize streams in 
>> Java 8. Yes, I agree it’s a great feature, but it must be used with 
>> discretion. Case in point: after I finished writing the article, I started 
>> running into a couple of early adopters that had swallowed the parallel 
>> message whole, indiscriminately parallelizing all of their streams. As you 
>> can imagine, they were quite surprised by the results and quickly worked to 
>> de-parallelize *all* of the streams in the application.
>> 
>> To add some ability to parallelize the handling of reference objects seems 
>> like a good idea if you are collecting large numbers of reference objects 
>> (>10,000 per GC cycle). However, if you are collecting large numbers of 
>> reference objects, you’re most likely doing something else wrong. IME, 
>> finalization is extremely useful, but really only for a limited number of 
>> use cases, and none of them (to date) has resulted in the app burning 
>> through 1000s of finalizable objects / sec.
>> 
>> It would be interesting to know why you picked this particular issue.
> 
> Well, JEP-132 was filed by Oracle, so I thought I'd try to tackle some of its 
> goals. I think I have at least shown that the VM part of reference handling 
> is mostly not the performance problem (if there is a problem at all), but 
> that the Java side could be modernized a bit.
> 
>> Kind regards,
>> Kirk
> 
> On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:
>> For what it's worth, I fully agree with David and Kirk around finalization 
>> not necessarily needing this treatment.
>> 
>> However, I was hoping this would have the effect of improving 
>> (non-finalizable) reference handling. We've seen serious issues in 
>> WeakReference handling and have had to write some twisted code to deal with 
>> this.
> 
> @Moh
> 
> Can you elaborate some more on what twists were necessary or what problems 
> you had?
> 
>> So I guess the question I have to Kirk and David is: do you feel a GC load 
>> of 10K WeakReferences per cycle is also "doing something else wrong"?
> 
> If there is an elegant way to achieve your goal without using WeakReferences, 
> then it might be better not to use them. But it is also true that 
> WeakReferences frequently offer an elegant way to solve a problem. The same 
> goes for finalization, which is sometimes even more elegant.
>  
>> Sorry if this is going off-topic.
> 
> You're spot on topic and thanks for your comment.
> 
>> Thanks
>> Moh
>> 
>> 
> 
> 
> Regards, Peter
> 
