Re: RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Stuart Marks Mon, 20 May 2019 16:54:39 -0700

On 5/16/19 11:48 PM, Peter Levart wrote:

As Stuart says, optimizations mustn't change semantics. So is there a possiblynarrower optimization, that doesn't change the semantics?
Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all conforming Setimplementations with it- use AbstractSet.removeAll optimization only when both collections implementthe marker interface


Hi Peter,

It's certainly reasonable for there to be some concern over the potentialperformance loss. I don't particularly like the fact that even as we'reimproving the semantics, that some programs will suffer a slowdown, even thosethat aren't actually affected by the semantic issues.

However, I'm not convinced that we really understand the performance issue. Whenthis was brought up in January, [1] the claim was that this wasted thousands ofCPU hours. I'm skeptical. To back this up, in a followup message [2] theoriginal poster linked to a Stack Overflow question [3] that turns out to havebeen derived from an old article by Jon Skeet [4]. To be sure, this is aninteresting technical discussion, but it doesn't count as evidence that this isan actual performance problem.

If you look at the example closely, it turns out that it calls removeAll() andpasses an argument a List with 300,000 elements. If you're doing operations onlarge lists, maybe you shouldn't be surprised to see a slowdown. Maybe insteadyou should have been surprised (in the iteration-switched case) that it ranreasonably fast at all.

Nonetheless it remains a possibility that some code that worked properly andperformed reasonably well will suffer a slowdown. One way -- and I stress thatthis is only one way -- that performance might be improved is to switch thecollection upon which iteration is done. The linked bug reports have a bunch ofsuggestions for doing this, including checking to see whether the argument is aList. To these I'd add the stipulation that, the iteration can be switched if itcan be determined that it is *safe* to do so; and the safety depends on whetherthe two collections use compatible membership semantics.

Collections that use equals() are compatible with each other; keysets ofIdentityHashMap are compatible with each other; SortedSets that use identicalcomparison methods are compatible with each other. But note, SortedSets that usea comparisons methods that are consistent with equals are also compatible withcollections that use equals(). And it seems likely to be undecidable whether twodifferent (!=) comparison methods are in fact compatible.

In any case it would interesting to try to chew on this notion of compatiblemembership semantics for a while. It might be difficult to come up with acomplete and correct test. It's probably easier to come up with anapproximation, like ConcurrentSkipListMap's equals() method. [5] I'm not suresuch a discussion would be fruitful, though, until we're sure we understand howmuch of a performance problem this actually is.


s'marks




[1] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html

[2] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html

[3]https://stackoverflow.com/questions/28671903/the-hashsett-removeall-method-is-surprisingly-slow

[4]https://codeblog.jonskeet.uk/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza/

[5]http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ConcurrentSkipListMap.java#l1706

Re: RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Reply via email to