On 5/16/19 11:48 PM, Peter Levart wrote:
As Stuart says, optimizations mustn't change semantics. So is there a possibly narrower optimization, that doesn't change the semantics?

Here's a sketch of one:
- create a marker interface (could be JDK-internal) and mark all conforming Set implementations with it - use AbstractSet.removeAll optimization only when both collections implement the marker interface

Hi Peter,

It's certainly reasonable for there to be some concern over the potential performance loss. I don't particularly like the fact that even as we're improving the semantics, that some programs will suffer a slowdown, even those that aren't actually affected by the semantic issues.

However, I'm not convinced that we really understand the performance issue. When this was brought up in January, [1] the claim was that this wasted thousands of CPU hours. I'm skeptical. To back this up, in a followup message [2] the original poster linked to a Stack Overflow question [3] that turns out to have been derived from an old article by Jon Skeet [4]. To be sure, this is an interesting technical discussion, but it doesn't count as evidence that this is an actual performance problem.

If you look at the example closely, it turns out that it calls removeAll() and passes an argument a List with 300,000 elements. If you're doing operations on large lists, maybe you shouldn't be surprised to see a slowdown. Maybe instead you should have been surprised (in the iteration-switched case) that it ran reasonably fast at all.

Nonetheless it remains a possibility that some code that worked properly and performed reasonably well will suffer a slowdown. One way -- and I stress that this is only one way -- that performance might be improved is to switch the collection upon which iteration is done. The linked bug reports have a bunch of suggestions for doing this, including checking to see whether the argument is a List. To these I'd add the stipulation that, the iteration can be switched if it can be determined that it is *safe* to do so; and the safety depends on whether the two collections use compatible membership semantics.

Collections that use equals() are compatible with each other; keysets of IdentityHashMap are compatible with each other; SortedSets that use identical comparison methods are compatible with each other. But note, SortedSets that use a comparisons methods that are consistent with equals are also compatible with collections that use equals(). And it seems likely to be undecidable whether two different (!=) comparison methods are in fact compatible.

In any case it would interesting to try to chew on this notion of compatible membership semantics for a while. It might be difficult to come up with a complete and correct test. It's probably easier to come up with an approximation, like ConcurrentSkipListMap's equals() method. [5] I'm not sure such a discussion would be fruitful, though, until we're sure we understand how much of a performance problem this actually is.

s'marks




[1] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html

[2] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html

[3] https://stackoverflow.com/questions/28671903/the-hashsett-removeall-method-is-surprisingly-slow

[4] https://codeblog.jonskeet.uk/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza/

[5] http://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ConcurrentSkipListMap.java#l1706

Reply via email to