RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Stuart Marks Fri, 01 May 2020 13:06:22 -0700

Hi all,

I'm taking another swing at this one. I've taken a minimal approach compared tomy previous proposals.

Briefly, AbstractSet.removeAll switches from iterating this collection andcalling contains() on its argument, to iterating the argument and callingthis.contains(), if the argument collection is smaller than this collection.This attempt at optimization is incorrect, because this collection and theargument collection might have different semantics for contains().

There is a confounding performance problem, which is that if the argument is aList, contains() is generally linear, which can result in pathologically slow(O(m*n)) performance. Because of the iteration-switching above, this performanceproblem can appear and disappear based on the relative sizes, which is startling.

The fix for the semantic problem is simply to remove the attempted optimizationfrom AbstractSet. This comports with the specification of Collection.removeAlland Set.removeAll, which imply that contains() is called on the argumentcollection. This allows sets to inherit the implementation inAbstractCollection.removeAll. In addition, this ensures that removeAll is nowalways the complement of retainAll (as it should be), which it sometimes was notwhen the optimization was in place.

IdentityHashMap's keySet and entrySet views were broken by this optimization, sothey had to override removeAll with copies of the implementation fromAbstractCollection. Since they can now inherit from AbstractCollection, theseoverrides have been removed.

This leaves a potential performance problem with removeAll when the argument isa List. To mitigate this, HashSet.removeAll switches to iterating the argumentif the argument is a List. This is safe, as both HashSet and List use the samesemantics for contains() (at least, as far as I know).

I've opted not to pursue size-based optimizations at this time, since theyprovide only incremental benefit, whereas the pathological performance problemmentioned above is the primary issue. Also, it's actually quite difficult todetermine when it's safe to switch iteration.

Finally, I've added performance notes to the specifications of containsAll(),removeAll(), and retainAll(), warning about potential performance problems thatcan occur with repeated calls to contains() made by these methods.


Bug:

    https://bugs.openjdk.java.net/browse/JDK-6394757

Webrev:

    http://cr.openjdk.java.net/~smarks/reviews/6394757/webrev.2/

Previous discussions:

    http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html

    
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/058140.html

    
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-February/058378.html

    http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060007.html

    http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-May/060147.html

Thanks,

s'marks

RFR [15] 6394757: rev2: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Reply via email to