So, it looks like bug 4266874 pointed out the same performance issue I did, and that its fix did break things as described in bug 6394757 (Why wasn't the AbstractSet implementation removed? Not to break performances?).
A work-around for the quadratic behavior is proposed in discussion about bug 5028425 : "foo.removeAll(new HashSet(baz));" The fix I proposed makes use of the faulty algorithm systematically if "c" is not a Set, whereas as it is now it only occurs if "size() > c.size()". Anyway, in case anyone would like to override AbstractSet.removeAll(Collection) with it, here is a n'th version of this removeAll operation, which uses possibly O(n) "isEmpty()" in less cases (not in the loop, and allowing "good garbage" iterators instead), and only checks sizes if "c" is a Set: public boolean removeAll(Collection<?> c) { boolean modified = false; if (c instanceof Set<?>) { // Not using "c.isEmpty()", which might be in O(n): // if "c" is empty, "c.size()" should be fast even if also in O(n), // and if "c" is not empty, we need to compute it anyway. final int cSize = c.size(); if (cSize == 0) { return false; } final int thisSize = size(); if (thisSize < cSize) { if (thisSize == 0) { return false; } for (Iterator<?> i = iterator(); i.hasNext(); ) { if (c.contains(i.next())) { i.remove(); modified = true; } } return modified; } // Will iterate on "c". } else { if (isEmpty()) { return false; } } // Sometimes bad (bug 6394757) but often faster. for (Iterator<?> i = c.iterator(); i.hasNext(); ) { modified |= remove(i.next()); } return modified; } Jeff ________________________________ De : Jason Mehrens <jason_mehr...@hotmail.com> À : mike.dui...@oracle.com; jeffh...@rocketmail.com Cc : core-libs-dev@openjdk.java.net Envoyé le : Jeu 14 juillet 2011, 0h 19min 04s Objet : RE: AbstractCollection.removeAll(Collection) and AbstractSet.removeAll(Collection) Mike, The history is in the evaluation of http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6394757 I don't think that even adding a empty check can be considered an optimization when dealing with two abstract things. The iterator creation here is 'good garbage' and worst case AbstractCollection.isEmpty could be O(N). JDKs 5 and 6 shipped with two collections (CHM views) that had O(N) isEmpty methods. I think the following evaluations really cover the issues regarding this: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6519662 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5028425 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6633605 http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6250140 Jason > Subject: Re: AbstractCollection.removeAll(Collection) and >AbstractSet.removeAll(Collection) > From: mike.dui...@oracle.com > Date: Wed, 13 Jul 2011 13:34:43 -0700 > To: jeffh...@rocketmail.com > CC: core-libs-dev@openjdk.java.net > > It looks to me that AbstractCollection could benefit from a c.isEmpty() check >but that's the only optimization I currently see. It's necessary to iterate >the >target collection because values can be repeated for some collection types. > > I agree that AbstractSet.removeAll appears to make a poor judgement that the >cost of this.contains() and c.contains() are similar. I'd like to know if >there >are common usage patterns where iteration over c isn't the best choice. >(adding >isEmpty() checks for both this and c of course). > > Josh? Martin? Doug? Any history why AbstractSet has this optimization? > > Mike > > On Jul 13 2011, at 13:02 , Jeff Hain wrote: > > > > >> It doesn't work because the complexity of size() can be O(n). > >> In my opinion, only the checks for emptyness are Ok. > > > > The calls to "size()" are already there in current implementation, > > and I think with the idea that in most cases they are O(1). > > If they are O(n), it's of course pointless to use them, since it > > makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size() > > and n2 = c.size()), while simply iterating on "c" without checking > > sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)). > > But, then, the whole treatment remains linear at worse, and if, > > as it is often the case, they are O(1), it allows to make it > > O(min(n2,n1)) if "c" is also a set. > > Considering sizes therefore allows to have O(max(n1,n2)) instead > > of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair > > amount of cases, which can be considered a win. Also, if size()'s > > are O(n), you are most likely doing concurrency, so you are smart, > > know what you do, and can make your own removeAll ;) > > What seems to have been overlooked in current implementation, > > is that this optimization, that works for sets, makes things actually > > worse if "c" is a list, in which case it's best to always iterate on > > "c", iterating on "this" resulting in a quadratic behavior. > > > > To make things simple, and have a treatment always in O(n2), > > we could just stick to iterating on "c" without checking sizes: > > > > public boolean AbstractSet.removeAll(Collection<?> c) { > > if (isEmpty() || c.isEmpty()) { > > return false; > > } > > boolean modified = false; > > for (Iterator<?> i = c.iterator(); i.hasNext(); ) { > > if (remove(i.next())) { > > modified = true; > > if (isEmpty()) { > > break; > > } > > } > > } > > return modified; > > } > > > > (*) We could refine considering the case of tree sets. > > > > > > > >> The overriden equals method should be no problem > >> if the Implementation is an set. So it should be > >> possible to exit in this case in the AbstractSet > >> Implementation, isn't it? > > > > For AbstractCollection.removeAll(Collection), > > that wouldn't work, since each instance to remove > > in "c" can have multiple occurrences in "this". > > > > For AbstractSet.removeAll(Collection), > > that wouldn't work either, because if "c" is > > based on "equals", and "c" is an identity set > > (based on "=="), multiple elements of "this" > > could be equal to a same element of "c", > > and removing as many as "c" contains doesn't > > mean you can't find more elements in "this" > > that would be equal to elements of "c". > > > > A best effort implementation, that should > > work if collections are not sets of different > > kinds, would look like this: > > > > public boolean AbstractSet.removeAll(Collection<?> c) { > > if (isEmpty() || c.isEmpty()) { > > return false; > > } > > boolean modified = false; > > final int n1 = size(); > > final int n2 = c.size(); > > final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2; > > if (n2 <= n1 * (long)cContainsAssumedComplexity) { > > for (Iterator<?> i = c.iterator(); i.hasNext(); ) { > > if (remove(i.next())) { > > modified = true; > > if (isEmpty()) { > > break; > > } > > } > > } > > } else { > > // Here, we know "c" is a Set > > // (else, we tested "n2 <= n1 * (long)n2", > > // which is always true for n1 > 0 and n2 > 0). > > if (n2 != Integer.MAX_VALUE) { > > // Will stop iterating once > > // we removed from "this" as > > // many elements as "c" contains. > > // XXX Does not work if "this" is an identity set, > > // and "c" based on "equals". > > int toRemove = n2; > > for (Iterator<?> i = iterator(); i.hasNext(); ) { > > if (c.contains(i.next())) { > > i.remove(); > > modified = true; > > if (--toRemove == 0) { > > break; > > } > > } > > } > > } else { > > // "c" might contain more than Integer.MAX_VALUE > > // elements, so we can't break early. > > for (Iterator<?> i = iterator(); i.hasNext(); ) { > > if (c.contains(i.next())) { > > i.remove(); > > modified = true; > > } > > } > > } > > } > > return modified; > > } > > > > > > > > Jeff > > > > > > > > ________________________________ > > De : Rémi Forax <fo...@univ-mlv.fr> > > À : core-libs-dev@openjdk.java.net > > Envoyé le : Mer 13 juillet 2011, 18h 27min 48s > > Objet : Re: AbstractCollection.removeAll(Collection) and > > AbstractSet.removeAll(Collection) > > > > It doesn't work because the complexity of size() can be O(n). > > In my opinion, only the checks for emptyness are Ok. > > > > Rémi > > > > > > On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote: > >> Jeff Hain wrote: > >>> Hello. > >>> I have some remarks about the methods named in the subject (JDK 6u26, JDK > >>> 7b147). > >>> 1) AbstractCollection.removeAll(Collection) > >>> This method could return immediately if the specified collection is > >>> empty, not to iterate uselessly over all "this", or if "this" is empty, > >>> not > > > to > >>> create an iterator for nothing. > >>> I also thought about returning as soon as as many elements as the > >>> specified collection contains have been removed, but it would not behave > >>> properly with collections containing more than Integer.MAX_VALUE elements, > >>> or with some overriden equals methods etc. > >> The overriden equals method should be no problem if the Implementation is > >> an > > >> set. So it should be possible to exit in this case in the AbstractSet > >> Implementation, isn't it? > >> > >>> 2) AbstractSet.removeAll(Collection) > >>> Let n1 be the size of "this", and n2 the size of the specified > > collection. > >>> This method is in O(n1*n2) if the specified collection has a > >>> contains(Object) > >>> method in O(n2) (as an ArrayList does), and n1<= n2. > >>> I think a better implementation could be done by taking care of the > >>> "setness" of the specified collection, and taking care of collections sizes > >>> differently. > >>> Also: > >>> - when iterating on the specified collection, the method could return as > >>> soon as "this" is empty, not to continue useless iterations. > >>> - when iterating on "this", the method could return if the specified > >>> collection is empty, not to iterate over all "this" uselessly. > >>> > >>> # Current implementation: > >>> public boolean removeAll(Collection<?> c) { > >>> boolean modified = false; > >>> if (size()> c.size()) { > >>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) > >>> modified |= remove(i.next()); > >>> } else { > >>> for (Iterator<?> i = iterator(); i.hasNext(); ) { > >>> if (c.contains(i.next())) { > >>> i.remove(); > >>> modified = true; > >>> } > >>> } > >>> } > >>> return modified; > >>> } > >>> > >>> # Alternative implementation (not tested):public boolean > >>> removeAll(Collection<?> c) { > >>> // Quick check on emptinesses, to make later treatments simpler. > >>> if (isEmpty() || c.isEmpty()) { > >>> return false; > >>> } > >>> boolean modified = false; > >>> final int n1 = size(); > >>> final int n2 = c.size(); > >>> // If the "c" is not a set, we are pessimistic about its "contains" > >>> // method > >>> // complexity. > >>> final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2; > >>> // If we iterate over "this", assumed complexity is O(n2). > >>> // If we iterate over "c", assumed complexity is O(n1 * > >>> cContainsAssumedComplexity). > >>> // Cast to long to avoid overflow. > >>> if (n2< n1 * (long)cContainsAssumedComplexity) { > >>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) { > >>> if (remove(i.next())) { > >>> modified = true; > >>> if (isEmpty()) { > >>> break; > >>> } > >>> } > >>> } > >>> } else { > >>> for (Iterator<?> i = iterator(); i.hasNext(); ) { > >>> if (c.contains(i.next())) { > >>> i.remove(); > >>> modified = true; > >>> } > >>> } > >>> } > >>> return modified; > >>> } > >> look good to me. > >> > >>> > >>> > >>> Regards, > >>> > >>> Jeff > >>> >