It looks to me that AbstractCollection could benefit from a c.isEmpty() check but that's the only optimization I currently see. It's necessary to iterate the target collection because values can be repeated for some collection types.
I agree that AbstractSet.removeAll appears to make a poor judgement that the cost of this.contains() and c.contains() are similar. I'd like to know if there are common usage patterns where iteration over c isn't the best choice. (adding isEmpty() checks for both this and c of course). Josh? Martin? Doug? Any history why AbstractSet has this optimization? Mike On Jul 13 2011, at 13:02 , Jeff Hain wrote: > >> It doesn't work because the complexity of size() can be O(n). >> In my opinion, only the checks for emptyness are Ok. > > The calls to "size()" are already there in current implementation, > and I think with the idea that in most cases they are O(1). > If they are O(n), it's of course pointless to use them, since it > makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size() > and n2 = c.size()), while simply iterating on "c" without checking > sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)). > But, then, the whole treatment remains linear at worse, and if, > as it is often the case, they are O(1), it allows to make it > O(min(n2,n1)) if "c" is also a set. > Considering sizes therefore allows to have O(max(n1,n2)) instead > of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair > amount of cases, which can be considered a win. Also, if size()'s > are O(n), you are most likely doing concurrency, so you are smart, > know what you do, and can make your own removeAll ;) > What seems to have been overlooked in current implementation, > is that this optimization, that works for sets, makes things actually > worse if "c" is a list, in which case it's best to always iterate on > "c", iterating on "this" resulting in a quadratic behavior. > > To make things simple, and have a treatment always in O(n2), > we could just stick to iterating on "c" without checking sizes: > > public boolean AbstractSet.removeAll(Collection<?> c) { > if (isEmpty() || c.isEmpty()) { > return false; > } > boolean modified = false; > for (Iterator<?> i = c.iterator(); i.hasNext(); ) { > if (remove(i.next())) { > modified = true; > if (isEmpty()) { > break; > } > } > } > return modified; > } > > (*) We could refine considering the case of tree sets. > > > >> The overriden equals method should be no problem >> if the Implementation is an set. So it should be >> possible to exit in this case in the AbstractSet >> Implementation, isn't it? > > For AbstractCollection.removeAll(Collection), > that wouldn't work, since each instance to remove > in "c" can have multiple occurrences in "this". > > For AbstractSet.removeAll(Collection), > that wouldn't work either, because if "c" is > based on "equals", and "c" is an identity set > (based on "=="), multiple elements of "this" > could be equal to a same element of "c", > and removing as many as "c" contains doesn't > mean you can't find more elements in "this" > that would be equal to elements of "c". > > A best effort implementation, that should > work if collections are not sets of different > kinds, would look like this: > > public boolean AbstractSet.removeAll(Collection<?> c) { > if (isEmpty() || c.isEmpty()) { > return false; > } > boolean modified = false; > final int n1 = size(); > final int n2 = c.size(); > final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2; > if (n2 <= n1 * (long)cContainsAssumedComplexity) { > for (Iterator<?> i = c.iterator(); i.hasNext(); ) { > if (remove(i.next())) { > modified = true; > if (isEmpty()) { > break; > } > } > } > } else { > // Here, we know "c" is a Set > // (else, we tested "n2 <= n1 * (long)n2", > // which is always true for n1 > 0 and n2 > 0). > if (n2 != Integer.MAX_VALUE) { > // Will stop iterating once > // we removed from "this" as > // many elements as "c" contains. > // XXX Does not work if "this" is an identity set, > // and "c" based on "equals". > int toRemove = n2; > for (Iterator<?> i = iterator(); i.hasNext(); ) { > if (c.contains(i.next())) { > i.remove(); > modified = true; > if (--toRemove == 0) { > break; > } > } > } > } else { > // "c" might contain more than Integer.MAX_VALUE > // elements, so we can't break early. > for (Iterator<?> i = iterator(); i.hasNext(); ) { > if (c.contains(i.next())) { > i.remove(); > modified = true; > } > } > } > } > return modified; > } > > > > Jeff > > > > ________________________________ > De : Rémi Forax <fo...@univ-mlv.fr> > À : core-libs-dev@openjdk.java.net > Envoyé le : Mer 13 juillet 2011, 18h 27min 48s > Objet : Re: AbstractCollection.removeAll(Collection) and > AbstractSet.removeAll(Collection) > > It doesn't work because the complexity of size() can be O(n). > In my opinion, only the checks for emptyness are Ok. > > Rémi > > > On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote: >> Jeff Hain wrote: >>> Hello. >>> I have some remarks about the methods named in the subject (JDK 6u26, JDK >>> 7b147). >>> 1) AbstractCollection.removeAll(Collection) >>> This method could return immediately if the specified collection is >>> empty, not to iterate uselessly over all "this", or if "this" is empty, not > to >>> create an iterator for nothing. >>> I also thought about returning as soon as as many elements as the >>> specified collection contains have been removed, but it would not behave >>> properly with collections containing more than Integer.MAX_VALUE elements, >>> or with some overriden equals methods etc. >> The overriden equals method should be no problem if the Implementation is >> an >> set. So it should be possible to exit in this case in the AbstractSet >> Implementation, isn't it? >> >>> 2) AbstractSet.removeAll(Collection) >>> Let n1 be the size of "this", and n2 the size of the specified > collection. >>> This method is in O(n1*n2) if the specified collection has a >>> contains(Object) >>> method in O(n2) (as an ArrayList does), and n1<= n2. >>> I think a better implementation could be done by taking care of the >>> "setness" of the specified collection, and taking care of collections sizes >>> differently. >>> Also: >>> - when iterating on the specified collection, the method could return as >>> soon as "this" is empty, not to continue useless iterations. >>> - when iterating on "this", the method could return if the specified >>> collection is empty, not to iterate over all "this" uselessly. >>> >>> # Current implementation: >>> public boolean removeAll(Collection<?> c) { >>> boolean modified = false; >>> if (size()> c.size()) { >>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) >>> modified |= remove(i.next()); >>> } else { >>> for (Iterator<?> i = iterator(); i.hasNext(); ) { >>> if (c.contains(i.next())) { >>> i.remove(); >>> modified = true; >>> } >>> } >>> } >>> return modified; >>> } >>> >>> # Alternative implementation (not tested):public boolean >>> removeAll(Collection<?> c) { >>> // Quick check on emptinesses, to make later treatments simpler. >>> if (isEmpty() || c.isEmpty()) { >>> return false; >>> } >>> boolean modified = false; >>> final int n1 = size(); >>> final int n2 = c.size(); >>> // If the "c" is not a set, we are pessimistic about its "contains" >>> // method >>> // complexity. >>> final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2; >>> // If we iterate over "this", assumed complexity is O(n2). >>> // If we iterate over "c", assumed complexity is O(n1 * >>> cContainsAssumedComplexity). >>> // Cast to long to avoid overflow. >>> if (n2< n1 * (long)cContainsAssumedComplexity) { >>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) { >>> if (remove(i.next())) { >>> modified = true; >>> if (isEmpty()) { >>> break; >>> } >>> } >>> } >>> } else { >>> for (Iterator<?> i = iterator(); i.hasNext(); ) { >>> if (c.contains(i.next())) { >>> i.remove(); >>> modified = true; >>> } >>> } >>> } >>> return modified; >>> } >> look good to me. >> >>> >>> >>> Regards, >>> >>> Jeff >>>