It looks to me that AbstractCollection could benefit from a c.isEmpty() check 
but that's the only optimization I currently see. It's necessary to iterate the 
target collection because values can be repeated for some collection types.

I agree that AbstractSet.removeAll appears to make a poor judgement that the 
cost of this.contains() and c.contains() are similar. I'd like to know if there 
are common usage patterns where iteration over c isn't the best choice. (adding 
isEmpty() checks for both this and c of course).

Josh? Martin? Doug? Any history why AbstractSet has this optimization?

Mike

On Jul 13 2011, at 13:02 , Jeff Hain wrote:

> 
>> It doesn't work because the complexity of size() can be O(n).
>> In my opinion, only the checks for emptyness are Ok.
> 
>   The calls to "size()" are already there in current implementation,
> and I think with the idea that in most cases they are O(1).
>   If they are O(n), it's of course pointless to use them, since it
> makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size()
> and n2 = c.size()), while simply iterating on "c" without checking
> sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)).
>   But, then, the whole treatment remains linear at worse, and if,
> as it is often the case, they are O(1), it allows to make it
> O(min(n2,n1)) if "c" is also a set.
>   Considering sizes  therefore allows to have O(max(n1,n2))  instead
> of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair
> amount of cases, which can be considered a win. Also, if size()'s
> are O(n), you are most likely doing concurrency, so you are smart,
> know what you do, and can make your own removeAll ;)
>   What seems to have been overlooked in current implementation,
> is that this optimization, that works for sets, makes things actually
> worse if "c" is a list, in which case it's best to always iterate on
> "c", iterating on "this" resulting in a quadratic behavior.
> 
>   To make things simple, and have a treatment always in O(n2),
> we could just stick to iterating on "c" without checking sizes:
> 
> public boolean AbstractSet.removeAll(Collection<?> c) {
>   if (isEmpty() || c.isEmpty()) {
>      return false;
>   }
>   boolean modified = false;
>   for  (Iterator<?> i = c.iterator(); i.hasNext(); ) {
>      if (remove(i.next())) {
>         modified = true;
>         if (isEmpty()) {
>            break;
>         }
>      }
>   }
>   return modified;
> }
> 
> (*) We could refine considering the case of tree sets.
> 
> 
> 
>> The overriden equals method should be no problem
>> if the Implementation is an set. So it should be
>> possible to exit in this case in the AbstractSet
>> Implementation, isn't it?
> 
>   For AbstractCollection.removeAll(Collection),
> that wouldn't work, since each instance to remove
> in "c" can have multiple occurrences in "this".
> 
>   For  AbstractSet.removeAll(Collection),
> that wouldn't work either, because if "c" is
> based on "equals", and "c" is an identity set
> (based on "=="), multiple elements of "this"
> could be equal to a same element of "c",
> and removing as many as "c" contains doesn't
> mean you can't find more elements in "this"
> that would be equal to elements of "c".
> 
>   A best effort implementation, that should
> work if collections are not sets of different
> kinds, would look like this:
> 
> public boolean AbstractSet.removeAll(Collection<?> c) {
>   if (isEmpty() || c.isEmpty()) {
>      return false;
>   }
>   boolean modified = false;
>   final int n1 = size();
>   final int n2 = c.size();
>   final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
>   if (n2 <= n1 *  (long)cContainsAssumedComplexity) {
>      for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
>         if (remove(i.next())) {
>            modified = true;
>            if (isEmpty()) {
>               break;
>            }
>         }
>      }
>   } else {
>      // Here, we know "c" is a Set
>      // (else, we tested "n2 <= n1 * (long)n2",
>      // which is always true for n1 > 0 and n2 > 0).
>      if (n2 != Integer.MAX_VALUE)  {
>         // Will stop iterating once
>         // we removed from "this" as
>         // many elements as "c" contains.
>         // XXX Does not work if "this" is an identity set,
>         // and "c" based on "equals".
>         int toRemove = n2;
>         for (Iterator<?> i = iterator(); i.hasNext(); ) {
>            if (c.contains(i.next())) {
>               i.remove();
>               modified =  true;
>               if (--toRemove == 0) {
>                  break;
>               }
>            }
>         }
>      } else {
>         // "c" might contain more than Integer.MAX_VALUE
>         // elements, so we can't break early.
>         for (Iterator<?> i = iterator(); i.hasNext(); ) {
>            if (c.contains(i.next()))  {
>               i.remove();
>               modified = true;
>            }
>         }
>      }
>   }
>   return modified;
> }
> 
> 
> 
> Jeff
> 
> 
> 
> ________________________________
> De : Rémi Forax <fo...@univ-mlv.fr>
> À : core-libs-dev@openjdk.java.net
> Envoyé le : Mer 13 juillet 2011, 18h 27min 48s
> Objet : Re: AbstractCollection.removeAll(Collection) and 
> AbstractSet.removeAll(Collection)
> 
> It doesn't work because the  complexity of size() can be O(n).
> In my opinion, only the checks for emptyness are Ok.
> 
> Rémi
> 
> 
> On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote:
>> Jeff Hain wrote:
>>> Hello.
>>>  I have some remarks about the methods named in the subject (JDK 6u26, JDK
>>> 7b147).
>>>    1) AbstractCollection.removeAll(Collection)
>>>    This method could return immediately if the specified collection is
>>> empty, not to iterate uselessly over all "this", or if "this" is empty, not 
> to
>>> create an iterator for nothing.
>>>    I also thought about returning as soon as as many elements as the
>>> specified collection contains have been removed, but it would not behave
>>> properly with collections containing more than Integer.MAX_VALUE elements,
>>> or with some overriden equals methods etc.
>> The overriden equals method should be no problem if the Implementation  is 
>> an 
>> set. So it should be possible to exit in this case in the  AbstractSet 
>> Implementation, isn't it?
>> 
>>>    2) AbstractSet.removeAll(Collection)
>>>    Let n1 be the size of "this", and n2 the size of the specified 
> collection.
>>>    This method is in O(n1*n2) if the specified collection has a 
>>> contains(Object)
>>> method in O(n2) (as an ArrayList does), and n1<= n2.
>>>    I think a better implementation could be done by taking care of the
>>> "setness" of the specified collection, and taking care of collections sizes
>>> differently.
>>>    Also:
>>> - when iterating on the specified collection, the method  could return as
>>> soon as "this" is empty, not to continue useless iterations.
>>> - when iterating on "this", the method could return if the specified
>>> collection is empty, not to iterate over all "this" uselessly.
>>> 
>>>  # Current implementation:
>>> public boolean removeAll(Collection<?>  c) {
>>>    boolean modified = false;
>>>    if (size()>  c.size()) {
>>>       for (Iterator<?>  i = c.iterator(); i.hasNext(); )
>>>          modified |= remove(i.next());
>>>    } else {
>>>       for (Iterator<?>  i = iterator(); i.hasNext(); ) {
>>>          if (c.contains(i.next())) {
>>>             i.remove();
>>>              modified = true;
>>>          }
>>>       }
>>>    }
>>>    return modified;
>>> }
>>> 
>>>  # Alternative implementation (not tested):public boolean
>>> removeAll(Collection<?> c) {
>>>    // Quick check on emptinesses, to make later treatments simpler.
>>>    if (isEmpty() || c.isEmpty()) {
>>>       return false;
>>>    }
>>>    boolean modified = false;
>>>    final int n1 = size();
>>>    final int n2 = c.size();
>>>    // If the "c" is not a set, we are pessimistic about its "contains"
>>>   // method
>>>   // complexity.
>>>    final int  cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
>>>    // If we iterate over "this", assumed complexity is O(n2).
>>>    // If we iterate over "c", assumed complexity is O(n1 * 
>>> cContainsAssumedComplexity).
>>>    // Cast to long to avoid overflow.
>>>    if (n2<  n1 * (long)cContainsAssumedComplexity) {
>>>       for (Iterator<?>  i = c.iterator(); i.hasNext(); ) {
>>>          if (remove(i.next())) {
>>>             modified = true;
>>>             if (isEmpty()) {
>>>                break;
>>>             }
>>>          }
>>>        }
>>>    } else {
>>>       for (Iterator<?>  i = iterator(); i.hasNext(); ) {
>>>          if (c.contains(i.next())) {
>>>             i.remove();
>>>             modified = true;
>>>          }
>>>       }
>>>    }
>>>    return modified;
>>> }
>> look good to me.
>> 
>>> 
>>> 
>>> Regards,
>>> 
>>> Jeff
>>> 

Reply via email to