[ 
https://issues.apache.org/jira/browse/GROOVY-10964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702100#comment-17702100
 ] 

Ingo Wilms commented on GROOVY-10964:
-------------------------------------

[~paulk] thank you so much! It's really incredible how fast things are put in 
motion here. That motivated me to look into the implementation of TreeSet and I 
believe that is where the real culprit lies. While TreeSet overrides 
{{{}remove(){}}}, it inherits {{removeAll()}} from AbstractSet. If I am not 
mistaken this is the code:
{code:groovy}
public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    Iterator i;
    Object e;
    if (this.size() > c.size()) {
        for(i = c.iterator(); i.hasNext(); modified |= this.remove(e)) {
            e = i.next();
        }
    } else {
        i = this.iterator();        while(i.hasNext()) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }    return modified;
}{code}
While I understand it might be clever to iterate through less objects, the use 
of {{contains()}} in the {{{}else{}}}-block leads to two problems:
 # It is inefficient if {{this}} is a TreeSet and {{c}} is an ArrayList, 
because {{contains()}} is cheap for the tree, but not for the list.
 # Since {{contains()}} has a different meaning for TreeSet with custom 
comparator than {{for}} ArrayList, it can lead to different results.

I guess this is one of the cases why the 
[doc|https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/TreeSet.html]
 for TreeSet recommends the {{comparator}} to be consistent with 
{{{}equals(){}}}, but I was'nt aware that this also applies for the elements 
you want to remove. To be fair, this behavoir for {{removeAll}} is well 
[documented|https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection)],
 but I read that part for the first time.

Here is an example for different running times
{code:groovy}
int n = 100_000

// This is fast, because tree.size() > list.size() and tree.remove() is fast
List list1 = (1..<n).collect { it }
TreeSet tree1 = (1..n).collect {it}
tree1.removeAll(list1)
println("fast version finished")

// This is slow, because tree.size() <= list.size() and list.contains() is slow
List list2 = (1..n).collect { it }
TreeSet tree2 = (1..n).collect {it}
tree2.removeAll(list2)
println("slow version finished") {code}
And here is an example for different results
{code:groovy}
List a = [2, 3.0, 4, 5, 6 ]
List b1 = [2.0, 2, 3, 7]
List b2 = [2.0, 2, 3, 7, 8]

def tree1 = new TreeSet(new NumberAwareComparator())
tree1.addAll(a)
tree1.removeAll(b1)

def tree2 = new TreeSet(new NumberAwareComparator())
tree2.addAll(a)
tree2.removeAll(b2)

assert (tree1 != tree2)
assert tree1.toSet() == [4, 5, 6] as Set
assert tree2.toSet() == [3.0, 4, 5, 6] as Set {code}
I guess I will use {{removeAll()}} with more caution in the future.

> List.minus() slow for Numbers
> -----------------------------
>
>                 Key: GROOVY-10964
>                 URL: https://issues.apache.org/jira/browse/GROOVY-10964
>             Project: Groovy
>          Issue Type: Question
>          Components: groovy-jdk
>    Affects Versions: 2.4.0, 3.0.9, 4.0.9
>            Reporter: Ingo Wilms
>            Priority: Minor
>              Labels: Collections, Groovy, List, Number
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In List.minus() is a n*LOG\(n) version for comparable objects. Only for 
> numbers, there is a dedicated slower n^2*LOG\(n) version. Is there a reason 
> for this? It exists since 2.4.0 and hasn't changed much since then. Here is 
> part of the code from version 4.0.9:
>  
> {code:java}
> // if (nlgnSort && (head instanceof Comparable)) {
>     //n*LOG(n) version
>     Set<T> answer;
>     if (head instanceof Number) {
>         answer = new TreeSet<>(comparator);
>         answer.addAll(self1);
>         for (T t : self1) {
>             if (t instanceof Number) {
>                 for (Object t2 : removeMe1) {
>                     if (t2 instanceof Number) {
>                         if (comparator.compare(t, (T) t2) == 0)
>                             answer.remove(t);
>                     }
>                 }
>             } else {
>                 if (removeMe1.contains(t))
>                     answer.remove(t);
>             }
>         }
>     } else {
>         answer = new TreeSet<>(comparator);
>         answer.addAll(self1);
>         answer.removeAll(removeMe1);
>     }
>     for (T o : self1) {
>         if (answer.contains(o))
>             ansCollection.add(o);
>     }
> } else {
>     //n*n version {code}
> I fail to see why the whole extra block for numbers beginning with
> {code:java}
> if (head instanceof Number) { {code}
> is necessary.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to