[ https://issues.apache.org/jira/browse/GROOVY-10964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702100#comment-17702100 ]
Ingo Wilms commented on GROOVY-10964: ------------------------------------- [~paulk] thank you so much! It's really incredible how fast things are put in motion here. That motivated me to look into the implementation of TreeSet and I believe that is where the real culprit lies. While TreeSet overrides {{{}remove(){}}}, it inherits {{removeAll()}} from AbstractSet. If I am not mistaken this is the code: {code:groovy} public boolean removeAll(Collection<?> c) { Objects.requireNonNull(c); boolean modified = false; Iterator i; Object e; if (this.size() > c.size()) { for(i = c.iterator(); i.hasNext(); modified |= this.remove(e)) { e = i.next(); } } else { i = this.iterator(); while(i.hasNext()) { if (c.contains(i.next())) { i.remove(); modified = true; } } } return modified; }{code} While I understand it might be clever to iterate through less objects, the use of {{contains()}} in the {{{}else{}}}-block leads to two problems: # It is inefficient if {{this}} is a TreeSet and {{c}} is an ArrayList, because {{contains()}} is cheap for the tree, but not for the list. # Since {{contains()}} has a different meaning for TreeSet with custom comparator than {{for}} ArrayList, it can lead to different results. I guess this is one of the cases why the [doc|https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/TreeSet.html] for TreeSet recommends the {{comparator}} to be consistent with {{{}equals(){}}}, but I was'nt aware that this also applies for the elements you want to remove. To be fair, this behavoir for {{removeAll}} is well [documented|https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/util/AbstractSet.html#removeAll(java.util.Collection)], but I read that part for the first time. Here is an example for different running times {code:groovy} int n = 100_000 // This is fast, because tree.size() > list.size() and tree.remove() is fast List list1 = (1..<n).collect { it } TreeSet tree1 = (1..n).collect {it} tree1.removeAll(list1) println("fast version finished") // This is slow, because tree.size() <= list.size() and list.contains() is slow List list2 = (1..n).collect { it } TreeSet tree2 = (1..n).collect {it} tree2.removeAll(list2) println("slow version finished") {code} And here is an example for different results {code:groovy} List a = [2, 3.0, 4, 5, 6 ] List b1 = [2.0, 2, 3, 7] List b2 = [2.0, 2, 3, 7, 8] def tree1 = new TreeSet(new NumberAwareComparator()) tree1.addAll(a) tree1.removeAll(b1) def tree2 = new TreeSet(new NumberAwareComparator()) tree2.addAll(a) tree2.removeAll(b2) assert (tree1 != tree2) assert tree1.toSet() == [4, 5, 6] as Set assert tree2.toSet() == [3.0, 4, 5, 6] as Set {code} I guess I will use {{removeAll()}} with more caution in the future. > List.minus() slow for Numbers > ----------------------------- > > Key: GROOVY-10964 > URL: https://issues.apache.org/jira/browse/GROOVY-10964 > Project: Groovy > Issue Type: Question > Components: groovy-jdk > Affects Versions: 2.4.0, 3.0.9, 4.0.9 > Reporter: Ingo Wilms > Priority: Minor > Labels: Collections, Groovy, List, Number > Original Estimate: 1h > Remaining Estimate: 1h > > In List.minus() is a n*LOG\(n) version for comparable objects. Only for > numbers, there is a dedicated slower n^2*LOG\(n) version. Is there a reason > for this? It exists since 2.4.0 and hasn't changed much since then. Here is > part of the code from version 4.0.9: > > {code:java} > // if (nlgnSort && (head instanceof Comparable)) { > //n*LOG(n) version > Set<T> answer; > if (head instanceof Number) { > answer = new TreeSet<>(comparator); > answer.addAll(self1); > for (T t : self1) { > if (t instanceof Number) { > for (Object t2 : removeMe1) { > if (t2 instanceof Number) { > if (comparator.compare(t, (T) t2) == 0) > answer.remove(t); > } > } > } else { > if (removeMe1.contains(t)) > answer.remove(t); > } > } > } else { > answer = new TreeSet<>(comparator); > answer.addAll(self1); > answer.removeAll(removeMe1); > } > for (T o : self1) { > if (answer.contains(o)) > ansCollection.add(o); > } > } else { > //n*n version {code} > I fail to see why the whole extra block for numbers beginning with > {code:java} > if (head instanceof Number) { {code} > is necessary. > -- This message was sent by Atlassian Jira (v8.20.10#820010)