usage of NumberAwareComparator in extension methods and number oddities (a bit code review)

Jochen Theodorou Thu, 03 Sep 2015 04:45:07 -0700

Hi all,

NumberAwareComparator is used in several extension methods Groovy provides to be able to compare things that cannot really be compared. That involves sorting (sort, toSorted), but also things like retainAll, removeAll, minus on Collection, intersect and disjoint. NumberAwareComparator will basically use the same logic as <=>, but in case the elements cannot be compared, it will compare using the hashcode, though never reporting the two as equal in that case. Of course this logic is difficult to understand and was mainly done to be able to sort things somehow, even things you cannot really sort. Also note that <=> purely depends on compareTo, even throwing an exception if only equals would be needed.

The first question I have here is: do we really want to keep providing a very difficult to understand way of comparing things that you actually cannot compare for sorting? Meaning, for example, should [new Object(), new Object()].sort() really give a random order, or should it result in an exception? I am not sure anymore that we do our users a favor by providing this operation.
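
For comparison, here is a minimal sketch of the "exception" alternative as plain Java already behaves. The class and method names are made up for illustration; only Arrays.sort is a real JDK call. Sorting by natural ordering casts the elements to Comparable, so plain Objects fail instead of ending up in some arbitrary order.

```java
import java.util.Arrays;

class SortWithoutComparable {
    // Returns true if sorting fails because the elements are not Comparable.
    static boolean sortThrows(Object[] items) {
        try {
            Arrays.sort(items); // natural ordering: elements are cast to Comparable
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sortThrows(new Object[] { new Object(), new Object() })); // true
        System.out.println(sortThrows(new Object[] { "b", "a" }));                   // false
    }
}
```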

And of course there are these other methods that only need equality but use NumberAwareComparator.

The method with the longest history in this is most probably Collection#minus(Collection), well, the List variant actually, but the Collection variant is where the implementation moved to.

This method actually has two variants inside. First there will be a test for the sameType being used and then... frankly, after so many iterations the code looks just odd to me.

            //n*LOG(n) version
            Set<T> answer;
            if (Number.class.isInstance(head)) {
                answer = new TreeSet<T>(numberComparator);
                answer.addAll(self);
                for (T t : self) {
                    if (Number.class.isInstance(t)) {
                        for (Object t2 : removeMe) {
                            if (Number.class.isInstance(t2)) {
                                if (numberComparator.compare(t, (T)t2) == 0)
                                    answer.remove(t);
                            }
                        }
                    } else {
                        if (removeMe.contains(t))
                            answer.remove(t);
                    }
                }
            } else {

We get here after we have established that all the elements are of the "same" type: either directly the same, or null, or a Number. So if the head is a number, then all elements should be numbers. "answer" is a TreeSet with numberComparator (NumberAwareComparator). Now, since we use the number comparator to make a TreeSet, this action alone may already remove elements. But "answer" is not what the method will return; that is "ansCollection", which will be built separately based on answer later. The inner Number check is, if my assumption before is right, superfluous and its else-branch is never visited. And then we iterate over all elements of self and all elements of removeMe to eventually call answer.remove... That smells like n*n*log(n) complexity. The comment about this being the n*log(n) version would be wrong then.


I really wonder if something like this:

Set<T> answer;
if (head instanceof Number) {
  answer = new TreeSet<T>(numberComparator);
  answer.addAll(self);
  Set<T> removeSet = new TreeSet<T>(numberComparator);
  removeSet.addAll(removeMe);
  answer.removeAll(removeSet);
}

would not be better. At least that should go near n*log(n). And we do something similar in case head is not a number:

            } else {
                answer = new TreeSet<T>(numberComparator);
                answer.addAll(self);
                answer.removeAll(removeMe);
            }
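
The proposed Number branch can be sketched in plain Java as follows. This is only an illustration under assumptions: BY_VALUE is a hypothetical stand-in for NumberAwareComparator that compares by numeric value via BigDecimal, and the sketch returns the TreeSet directly instead of building ansCollection afterwards. Because both sets share one comparator, it does not matter which side removeAll ends up querying.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class NumberMinusSketch {
    // Hypothetical stand-in for NumberAwareComparator: compares by numeric value only.
    // Assumes finite values whose toString() can be parsed by BigDecimal.
    static final Comparator<Number> BY_VALUE =
        (a, b) -> new BigDecimal(a.toString()).compareTo(new BigDecimal(b.toString()));

    static Set<Number> minus(Collection<? extends Number> self,
                             Collection<? extends Number> removeMe) {
        Set<Number> answer = new TreeSet<>(BY_VALUE);    // n*log(n) to build
        answer.addAll(self);
        Set<Number> removeSet = new TreeSet<>(BY_VALUE); // m*log(m) to build
        removeSet.addAll(removeMe);
        answer.removeAll(removeSet);                     // every lookup is a tree lookup
        return answer;
    }

    public static void main(String[] args) {
        System.out.println(minus(Arrays.<Number>asList(1, 2.0, 3),
                                 Arrays.<Number>asList(1.0, 2))); // [3]
    }
}
```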

Only I am wondering... if removeMe is a Set as well, then answer.removeAll(removeMe) may not do what we want. Imagine removeMe being another TreeSet with its own comparator and this implementation of removeAll:

    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        boolean modified = false;

        if (size() > c.size()) {
            for (Iterator<?> i = c.iterator(); i.hasNext(); )
                modified |= remove(i.next());
        } else {
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

This means c.contains will be called and the different comparator may be used for it. Proof:

def toRemove = new TreeSet({a,b->-1})
toRemove.addAll(["a","b"])
def res = ["a"]*2 - toRemove
assert res == ["a","a"]

res = ["a"]*2+["b","c"] - toRemove
assert res == ["c"]
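
The same size dependence can be reproduced in plain Java. In this sketch the branching of the quoted removeAll is copied into a helper (removeAllBySize, a made-up name) so that the result does not depend on the JDK version in use, and String.CASE_INSENSITIVE_ORDER stands in for "our" comparator.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RemoveAllSizeDemo {
    // Same branching as the removeAll quoted above, copied into a helper.
    static boolean removeAllBySize(Set<?> self, Collection<?> c) {
        boolean modified = false;
        if (self.size() > c.size()) {
            for (Object o : c)
                modified |= self.remove(o);  // lookup uses self's comparator
        } else {
            for (Iterator<?> i = self.iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {  // lookup uses c's own notion of equality
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

    static Set<String> caseInsensitiveMinus(List<String> self, Collection<String> removeMe) {
        Set<String> answer = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        answer.addAll(self);
        removeAllBySize(answer, removeMe);
        return answer;
    }

    public static void main(String[] args) {
        // 3 elements > 1 element: remove() branch, "A" matches "a".
        System.out.println(caseInsensitiveMinus(Arrays.asList("a", "b", "c"),
                                                Arrays.asList("A")));      // [b, c]
        // 1 element <= 2 elements: contains() branch, List.contains uses equals.
        System.out.println(caseInsensitiveMinus(Arrays.asList("a"),
                                                Arrays.asList("A", "B"))); // [a]
    }
}
```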

Of course the {a,b->-1} comparator does not behave nicely, but the point is more that our comparator is not used in the first case. It depends on the size of the Set, or better said, the size of the TreeSet. Since equal (according to the number comparator) elements appear there only once, the size in the first case will be 1, in the second case 3, while the sizes of the self collections are 2 and 4. Those TreeSet sizes are compared against the size of toRemove, which is 2, so only the second case takes the remove branch. Frankly I cannot really understand why the JDK has it this way at all, or why TreeSet does not override it. Sure, the version they chose is better for performance, but I find this behaviour not right, considering how removeAll works on, for example, lists. Anyway... this should not be a rant about the JDK ;).

Now... does it make sense to use NumberAwareComparator here at all? If we want to support [1.0]-[1]==[], then yes. But the gravity of this is unclear I would say:

class MyNumber extends Number {
    def n
    int intValue(){n}
    long longValue(){n}
    float floatValue(){n}
    double doubleValue(){n}
    int hashCode(){-n}
    boolean equals(other) {
        if (other instanceof MyNumber) { return n==other.n}
        return false
    }
    int compareTo(MyNumber other) {
        return n <=> other.n
    }
    String toString(){"MyNumber($n)"}
}

def res = [1,new MyNumber(n:1)] - [1]

While the first loops to produce answer will correctly produce a TreeSet containing only MyNumber, the later logic to produce ansCollection will cause problems and nothing will be removed. That's because MyNumber(n:1) is equal to 1, ehm, actually that is wrong. 1 is equal to MyNumber(n:1) according to Groovy logic, but not the other way around.


assert 1 == new MyNumber(n:1)
assert new MyNumber(n:1) != 1

That is because we don't actually call Integer#compareTo. If that had been the case, then both would not be equal. No, instead we fall back to those number math classes, and they assume that if you compare something with an integer, then the other one needs to be converted to an integer as well. And since my intValue() method here returns 1, they are seen as equal, even though compareTo, equals and even the hashcodes would not allow that. If I reverse the order in my list:

def res = [new MyNumber(n:1),1] - [1]

then the result will be as expected and contains only MyNumber(n:1). Funny thing here is... I added a compareTo method, but no Comparable interface. So MyNumber is not comparable... Does it change things if it is? So I add "implements Comparable<MyNumber>" to the class and now I get the same result, regardless of the order. Which is []. That is because now 1 is always equal to MyNumber(n:1). That is because now in both cases IntegerMath is used. Does this mean any two custom numbers are equal if their intValue is? So let's take MyNumber again and make a second class just the same, including Comparable, then call it MyNumber2

assert new MyNumber2(n:1) == new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)

Well... if you did read what I wrote before, then this is no surprise... But frankly... should it be like this? And if MyNumber2 does not implement Comparable:

assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)

again because of the fallback logic to IntegerMath... and if both do not implement it:

assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) != new MyNumber2(n:1)

Well... that is more the expected result. Still... and not to forget: this is ==, not <=>. == uses <=> only for comparable cases. So:

new MyNumber2(n:1) <=> new MyNumber(n:1)
new MyNumber(n:1) <=> new MyNumber2(n:1)

will both throw a GroovyRuntimeException if neither implements Comparable. If MyNumber2 implements it, the first compare works using IntegerMath and tells us they are equal. If both implement it, they are equal in both directions. That is basically the same as ==, but what does NumberAwareComparator do for such cases?

        try {
            return DefaultTypeTransformation.compareTo(o1, o2);
        } catch (ClassCastException cce) {
            /* ignore */
        } catch (GroovyRuntimeException gre) {
            /* ignore */
        }
        // since the object does not have a valid compareTo method
        // we compare using the hashcodes. null cases are handled by
        // DefaultTypeTransformation.compareTo
        // This is not exactly a mathematical valid approach, since we compare object
        // that cannot be compared. To avoid strange side effects we do a pseudo order
        // using hashcodes, but without equality. Since then an x and y with the same
        // hashcodes will behave different depending on if we compare x with y or
        // x with y, the result might be unstable as well. Setting x and y to equal
        // may mean the removal of x or y in a sorting operation, which we don't want.
        int x1 = o1.hashCode();
        int x2 = o2.hashCode();
        if (x1 > x2) return 1;
        return -1;
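
What "a pseudo order using hashcodes, but without equality" means in practice can be sketched in plain Java. HASH_ORDER below is a hypothetical stand-in for just the fallback branch (it skips the compareTo attempt), and Key is a made-up class whose instances all share one hash code.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

class HashOrderDemo {
    // Hypothetical stand-in for the fallback branch only:
    // order by hashCode and never return 0.
    static final Comparator<Object> HASH_ORDER =
        (o1, o2) -> o1.hashCode() > o2.hashCode() ? 1 : -1;

    // Distinct instances that always share the same hash code.
    static final class Key {
        @Override public int hashCode() { return 42; }
    }

    public static void main(String[] args) {
        Key x = new Key();
        Key y = new Key();
        System.out.println(HASH_ORDER.compare(x, y)); // -1
        System.out.println(HASH_ORDER.compare(y, x)); // -1 as well: x < y and y < x

        Set<Object> set = new TreeSet<>(HASH_ORDER);
        set.add(x);
        set.add(x);
        System.out.println(set.size());      // 2: the same object is stored twice
        System.out.println(set.contains(x)); // false: no comparison ever returns 0
    }
}
```

So a sorted collection built on this fallback never drops an element, but it also can never find one again, which matters for every method that only needs equality.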

DefaultTypeTransformation.compareTo is the path for <=>, so we can expect exceptions from it. So now with MyNumber being a Comparable, and MyNumber2 not:

println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)])

In the first two cases nothing is removed, in the third case we get an empty list and in the last case MyNumber is removed. Since MyNumber2 does not implement Comparable here, DefaultTypeTransformation.compareTo will throw an exception whenever we have MyNumber2 as o1. The compare using hashcodes will never yield equal. Well. There has been one part of that minus method we have not looked at. This is called if the first element is not Comparable, even if all extend Number:

            //n*n version
            List<T> tmpAnswer = new LinkedList<T>(self);
            for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();) {
                T element = iter.next();
                boolean elementRemoved = false;
                for (Iterator<?> iterator = removeMe.iterator(); iterator.hasNext() && !elementRemoved;) {
                    Object elt = iterator.next();
                    if (DefaultTypeTransformation.compareEqual(element, elt)) {
                        iter.remove();
                        elementRemoved = true;
                    }
                }
            }

            //remove duplicates
            //can't use treeset since the base classes are different
            ansCollection.addAll(tmpAnswer);

So if MyNumber2 is the head element this code will be executed instead, since MyNumber2 does not implement Comparable. DefaultTypeTransformation.compareEqual is the code path taken for "==", which I talked about already. Remember, IntegerMath is the fallback for Numbers. compareEqual will prevent the exception being thrown, even though compareTo is called internally, but only if the left hand side is a Comparable. So here equality checks are done instead, for example using the equals method. That explains why ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)]) gets the empty list... the first part is normal equality, the second part is the fallback to IntegerMath. Of course if the code used DefaultTypeTransformation.compareEqual(elt, element), then this would behave differently, but the result would be the same. The major difference would be that MyNumber2#equals is not called. For ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)]) we do compareEqual with MyNumber2 and MyNumber first, resulting in the element not being removed, so the result will be [MyNumber(n:1)]

But that did not give me the exception path... Looking at the code, with numbers you won't get that path... or will you? Well, with a subclass of BigDecimal/BigInteger you can do that. But they implement Comparable<BigDecimal>/Comparable<BigInteger>, so you are supposed to make a method for that only... "supposed to" is not the same as people actually doing that. Still, I won't give an example of that now.

If you look at the sameType method, then it looks only at Number and null and the exact same class to enter the not so n*log(n) path and use the number comparator. Point being here, the hashcode compares will not normally happen for Numbers or null. And in case of all elements of both Collections being of the same class, there should be no exception we have the right to catch and fall back to hashcode logic. So we could question here if NumberAwareComparator should even be used like it is for this method.


So let's have a much shorter look at the other methods...

retainAll, intersect and disjoint: these use the comparator for all elements, so they may use the hashcode logic. But at least they should be stable. removeAll: same. Well, there is in theory the issue of self being a Set and the special Set logic. But in this case, this method is not supposed to be called.

Actually, there are two more methods using the Comparator behind the scenes... coercedEquals (used by several equals methods, minus and another old friend: unique) and numberAwareCompareTo (used by coercedEquals and ObjectRange).

So we would have to look at those as well... but this mail got pretty long already and I am out of time. So I put to discussion what I wrote about for now.


bye blackdrag

--
Jochen "blackdrag" Theodorou
blog: http://blackdragsview.blogspot.com/
