Hi all,

NumberAwareComparator is used in several extension methods Groovy provides to be able to compare things, that cannot really be compared. That involves sorting (sort, toSorted), but also things like: retainAll, removeAll, minus on Collection, intersect and disjoint. NumberAwareComperator will basically use the same logic as <=>, but in case it cannot be compared, it will compare using the hashcode, though never issuing an equals here. Of course this logic is difficult to understand and was mainly done to be able to sort things somehow for things you cannot really sort. Also note that <=> purely depends on compareTo, even throwing an exception if only equals would be needed.

The first question I have here is: Do we really want to keep providing a very difficult to understand way of comparing things, you actually cannot compare for sorting? Meaning for example, should [new Object(), new Object()].sort() really give a random order, or should it result in an exception? I am not sure anymore, that we do our users a good thing providing this operation.

And of course there are these other methods, that only need equality, but use NumberAwareComperator.

The method with the longest history in this is most probably Collection#minus(Collection), well the List variant actually, but the Collection variant is where the implementation moved to.

This method actually has two variants inside. First there will be a test for the sameType being used and then.... frankly after so many iterations the code looks just odd to me.

            //n*LOG(n) version
            Set<T> answer;
            if (Number.class.isInstance(head)) {
                answer = new TreeSet<T>(numberComparator);
                answer.addAll(self);
                for (T t : self) {
                    if (Number.class.isInstance(t)) {
                        for (Object t2 : removeMe) {
                            if (Number.class.isInstance(t2)) {
                                if (numberComparator.compare(t, (T)t2) == 0)
                                    answer.remove(t);
                            }
                        }
                    } else {
                        if (removeMe.contains(t))
                            answer.remove(t);
                    }
                }
            } else {

we get here after we have established, that all the elements are of the "same" type. Either directly the same, or null, or a Number. So if the head is a number, then all elements should be a number. "answer" is a TreeSet with numberComparator (NumberAwareComparator). Now, since we use the number comperator to make a TreeSet, this will mean this action alone may already remove elements. But "answer" is not what the method will return, it is "ansCollection", which will be build separately based on answer later. The inner Number check is, if my assumption before is right, surplus and the else-branch there never visited. And then we iterate over all elements of self and all elements of removeMe to eventually call answer.remove... That smells like n*n*logn complexity. The statement about this being an n*logn would be wrong then.

I really wonder if something like this:

Set<T> answer;
if (head instanceof Number) {
  answer = new TreeSet<T>(numberComparator);
  answer.addAll(self);
  Set<T> removeSet = new TreeSet<T>(numberComperator);
  removeSet.addAll(removeMe);
  answer.removeAll(removeSet);
}

would not be better. At least that should go near n*logn. And we do something similiar in case head is no number:

            } else {
                answer = new TreeSet<T>(numberComparator);
                answer.addAll(self);
                answer.removeAll(removeMe);
            }

Only I am wondering.... if removeMe is a Set as well, then this may not do what we want. Imagine removeMe being another TreeSet with its own comparator and this implementation of removeAll:

    public boolean removeAll(Collection<?> c) {
        Objects.requireNonNull(c);
        boolean modified = false;

        if (size() > c.size()) {
            for (Iterator<?> i = c.iterator(); i.hasNext(); )
                modified |= remove(i.next());
        } else {
            for (Iterator<?> i = iterator(); i.hasNext(); ) {
                if (c.contains(i.next())) {
                    i.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

This means c.contains will be called and the different comparator may be used for it. Proof:

def toRemove = new TreeSet({a,b->-1})
toRemove.addAll(["a","b"])
def res = ["a"]*2 - toRemove
assert res == ["a","a"]

res = ["a"]*2+["b","c"] - toRemove
assert res == ["c"]

Of course this Comparator does not behave nice, but the point is more that our comparator is not used in the first case. It depends on the size of the Set, or better said, the size of the TreeSet. Since equal (according to number comparator) elements appear there only once, the size in the first case will be 1, in the second case 3, while the sizes of the self collections are 2 and 4. Frankly I cannot really understand why the JDK has this in this way at all, or why TreeSet does not override it. Sure, the version they choose is better for performance, but I find this behaviour not right, considering how removeAll works on for example lists. Anyway... this should not be a rant about the JDK ;). Now... does it make sense to use NumberAwareComperator here at all? If we want to support [1.0]-[1]==[], then yes. But the gravity of this is unclear I would say:

class MyNumber extends Number {
    def n
    int intValue(){n}
    long longValue(){n}
    float floatValue(){n}
    double doubleValue(){n}
    int hashCode(){-n}
    boolean equals(other) {
        if (other instanceof MyNumber) { return n==other.n}
        return false
    }
    int compareTo(MyNumber other) {
        return n <=> other.n
    }
    String toString(){"MyNumber($n)"}
}

def res = [1,new MyNumber(n:1)] - [1]

while the first loops to produce answer will correctly produce a TreeSet containing only MyNumber, the later logic to produce ansCollection will cause problems and nothing will be removed. ThatÄs because MyNumber(n:1) is equal to 1, ehm, actually that is wrong. 1 is equal to MyNumber(n:1) according to Groovy logic, but not the other way around.

assert 1 == new MyNumber(n:1)
assert new MyNumber(n:1) != 1

That is because we don't actually call Integer#compareTo. If that had been the case, then both would not be equal. No, instead we fall back to those number math classes, and they assume, that if you compare something with an integer, then the other one needs to be converted to an integer as well. And since my intValue() method here returns 1, they are seen as equal, even though compareTo, equals and even the hashcodes would not allow that. If I reverse the order in my list:

def res = [new MyNumber(n:1),1] - [1]

then the result will be as expected and contains only MyNumber(n:1). Funny thing here is... I added a compareTo method, but no Comparable interface. So MyNumber is not comparable... Does it changes things if it does? So I add "implements Comparable<MyNumber>" to the class and now I get the same result, regardless of the order. Which is []. That is because now 1 is always equal to MyNumber(n:1). That is because now in both case IntegerMath is used. Does this mean any two custom numbers are equal, if their intValue is? So let's take MyNumber again and make a second class just the same, including Comparable, then call it MyNumber2

assert new MyNumber2(n:1) == new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)

Well... if you did read, what I did write before, then this is no surprise... But frankly... should it be like this? And if MyNumber2 does not implement Comparable:

assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)

again because of the fallback logic to IntegerMath... and if both do not implement it:

assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) != new MyNumber2(n:1)

Well... that is more the expected result, still... and not to forget:This is ==, not <=>. == uses <=> only for comparable cases. So:

new MyNumber2(n:1) <=> new MyNumber(n:1)
new MyNumber(n:1) <=> new MyNumber2(n:1)

will both throw a GroovyRuntimeException, if non does implement Comparable. If MyNumber2 implements it, the first compare works using IntegerMath and tells us they are equal. If both implement it, they are both equal. That is basically the same as ==, but what does NumberAwareComperator do for such cases?

        try {
            return DefaultTypeTransformation.compareTo(o1, o2);
        } catch (ClassCastException cce) {
            /* ignore */
        } catch (GroovyRuntimeException gre) {
            /* ignore */
        }
        // since the object does not have a valid compareTo method
        // we compare using the hashcodes. null cases are handled by
        // DefaultTypeTransformation.compareTo
        // This is not exactly a mathematical valid approach, since we compare 
object
        // that cannot be compared. To avoid strange side effects we do a 
pseudo order
        // using hashcodes, but without equality. Since then an x and y with 
the same
        // hashcodes will behave different depending on if we compare x with y 
or
        // x with y, the result might be unstable as well. Setting x and y to 
equal
        // may mean the removal of x or y in a sorting operation, which we 
don't want.
        int x1 = o1.hashCode();
        int x2 = o2.hashCode();
        if (x1 > x2) return 1;
        return -1;

compareTo is the path for <=>, so we can expect exceptions here. So now with MyNumber being a Comparable, and MyNumber2 not:

println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)])

In the first two cases nothing is removed, in the third case we get an empty list and in the last case MyNumber is removed. Since MyNumber2 does not implement Comparable here DefaultTypeTransformation.compareTo will throw an exception, whenever we have MyNumber2 as o1. The compare using hashcodes will never equal. Well. There has been one part of that minus method we have not looked at. This is called if the first element is not Comparable, even if all extend Number:

            //n*n version
            List<T> tmpAnswer = new LinkedList<T>(self);
            for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();) {
                T element = iter.next();
                boolean elementRemoved = false;
                for (Iterator<?> iterator = removeMe.iterator(); iterator.hasNext() 
&& !elementRemoved;) {
                    Object elt = iterator.next();
                    if (DefaultTypeTransformation.compareEqual(element, elt)) {
                        iter.remove();
                        elementRemoved = true;
                    }
                }
            }

            //remove duplicates
            //can't use treeset since the base classes are different
            ansCollection.addAll(tmpAnswer);

so if MyNumber2 is the head element this code will be executed instead, since MyNumber2 not implement Comparable. DefaultTypeTransformation.compareEqual is the code path taken for "==" which I talked about already. Remember, IntegerMath is the fallback for Numbers.compareEqual will prevent the exception being thrown, even though compareTo is called internally, but only if the left hand side is a Comparable. So here equality checks are done instead.. for example using the equals method. That explains why ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)]) gets the empty list... first part is normaly equality, second part is fallback to IntegerMath. Of course if the code used DefaultTypeTransformation.compareEqual(elt, element), then this would behave different, but the result would be the same. The major difference would be that MyNumber2#equals is not called. For ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)]) we do compareEquals with MyNumber2 and MyNumber first, resulting in the element not being removed, so the result will be [MyNumber(n:1)]

But that did not give me the exception path... Looking at the code, with numbers you won't get that path... or not? Well, with a subclass of BigDecimal/BigInteger you can do that. But they implement Comparable<BigDecimal>/Comparable<BigInteger>, so you are supposed to make a method for that only... supposed to is not equal to people actually doing that. Still I won't give an example of that now.

If you look at the sameType method, then it looks only at Number and null and the exact same class. to enter the not so n*logn path and use the number comperator. Point being here, the hashcode compares will not normally happen for Numbers or null. And in case of all elements of both Collections being of the same class, there should be no exception we have the right to catch and fall back to hashcode logic. So we could question here if NumberAwareComperator should be even used like it is for this method.

So let's have a much shorter look at the other methods...

retainAll, intersect and disjoint: uses the comparator for all elements, so it may use the hashcode logic. But at least they should be stable. removeAll: same. Well, there is in theory the issue of self being a Set and the special Set logic. But in this case, this method is not supposed to be called.


Actually, there are two more methods using the Comperator behind the scenes.... coercedEquals (used by several equals methods, minus and another old friend: unique) and numberAwareCompareTo (used by coerecedEquals and ObjectRange).

So we would have to look at those as well... but this mail got pretty long already and I am out of time. So I put to discussion what I wrote about for now.

bye blackdrag

--
Jochen "blackdrag" Theodorou
blog: http://blackdragsview.blogspot.com/

Reply via email to