Hi all,
NumberAwareComparator is used in several extension methods Groovy
provides to be able to compare things, that cannot really be compared.
That involves sorting (sort, toSorted), but also things like: retainAll,
removeAll, minus on Collection, intersect and disjoint.
NumberAwareComperator will basically use the same logic as <=>, but in
case it cannot be compared, it will compare using the hashcode, though
never issuing an equals here. Of course this logic is difficult to
understand and was mainly done to be able to sort things somehow for
things you cannot really sort. Also note that <=> purely depends on
compareTo, even throwing an exception if only equals would be needed.
The first question I have here is: Do we really want to keep providing a
very difficult to understand way of comparing things, you actually
cannot compare for sorting? Meaning for example, should [new Object(),
new Object()].sort() really give a random order, or should it result in
an exception? I am not sure anymore, that we do our users a good thing
providing this operation.
And of course there are these other methods, that only need equality,
but use NumberAwareComperator.
The method with the longest history in this is most probably
Collection#minus(Collection), well the List variant actually, but the
Collection variant is where the implementation moved to.
This method actually has two variants inside. First there will be a test
for the sameType being used and then.... frankly after so many
iterations the code looks just odd to me.
//n*LOG(n) version
Set<T> answer;
if (Number.class.isInstance(head)) {
answer = new TreeSet<T>(numberComparator);
answer.addAll(self);
for (T t : self) {
if (Number.class.isInstance(t)) {
for (Object t2 : removeMe) {
if (Number.class.isInstance(t2)) {
if (numberComparator.compare(t, (T)t2) == 0)
answer.remove(t);
}
}
} else {
if (removeMe.contains(t))
answer.remove(t);
}
}
} else {
we get here after we have established, that all the elements are of the
"same" type. Either directly the same, or null, or a Number. So if the
head is a number, then all elements should be a number. "answer" is a
TreeSet with numberComparator (NumberAwareComparator). Now, since we use
the number comperator to make a TreeSet, this will mean this action
alone may already remove elements. But "answer" is not what the method
will return, it is "ansCollection", which will be build separately based
on answer later. The inner Number check is, if my assumption before is
right, surplus and the else-branch there never visited. And then we
iterate over all elements of self and all elements of removeMe to
eventually call answer.remove... That smells like n*n*logn complexity.
The statement about this being an n*logn would be wrong then.
I really wonder if something like this:
Set<T> answer;
if (head instanceof Number) {
answer = new TreeSet<T>(numberComparator);
answer.addAll(self);
Set<T> removeSet = new TreeSet<T>(numberComperator);
removeSet.addAll(removeMe);
answer.removeAll(removeSet);
}
would not be better. At least that should go near n*logn. And we do
something similiar in case head is no number:
} else {
answer = new TreeSet<T>(numberComparator);
answer.addAll(self);
answer.removeAll(removeMe);
}
Only I am wondering.... if removeMe is a Set as well, then this may not
do what we want. Imagine removeMe being another TreeSet with its own
comparator and this implementation of removeAll:
public boolean removeAll(Collection<?> c) {
Objects.requireNonNull(c);
boolean modified = false;
if (size() > c.size()) {
for (Iterator<?> i = c.iterator(); i.hasNext(); )
modified |= remove(i.next());
} else {
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
}
return modified;
}
This means c.contains will be called and the different comparator may be
used for it. Proof:
def toRemove = new TreeSet({a,b->-1})
toRemove.addAll(["a","b"])
def res = ["a"]*2 - toRemove
assert res == ["a","a"]
res = ["a"]*2+["b","c"] - toRemove
assert res == ["c"]
Of course this Comparator does not behave nice, but the point is more
that our comparator is not used in the first case. It depends on the
size of the Set, or better said, the size of the TreeSet. Since equal
(according to number comparator) elements appear there only once, the
size in the first case will be 1, in the second case 3, while the sizes
of the self collections are 2 and 4. Frankly I cannot really understand
why the JDK has this in this way at all, or why TreeSet does not
override it. Sure, the version they choose is better for performance,
but I find this behaviour not right, considering how removeAll works on
for example lists. Anyway... this should not be a rant about the JDK ;).
Now... does it make sense to use NumberAwareComperator here at all? If
we want to support [1.0]-[1]==[], then yes. But the gravity of this is
unclear I would say:
class MyNumber extends Number {
def n
int intValue(){n}
long longValue(){n}
float floatValue(){n}
double doubleValue(){n}
int hashCode(){-n}
boolean equals(other) {
if (other instanceof MyNumber) { return n==other.n}
return false
}
int compareTo(MyNumber other) {
return n <=> other.n
}
String toString(){"MyNumber($n)"}
}
def res = [1,new MyNumber(n:1)] - [1]
while the first loops to produce answer will correctly produce a TreeSet
containing only MyNumber, the later logic to produce ansCollection will
cause problems and nothing will be removed. ThatÄs because MyNumber(n:1)
is equal to 1, ehm, actually that is wrong. 1 is equal to MyNumber(n:1)
according to Groovy logic, but not the other way around.
assert 1 == new MyNumber(n:1)
assert new MyNumber(n:1) != 1
That is because we don't actually call Integer#compareTo. If that had
been the case, then both would not be equal. No, instead we fall back to
those number math classes, and they assume, that if you compare
something with an integer, then the other one needs to be converted to
an integer as well. And since my intValue() method here returns 1, they
are seen as equal, even though compareTo, equals and even the hashcodes
would not allow that. If I reverse the order in my list:
def res = [new MyNumber(n:1),1] - [1]
then the result will be as expected and contains only MyNumber(n:1).
Funny thing here is... I added a compareTo method, but no Comparable
interface. So MyNumber is not comparable... Does it changes things if it
does? So I add "implements Comparable<MyNumber>" to the class and now I
get the same result, regardless of the order. Which is []. That is
because now 1 is always equal to MyNumber(n:1). That is because now in
both case IntegerMath is used. Does this mean any two custom numbers are
equal, if their intValue is? So let's take MyNumber again and make a
second class just the same, including Comparable, then call it MyNumber2
assert new MyNumber2(n:1) == new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)
Well... if you did read, what I did write before, then this is no
surprise... But frankly... should it be like this? And if MyNumber2 does
not implement Comparable:
assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) == new MyNumber2(n:1)
again because of the fallback logic to IntegerMath... and if both do not
implement it:
assert new MyNumber2(n:1) != new MyNumber(n:1)
assert new MyNumber(n:1) != new MyNumber2(n:1)
Well... that is more the expected result, still... and not to
forget:This is ==, not <=>. == uses <=> only for comparable cases. So:
new MyNumber2(n:1) <=> new MyNumber(n:1)
new MyNumber(n:1) <=> new MyNumber2(n:1)
will both throw a GroovyRuntimeException, if non does implement
Comparable. If MyNumber2 implements it, the first compare works using
IntegerMath and tells us they are equal. If both implement it, they are
both equal. That is basically the same as ==, but what does
NumberAwareComperator do for such cases?
try {
return DefaultTypeTransformation.compareTo(o1, o2);
} catch (ClassCastException cce) {
/* ignore */
} catch (GroovyRuntimeException gre) {
/* ignore */
}
// since the object does not have a valid compareTo method
// we compare using the hashcodes. null cases are handled by
// DefaultTypeTransformation.compareTo
// This is not exactly a mathematical valid approach, since we compare
object
// that cannot be compared. To avoid strange side effects we do a
pseudo order
// using hashcodes, but without equality. Since then an x and y with
the same
// hashcodes will behave different depending on if we compare x with y
or
// x with y, the result might be unstable as well. Setting x and y to
equal
// may mean the removal of x or y in a sorting operation, which we
don't want.
int x1 = o1.hashCode();
int x2 = o2.hashCode();
if (x1 > x2) return 1;
return -1;
compareTo is the path for <=>, so we can expect exceptions here. So now
with MyNumber being a Comparable, and MyNumber2 not:
println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)])
println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)])
In the first two cases nothing is removed, in the third case we get an
empty list and in the last case MyNumber is removed.
Since MyNumber2 does not implement Comparable here
DefaultTypeTransformation.compareTo will throw an exception, whenever we
have MyNumber2 as o1. The compare using hashcodes will never equal.
Well. There has been one part of that minus method we have not looked
at. This is called if the first element is not Comparable, even if all
extend Number:
//n*n version
List<T> tmpAnswer = new LinkedList<T>(self);
for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();) {
T element = iter.next();
boolean elementRemoved = false;
for (Iterator<?> iterator = removeMe.iterator(); iterator.hasNext()
&& !elementRemoved;) {
Object elt = iterator.next();
if (DefaultTypeTransformation.compareEqual(element, elt)) {
iter.remove();
elementRemoved = true;
}
}
}
//remove duplicates
//can't use treeset since the base classes are different
ansCollection.addAll(tmpAnswer);
so if MyNumber2 is the head element this code will be executed instead,
since MyNumber2 not implement Comparable.
DefaultTypeTransformation.compareEqual is the code path taken for "=="
which I talked about already. Remember, IntegerMath is the fallback for
Numbers.compareEqual will prevent the exception being thrown, even
though compareTo is called internally, but only if the left hand side is
a Comparable. So here equality checks are done instead.. for example
using the equals method. That explains why ([new MyNumber2(n:1), new
MyNumber(n:1)] - [new MyNumber2(n:1)]) gets the empty list... first part
is normaly equality, second part is fallback to IntegerMath. Of course
if the code used DefaultTypeTransformation.compareEqual(elt, element),
then this would behave different, but the result would be the same. The
major difference would be that MyNumber2#equals is not called. For ([new
MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)]) we do
compareEquals with MyNumber2 and MyNumber first, resulting in the
element not being removed, so the result will be [MyNumber(n:1)]
But that did not give me the exception path... Looking at the code, with
numbers you won't get that path... or not? Well, with a subclass of
BigDecimal/BigInteger you can do that. But they implement
Comparable<BigDecimal>/Comparable<BigInteger>, so you are supposed to
make a method for that only... supposed to is not equal to people
actually doing that. Still I won't give an example of that now.
If you look at the sameType method, then it looks only at Number and
null and the exact same class. to enter the not so n*logn path and use
the number comperator. Point being here, the hashcode compares will not
normally happen for Numbers or null. And in case of all elements of both
Collections being of the same class, there should be no exception we
have the right to catch and fall back to hashcode logic. So we could
question here if NumberAwareComperator should be even used like it is
for this method.
So let's have a much shorter look at the other methods...
retainAll, intersect and disjoint: uses the comparator for all elements,
so it may use the hashcode logic. But at least they should be stable.
removeAll: same. Well, there is in theory the issue of self being a Set
and the special Set logic. But in this case, this method is not supposed
to be called.
Actually, there are two more methods using the Comperator behind the
scenes.... coercedEquals (used by several equals methods, minus and
another old friend: unique) and numberAwareCompareTo (used by
coerecedEquals and ObjectRange).
So we would have to look at those as well... but this mail got pretty
long already and I am out of time. So I put to discussion what I wrote
about for now.
bye blackdrag
--
Jochen "blackdrag" Theodorou
blog: http://blackdragsview.blogspot.com/