Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:38, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

>
>
> On Fri, 22 Feb 2019 at 10:16, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> On Thu, 21 Feb 2019 at 18:29, Frank Grimes 
>> wrote:
>>
>>> Hi,
>>>
>>> I've recently started to evaluate Flink and have found it odd that its
>>> Tuple types, while Serializable, don't implement java.lang.Comparable.
>>> This means that I either need to provide an KeySelector for many
>>> operations or subtype the Tuple types and provide my own implementation of
>>> compareTo for each.
>>>
>>> Is there a technical reason why this was omitted?
>>>
>>
>> A Tuple1 would need to have an implementation of the comparable
>> contract in order to be comparable. The only way I can see of writing that
>> would be if we had:
>>
>> Tuple1>
>>
>> Similarly you'd have
>>
>> Tuple25, T1 extends Comparable> T1>, T2 extends Comparable, ..., T24 extends Comparable> T24>>
>>
>> Which is a lot less screen friendly than say
>>
>> Tuple25
>>
>> So from a purely code readability PoV you would need to at least replace
>> the  Comparable with Comparable with the immediate reduction
>> of utility... or perhaps you would remove the generics entirely and go for
>> just Comparable and have the implementation just follow the guarantee...
>> except javac will error out for the erasure.
>>
>> Now jOOQ just ignores the comparability contract on the generic types,
>> e.g.
>> https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35
>>
>> The result is that when asked to compare you get a cast that will work or
>> fail:
>> https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31
>>
>> So jOOQ is following the "fail at runtime" mode... you can make tuples of
>> non-comparable objects and only at runtime if you try to compare them will
>> it blow up.
>>
>> Flink is following the "fail at compile time" mode... if you try to
>> compare a tuple, you need to provide a way to compare them.
>>
>> Flink does a lot of work - from what I can see - to fail early and fast.
>> For example, if type inference fails on the dataflow plan, it will error
>> out fast. I would hate to have a Tuple3 where A and B are both
>> Comparable and Y is not... my tests happened to have a small subset of A
>> and B values that never resulted in a comparison of Y values... so the
>> tests didn't fail... but now on the production environment... at runtime...
>> 1 month later in the middle of my vacation I get a call because the
>> topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))
>>
>> So my analysis would say it is not a technical reason but rather a
>> principal reason of "fail fast"
>>
>
> Idea: you could fake comparable for things that are not comparable.
>
> We know the two objects are serializable, so we could just do a byte by
> byte comparison of their serialized representations if they do not
> implement Comparable... would be slower, would probably not be the
> comparison result that people expect... but it would be consistent and not
> error out.
>
> public int compare(Serializable a,Serializable b) {
>   byte[] b1;
>   try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
> ObjectOutputStream oos = new ObjectOutputStream(bos)) {
> oos.writeObject(a);
> b1 = bos.toByteArray();
>   }
>   byte[] b2;
>   try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
> ObjectOutputStream oos = new ObjectOutputStream(bos)) {
> oos.writeObject(a);
> b2 = bos.toByteArray();
>   }
>   for (int i = 0; i < Math.min(b1.length,b2.length); i++) {
> int r = Byte.compare(b1[i],b2[i]);
> if (r != 0) {
>   return r;
> }
>   }
>   return Integer.compare(b1.length,b2.length);
> }
>

Hmmm there is nothing stopping you from having the above as the basis for a
utility universal comparison method that you would just pass by method
handle... or if you want the jOOQ behaviour just go with

public class TupleComparator implements Comparator {
  ...
  // do the class cast here and let it blow up at runtime
}


>
> is an example of the kind of algorithm. The idea being that all you really
> want is a consistent comparison because they do not implement Comparable...
> if somebody had a useful comparison contract for objects of that type then
> presumably they would already have implemented the comparability contract.
>
> Another approach that could work would be to use Tuple as a builder for
> ComparableTuple by giving each one a comparable method
>
> public class Tuple2 ... {
>   ...
>   public ComparableTuple2 comparable(Comparator c0,
> Comparator c1) {
> ...
>   }
>   ...
> }
>
> public class ComparableTuple2 extends ComparableTuple {
>   // notice no bounds on the generic type arguments
>   ...
>   // naïve storage (reduces GC pressure as the builder is not wasted)
>   // would need analysis to determine if better to allow the builder
> allocation
>   // to be 

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:16, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

> On Thu, 21 Feb 2019 at 18:29, Frank Grimes 
> wrote:
>
>> Hi,
>>
>> I've recently started to evaluate Flink and have found it odd that its
>> Tuple types, while Serializable, don't implement java.lang.Comparable.
>> This means that I either need to provide an KeySelector for many
>> operations or subtype the Tuple types and provide my own implementation of
>> compareTo for each.
>>
>> Is there a technical reason why this was omitted?
>>
>
> A Tuple1 would need to have an implementation of the comparable
> contract in order to be comparable. The only way I can see of writing that
> would be if we had:
>
> Tuple1>
>
> Similarly you'd have
>
> Tuple25, T1 extends Comparable T1>, T2 extends Comparable, ..., T24 extends Comparable T24>>
>
> Which is a lot less screen friendly than say
>
> Tuple25
>
> So from a purely code readability PoV you would need to at least replace
> the  Comparable with Comparable with the immediate reduction
> of utility... or perhaps you would remove the generics entirely and go for
> just Comparable and have the implementation just follow the guarantee...
> except javac will error out for the erasure.
>
> Now jOOQ just ignores the comparability contract on the generic types,
> e.g.
> https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35
>
> The result is that when asked to compare you get a cast that will work or
> fail:
> https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31
>
> So jOOQ is following the "fail at runtime" mode... you can make tuples of
> non-comparable objects and only at runtime if you try to compare them will
> it blow up.
>
> Flink is following the "fail at compile time" mode... if you try to
> compare a tuple, you need to provide a way to compare them.
>
> Flink does a lot of work - from what I can see - to fail early and fast.
> For example, if type inference fails on the dataflow plan, it will error
> out fast. I would hate to have a Tuple3 where A and B are both
> Comparable and Y is not... my tests happened to have a small subset of A
> and B values that never resulted in a comparison of Y values... so the
> tests didn't fail... but now on the production environment... at runtime...
> 1 month later in the middle of my vacation I get a call because the
> topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))
>
> So my analysis would say it is not a technical reason but rather a
> principal reason of "fail fast"
>

Idea: you could fake comparable for things that are not comparable.

We know the two objects are serializable, so we could just do a byte by
byte comparison of their serialized representations if they do not
implement Comparable... would be slower, would probably not be the
comparison result that people expect... but it would be consistent and not
error out.

public int compare(Serializable a,Serializable b) {
  byte[] b1;
  try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos)) {
oos.writeObject(a);
b1 = bos.toByteArray();
  }
  byte[] b2;
  try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos)) {
oos.writeObject(a);
b2 = bos.toByteArray();
  }
  for (int i = 0; i < Math.min(b1.length,b2.length); i++) {
int r = Byte.compare(b1[i],b2[i]);
if (r != 0) {
  return r;
}
  }
  return Integer.compare(b1.length,b2.length);
}

is an example of the kind of algorithm. The idea being that all you really
want is a consistent comparison because they do not implement Comparable...
if somebody had a useful comparison contract for objects of that type then
presumably they would already have implemented the comparability contract.

Another approach that could work would be to use Tuple as a builder for
ComparableTuple by giving each one a comparable method

public class Tuple2 ... {
  ...
  public ComparableTuple2 comparable(Comparator c0,
Comparator c1) {
...
  }
  ...
}

public class ComparableTuple2 extends ComparableTuple {
  // notice no bounds on the generic type arguments
  ...
  // naïve storage (reduces GC pressure as the builder is not wasted)
  // would need analysis to determine if better to allow the builder
allocation
  // to be elided by JVM and store the values directly as f0 and f1
  private final Tuple2 tuple;
  private final Comparator c0;
  private final Comparator> c1;

  ...
}

That way you get ComparableTuple2 can ignore the type bounds on its generic
arguments because the instantiation path guarantees a comparison can be
made.

ComparableTuple2 t =
Tuple2.of(a,b).comparable(User::byLastName,Page::byLastAcccessed)

That seems very nice and readable...


>

>
>
>>
>> For example, the JOOQ/JOOL Tuple types all implement Comparable:
>>
>> 

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 18:29, Frank Grimes  wrote:

> Hi,
>
> I've recently started to evaluate Flink and have found it odd that its
> Tuple types, while Serializable, don't implement java.lang.Comparable.
> This means that I either need to provide an KeySelector for many
> operations or subtype the Tuple types and provide my own implementation of
> compareTo for each.
>
> Is there a technical reason why this was omitted?
>

A Tuple1 would need to have an implementation of the comparable
contract in order to be comparable. The only way I can see of writing that
would be if we had:

Tuple1>

Similarly you'd have

Tuple25, T1 extends Comparable, T2 extends Comparable, ..., T24 extends Comparable>

Which is a lot less screen friendly than say

Tuple25

So from a purely code readability PoV you would need to at least replace
the  Comparable with Comparable with the immediate reduction
of utility... or perhaps you would remove the generics entirely and go for
just Comparable and have the implementation just follow the guarantee...
except javac will error out for the erasure.

Now jOOQ just ignores the comparability contract on the generic types, e.g.
https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35

The result is that when asked to compare you get a cast that will work or
fail:
https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31

So jOOQ is following the "fail at runtime" mode... you can make tuples of
non-comparable objects and only at runtime if you try to compare them will
it blow up.

Flink is following the "fail at compile time" mode... if you try to compare
a tuple, you need to provide a way to compare them.

Flink does a lot of work - from what I can see - to fail early and fast.
For example, if type inference fails on the dataflow plan, it will error
out fast. I would hate to have a Tuple3 where A and B are both
Comparable and Y is not... my tests happened to have a small subset of A
and B values that never resulted in a comparison of Y values... so the
tests didn't fail... but now on the production environment... at runtime...
1 month later in the middle of my vacation I get a call because the
topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))

So my analysis would say it is not a technical reason but rather a
principal reason of "fail fast"


>
> For example, the JOOQ/JOOL Tuple types all implement Comparable:
>
> https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39
>
> As an aside, I tried replacing usage of Flink's Tuple types with the JOOL
> ones but they caused a StackOverflowError similar to this one:
> https://issues.apache.org/jira/browse/FLINK-3922
>
> Thanks,
>
> Frank Grimes
>
>


Why don't Tuple types implement Comparable?

2019-02-21 Thread Frank Grimes
Hi,
I've recently started to evaluate Flink and have found it odd that its Tuple 
types, while Serializable, don't implement java.lang.Comparable.This means that 
I either need to provide an KeySelector for many operations or subtype the 
Tuple types and provide my own implementation of compareTo for each.
Is there a technical reason why this was omitted?
For example, the JOOQ/JOOL Tuple types all implement 
Comparable:https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39
As an aside, I tried replacing usage of Flink's Tuple types with the JOOL ones 
but they caused a StackOverflowError similar to this 
one:https://issues.apache.org/jira/browse/FLINK-3922

Thanks,
Frank Grimes