[ https://issues.apache.org/jira/browse/FLINK-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045743#comment-14045743 ]
Tobias commented on FLINK-925:
------------------------------
DataSet implements:
{code:java}
public Grouping<T> groupBy(int... fields) {
    return new Grouping<T>(this, new Keys.FieldPositionKeys<T>(fields, getType(), false));
}
{code}
This method can be used to group Tuple data types, even though Tuples are not Comparable. However, those Tuples need to consist of non-generic, comparable types.
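To illustrate the restriction, here is a minimal, self-contained sketch (not Flink code) of comparing two tuples on a set of field positions. Tuples are modeled as plain `Object[]` rows, and the class name `FieldPositionComparator` is hypothetical. The comparison walks the key positions in order and only works if every key field holds a `Comparable` value; otherwise it throws an `UnsupportedOperationException`.

```java
import java.util.Comparator;

/**
 * Hypothetical sketch, not Flink's implementation: compares two tuples,
 * modeled as Object[] rows, on the given field positions in order.
 * Every key field must hold a Comparable value.
 */
public class FieldPositionComparator implements Comparator<Object[]> {
    private final int[] keyPositions;

    public FieldPositionComparator(int... keyPositions) {
        this.keyPositions = keyPositions.clone();
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(Object[] a, Object[] b) {
        for (int pos : keyPositions) {
            Object x = a[pos];
            Object y = b[pos];
            // A field that is not Comparable cannot be used as part of the key.
            if (!(x instanceof Comparable)) {
                String typeName = (x == null) ? "null" : x.getClass().getName();
                throw new UnsupportedOperationException(
                        "Key field " + pos + " is not Comparable: " + typeName);
            }
            int c = ((Comparable) x).compareTo(y);
            if (c != 0) {
                return c; // the first differing key field decides the order
            }
        }
        return 0; // all key fields are equal, so both rows fall into the same group
    }

    public static void main(String[] args) {
        // Key on field 1 first, then field 0.
        FieldPositionComparator cmp = new FieldPositionComparator(1, 0);
        Object[] r1 = {"a", 2};
        Object[] r2 = {"b", 2};
        // Tie on field 1 (2 == 2), so field 0 decides: "a" < "b".
        System.out.println(cmp.compare(r1, r2) < 0); // prints true
    }
}
```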
When I group on my Comparable type:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(0)*
This exception is thrown:
{color:red}
Exception in thread "main" java.lang.UnsupportedOperationException: Generic
type comparators are not yet implemented.
at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66)
{color}
When I group on the Integer, a different exception is thrown:
*DataSet<Tuple2<MyComparable, Integer>>.groupBy(1)*
{color:red}
Exception in thread "main" eu.stratosphere.compiler.CompilerException: Error translating node 'GroupReduce "MAX(1)" : SORTED_GROUP_REDUCE [[ GlobalProperties [partitioning=RANDOM] ]] [[ LocalProperties [ordering=null, grouped=null, unique=null] ]]': Could not serialize comparator into the configuration.
{color}
Grouping with MyComparable declared as *class MyComparable implements Comparable<MyComparable>* also fails:
{color:red}
Exception in thread "main" java.lang.UnsupportedOperationException: Generic type comparators are not yet implemented.
at eu.stratosphere.api.java.typeutils.GenericTypeInfo.createComparator(GenericTypeInfo.java:66)
{color}
I ran these tests to understand the problem. As far as I understand:
-> Tuple data types can be grouped when they contain only non-generic types.
-> All other generic types cannot be grouped, whether they are inside a Tuple or not.
-> Tuples that contain at least one generic type cannot be grouped, independent of the key used for grouping.
Does it make sense to remove the Comparable restriction? Even some classes that do fulfill that restriction are not supported, while Tuples can be grouped if they consist of the right types.
> Support KeySelector function returning Tuples
> ---------------------------------------------
>
> Key: FLINK-925
> URL: https://issues.apache.org/jira/browse/FLINK-925
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 0.6-incubating
> Reporter: Fabian Hueske
> Assignee: Tobias
> Priority: Minor
> Labels: starter
>
> KeySelector functions are used to extract keys on which DataSets can be
> grouped or joined.
> Currently, the key types returned by KeySelector functions are restricted to
> be comparable. However, Flink's Tuple data types are not comparable (because
> this depends on the types of their fields), which makes grouping and joining on
> composite keys difficult.
> We should change the signature of the groupBy(), join(), and coGroup()
> methods to also allow non-comparable keys as return types of a KeySelector
> function.
> Instead, we will check at optimization time whether the returned type is
> comparable (which is true for tuples if all elements are comparable).
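The optimization-time check described in the issue could look roughly like the following sketch. It uses hypothetical type descriptors (`TypeDesc`, `AtomicType`, `TupleType`, `checkKeyType`) rather than Flink's actual type information classes: an atomic type counts as comparable if its class implements `Comparable`, and a tuple type counts as comparable only if all of its fields do, checked recursively so that nested tuples are covered as well.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: these type descriptors are NOT Flink's TypeInformation classes. */
public class KeyTypeCheck {

    /** Hypothetical description of a key type returned by a KeySelector. */
    interface TypeDesc {
        boolean isComparable();
    }

    /** A non-tuple type such as Integer, String, or a user-defined class. */
    static final class AtomicType implements TypeDesc {
        private final Class<?> clazz;

        AtomicType(Class<?> clazz) {
            this.clazz = clazz;
        }

        @Override
        public boolean isComparable() {
            // Comparable if the class implements java.lang.Comparable.
            return Comparable.class.isAssignableFrom(clazz);
        }
    }

    /** A tuple type: comparable only if every field type is comparable. */
    static final class TupleType implements TypeDesc {
        private final List<TypeDesc> fields;

        TupleType(TypeDesc... fields) {
            this.fields = Arrays.asList(fields);
        }

        @Override
        public boolean isComparable() {
            // Recurses into nested tuples through their own isComparable().
            return fields.stream().allMatch(TypeDesc::isComparable);
        }
    }

    /** Rejects a key type before the program runs, instead of via a compile-time bound. */
    static void checkKeyType(TypeDesc keyType) {
        if (!keyType.isComparable()) {
            throw new IllegalArgumentException("Key type is not comparable");
        }
    }

    public static void main(String[] args) {
        TypeDesc ok = new TupleType(new AtomicType(Integer.class), new AtomicType(String.class));
        TypeDesc bad = new TupleType(new AtomicType(Integer.class), new AtomicType(Object.class));
        checkKeyType(ok); // passes: Integer and String are both Comparable
        System.out.println(bad.isComparable()); // prints false: Object is not Comparable
    }
}
```

Because the check inspects the key type's structure instead of relying on a compile-time bound (assumed here to be something like `K extends Comparable<K>`), a Tuple key can be accepted or rejected depending on its field types.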
--
This message was sent by Atlassian JIRA
(v6.2#6252)