[
https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193412#comment-14193412
]
Stephan Ewen commented on FLINK-553:
------------------------------------
Here is the gist of the discussions we had on that issue in the past:
The most tricky part of this issue is how to represent the key. Since keys in
Flink can be more than one field (in tuples or POJOs), the return type of that
function is not easy to decide.
- Either it is always a tuple with all keys, making it again a bit
inconvenient for the standard case
- Or we only return one field. In case of composite keys (more than one key
field), we throw an exception or we only return the first key field
There are two ways of implementing this issue:
- Offer a "getKey()" method in the RichGroupReduceFunction and in the
RichReduceFunction. We need to do this in the rich function variants, because
the functions themselves need to be SAM interfaces (single abstract method).
- Have a version of the GroupReduceFunction with an extended signature like
"void reduceGroup(K key, Iterable<V> records, Collector<O> out)". The iterable
would still return the whole records, but the key would be added extra, for
convenience.
The first version has the disadvantage that programmers need to pick the rich
function versions, which prevents Java8 lambdas and is more clumsy in Scala.
The second variant is a bit more effort, but we have made this variations of
the functions before (see for example FlatJoinFunction vs JoinFunction)
> Add getGroupKey() method to group-at-time operators
> ---------------------------------------------------
>
> Key: FLINK-553
> URL: https://issues.apache.org/jira/browse/FLINK-553
> Project: Flink
> Issue Type: Improvement
> Components: Java API, Scala API
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one
> UDF call. Often these UDFs need to access the key that is common to all
> records of a group.
> We could add a function to set a the key of a group before the UDF is called
> (``setGroupKey()``) and a function to get the key (``getGroupKey()``) that
> can be called from the UDF.
> What do you think about this?
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction,
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)