[ 
https://issues.apache.org/jira/browse/FLINK-553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193412#comment-14193412
 ] 

Stephan Ewen commented on FLINK-553:
------------------------------------

Here is the gist of the discussions we had on that issue in the past:

The trickiest part of this issue is how to represent the key. Since keys in 
Flink can consist of more than one field (in tuples or POJOs), the return 
type of that function is not easy to decide.
  - Either it is always a tuple containing all key fields, which makes the 
standard single-field case a bit inconvenient
  - Or we return only one field. For composite keys (more than one key 
field), we either throw an exception or return only the first key field

There are two ways of implementing this issue:
  - Offer a "getKey()" method in the RichGroupReduceFunction and in the 
RichReduceFunction. We need to do this in the rich function variants, because 
the functions themselves need to be SAM interfaces (single abstract method).
  - Have a version of the GroupReduceFunction with an extended signature like 
"void reduceGroup(K key, Iterable<V> records, Collector<O> out)". The iterable 
would still return the whole records, but the key would be passed as an 
additional argument, for convenience.
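
To make the second option more concrete, here is a minimal, self-contained 
sketch in plain Java. It does not use the Flink API: the names 
KeyedReduceSketch, KeyedGroupReduceFunction and run are made up for 
illustration, and Collector is a one-method stand-in for Flink's interface. 
The point is only that the extended signature stays a single abstract method, 
so a lambda can implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class KeyedReduceSketch {

    // Stand-in for Flink's Collector, reduced to the one method used here.
    public interface Collector<O> {
        void collect(O record);
    }

    // Hypothetical SAM interface with the extended signature from the comment.
    @FunctionalInterface
    public interface KeyedGroupReduceFunction<K, V, O> {
        void reduceGroup(K key, Iterable<V> records, Collector<O> out);
    }

    // Toy driver (an assumption, not Flink's runtime): groups the input in
    // memory and passes each group's key alongside the full records.
    public static <K, V, O> List<O> run(List<V> input,
                                        Function<V, K> keySelector,
                                        KeyedGroupReduceFunction<K, V, O> fn) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (V record : input) {
            groups.computeIfAbsent(keySelector.apply(record), k -> new ArrayList<>())
                  .add(record);
        }
        List<O> result = new ArrayList<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            fn.reduceGroup(group.getKey(), group.getValue(), result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words =
            Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // The function is written as a lambda; the key arrives as an argument.
        List<String> counts = run(words, w -> w.charAt(0), (key, records, out) -> {
            int n = 0;
            for (String ignored : records) {
                n++;
            }
            out.collect(key + "=" + n);
        });

        System.out.println(counts); // prints [a=2, b=2, c=1]
    }
}
```

In this sketch the key is a single value. How a composite key would be 
represented in the K parameter is exactly the open question described above.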

The first variant has the disadvantage that programmers need to pick the rich 
function versions, which prevents Java 8 lambdas and is clumsier in Scala. 
The second variant is a bit more effort, but we have made such variations of 
the functions before (see for example FlatJoinFunction vs JoinFunction).
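
For contrast, the first option could look roughly like the following sketch. 
Again this is plain Java, not Flink code: RichKeySketch, 
RichGroupReduceSketch and the setKey hook are hypothetical names, and a List 
stands in for the collector. It illustrates why a lambda is not possible 
here: the function has to be an abstract class that carries the key as state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RichKeySketch {

    // Hypothetical stand-in for a rich function: an abstract class that holds
    // the current key, so it cannot be a SAM interface.
    public abstract static class RichGroupReduceSketch<K, V, O> {
        private K currentKey;

        // Assumed hook: the runtime would call this before each group.
        final void setKey(K key) {
            this.currentKey = key;
        }

        // Available to user code inside reduce().
        protected final K getKey() {
            return currentKey;
        }

        public abstract void reduce(Iterable<V> records, List<O> out);
    }

    public static List<String> demo() {
        // An anonymous subclass is required; a lambda cannot target an
        // abstract class.
        RichGroupReduceSketch<String, Integer, String> fn =
            new RichGroupReduceSketch<String, Integer, String>() {
                @Override
                public void reduce(Iterable<Integer> records, List<String> out) {
                    int sum = 0;
                    for (int r : records) {
                        sum += r;
                    }
                    out.add(getKey() + ":" + sum);
                }
            };

        List<String> out = new ArrayList<>();
        fn.setKey("a");                         // what the runtime would do
        fn.reduce(Arrays.asList(1, 2, 3), out); // one group with key "a"
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a:6]
    }
}
```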





> Add getGroupKey() method to group-at-time operators
> ---------------------------------------------------
>
>                 Key: FLINK-553
>                 URL: https://issues.apache.org/jira/browse/FLINK-553
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API, Scala API
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> Group-at-a-time operators (Reduce & CoGroup) work on multiple records in one 
> UDF call. Often these UDFs need to access the key that is common to all 
> records of a group.
> We could add a function to set the key of a group before the UDF is called 
> (``setGroupKey()``) and a function to get the key (``getGroupKey()``) that 
> can be called from the UDF.
> What do you think about this?
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/553
> Created by: [fhueske|https://github.com/fhueske]
> Labels: enhancement, java api, scala api, user satisfaction, 
> Assignee: [aalexandrov|https://github.com/aalexandrov]
> Created at: Mon Mar 10 22:28:27 CET 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
