[jira] [Commented] (FLINK-2237) Add hash-based Aggregation

Gabor Gevay (JIRA) Sat, 26 Sep 2015 02:29:15 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14909174#comment-14909174
 ]


Gabor Gevay commented on FLINK-2237:
------------------------------------

Couldn't the CompactingHashTable be reused here? The driver would get a prober 
from it, and then for each incoming record, the driver would call getMatchFor, 
then do one step of the reduce, and then write the result back with updateMatch.

I'm interested in this feature because I have a computation where it would 
really make a huge difference: I'm calling flatMap on an already large data 
set, which blows it up to about 10 times the size, and then groupBy and reduce. 
If Flink had hash-based aggregation, then the result of the flatMap wouldn't 
need to be materialized, which would reduce the memory requirement to about 
1/10.

> Add hash-based Aggregation
> --------------------------
>
>                 Key: FLINK-2237
>                 URL: https://issues.apache.org/jira/browse/FLINK-2237
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Rafiullah Momand
>            Priority: Minor
>
> Aggregation functions at the moment are implemented in a sort-based way.
> How can we implement hash based Aggregation for Flink?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-2237) Add hash-based Aggregation

Reply via email to