[ https://issues.apache.org/jira/browse/FLINK-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabor Gevay updated FLINK-3479:
-------------------------------
    Priority: Minor  (was: Major)

> Add hash-based strategy for CombineFunction
> -------------------------------------------
>
>                 Key: FLINK-3479
>                 URL: https://issues.apache.org/jira/browse/FLINK-3479
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Local Runtime
>            Reporter: Fabian Hueske
>            Priority: Minor
>
> This issue is similar to FLINK-3477 but adds a hash-based strategy for {{CombineFunction}} instead of {{ReduceFunction}}.
> The interface of {{CombineFunction}} differs from {{ReduceFunction}} by providing an {{Iterable<T>}} instead of two {{T}} values. Hence, if the {{Iterable<T>}} provides two values, we can do the same as with a {{ReduceFunction}}.
> At the moment, {{CombineFunction}} is wrapped in a {{GroupCombineFunction}} and hence executed using the {{GroupReduceCombineDriver}}.
> We should add two dedicated drivers, {{CombineDriver}} and {{ChainedCombineDriver}}, and two driver strategies, {{HASH_COMBINE}} and {{SORT_COMBINE}}.
> If FLINK-3477 is resolved, we can reuse the hash-table.
> We should also add compiler hints to `DataSet.reduceGroup()` and `Grouping.reduceGroup()` to allow users to select between {{SORT}}-based and {{HASH}}-based combine strategies ({{HASH}} will only be applicable to {{CombineFunction}} and not {{GroupCombineFunction}}).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
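
The description notes that when the {{Iterable<T>}} holds exactly two values, a {{CombineFunction}} can be driven like a {{ReduceFunction}}. A minimal sketch of that idea follows. It is not Flink code and not the proposed {{CombineDriver}}: the `Combiner` interface is a hypothetical stand-in for a {{CombineFunction}} whose input and output types match, `hashCombine` and `sumCounts` are illustrative names, and a plain `LinkedHashMap` stands in for the managed-memory hash table that FLINK-3477 would provide.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class HashCombineSketch {

    /** Hypothetical stand-in for a CombineFunction with identical input and output type. */
    interface Combiner<T> {
        T combine(Iterable<T> values) throws Exception;
    }

    /**
     * Keeps one partial aggregate per key in a hash table. Each incoming record
     * is merged into its key's aggregate by calling the combiner with a
     * two-element Iterable (current aggregate, new record).
     */
    static <K, T> Map<K, T> hashCombine(Iterable<T> records,
                                        Function<T, K> keyOf,
                                        Combiner<T> combiner) throws Exception {
        Map<K, T> table = new LinkedHashMap<>();
        for (T record : records) {
            K key = keyOf.apply(record);
            if (table.containsKey(key)) {
                // Two values in, one value out: the same shape as a reduce step.
                T merged = combiner.combine(Arrays.asList(table.get(key), record));
                table.put(key, merged);
            } else {
                // First record for this key becomes the initial aggregate.
                table.put(key, record);
            }
        }
        return table;
    }

    /** Example combiner: sums the counts of (word, count) pairs sharing a key. */
    static Map.Entry<String, Integer> sumCounts(Iterable<Map.Entry<String, Integer>> values) {
        String key = null;
        int sum = 0;
        for (Map.Entry<String, Integer> e : values) {
            key = e.getKey();
            sum += e.getValue();
        }
        return new AbstractMap.SimpleEntry<>(key, sum);
    }

    public static void main(String[] args) throws Exception {
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                new AbstractMap.SimpleEntry<>("a", 1),
                new AbstractMap.SimpleEntry<>("b", 2),
                new AbstractMap.SimpleEntry<>("a", 2));

        Map<String, Map.Entry<String, Integer>> combined =
                hashCombine(input, e -> e.getKey(), HashCombineSketch::sumCounts);

        for (Map.Entry<String, Integer> partial : combined.values()) {
            System.out.println(partial.getKey() + " -> " + partial.getValue());
        }
    }
}
```

This only works because the combiner returns the same type it consumes, which is presumably why the issue restricts {{HASH}} to {{CombineFunction}} and excludes {{GroupCombineFunction}}.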