[jira] [Created] (FLINK-12786) Implement local aggregation in Flink

vinoyang (JIRA) Sun, 09 Jun 2019 19:20:32 -0700

vinoyang created FLINK-12786:
--------------------------------

             Summary: Implement local aggregation in Flink
                 Key: FLINK-12786
                 URL: https://issues.apache.org/jira/browse/FLINK-12786
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: vinoyang
            Assignee: vinoyang

Currently, keyed streams are widely used to perform aggregating operations
(e.g., reduce, sum and window) on the elements that have the same key. When
executed at runtime, the elements with the same key will be sent to and
aggregated by the same task.
The performance of these aggregating operations is very sensitive to the
distribution of keys. In the cases where the distribution of keys follows a
powerful law, the performance will be significantly downgraded. More unluckily,
increasing the degree of parallelism does not help when a task is overloaded by
a single key.
Local aggregation is a widely-adopted method to reduce the performance degraded
by data skew. We can decompose the aggregating operations into two phases. In
the first phase, we aggregate the elements of the same key at the sender side
to obtain partial results. Then at the second phase, these partial results are
sent to receivers according to their keys and are combined to obtain the final
result. Since the number of partial results received by each receiver is
limited by the number of senders, the imbalance among receivers can be reduced.
Besides, by reducing the amount of transferred data the performance can be
further improved.

The design documentation is here:
[https://docs.google.com/document/d/1gizbbFPVtkPZPRS8AIuH8596BmgkfEa7NRwR6n3pQes/edit?usp=sharing]

The discussion thread is here:
[http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3CCAA_=o7dvtv8zjcxknxyoyy7y_ktvgexrvb4zhxjwzuhsulz...@mail.gmail.com%3E]

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (FLINK-12786) Implement local aggregation in Flink

Reply via email to