Hi Aljoscha,

I have looked at the "*Process*" section of the FLIP wiki page. [1] This mail
thread indicates that the proposal has proceeded to the third step.
When I looked at the fourth step (the vote step), I did not find the
prerequisites for starting the voting process. Considering that the
discussion of this feature has already taken place in the old thread [2],
can you tell me when I should start voting? Can I start now?

Best,
Vino

[1]: https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-FLIPround-up
[2]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Local-Aggregation-in-Flink-td29307.html#a29308

On Thu, Jun 13, 2019 at 9:19 AM, leesf <leesf0...@gmail.com> wrote:

> +1 for the FLIP. Thank you, Vino, for your efforts.
>
> Best,
> Leesf
>
> On Wed, Jun 12, 2019 at 5:46 PM, vino yang <yanghua1...@gmail.com> wrote:
>
> > Hi folks,
> >
> > I would like to start the FLIP discussion thread about supporting local
> > aggregation in Flink.
> >
> > In short, this feature can effectively alleviate data skew. This is the
> > FLIP:
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-44%3A+Support+Local+Aggregation+in+Flink
> >
> > *Motivation* (copied from the FLIP)
> >
> > Currently, keyed streams are widely used to perform aggregating
> > operations (e.g., reduce, sum, and window) on the elements that have
> > the same key. When executed at runtime, the elements with the same key
> > will be sent to and aggregated by the same task.
> >
> > The performance of these aggregating operations is very sensitive to
> > the distribution of keys. In cases where the distribution of keys
> > follows a power law, performance will be significantly degraded. Worse,
> > increasing the degree of parallelism does not help when a task is
> > overloaded by a single key.
> >
> > Local aggregation is a widely adopted method to reduce the performance
> > degradation caused by data skew. We can decompose the aggregating
> > operations into two phases. In the first phase, we aggregate the
> > elements of the same key at the sender side to obtain partial results.
> > In the second phase, these partial results are sent to receivers
> > according to their keys and are combined to obtain the final result.
> > Since the number of partial results received by each receiver is
> > limited by the number of senders, the imbalance among receivers can be
> > reduced. Besides, by reducing the amount of transferred data, the
> > performance can be further improved.
> >
> > *More details*:
> >
> > Design documentation:
> > https://docs.google.com/document/d/1gizbbFPVtkPZPRS8AIuH8596BmgkfEa7NRwR6n3pQes/edit?usp=sharing
> >
> > Old discussion thread:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Local-Aggregation-in-Flink-td29307.html#a29308
> >
> > JIRA: FLINK-12786 <https://issues.apache.org/jira/browse/FLINK-12786>
> >
> > We are looking forward to your feedback!
> >
> > Best,
> > Vino
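
To make the two-phase decomposition in the quoted *Motivation* concrete,
below is a minimal, self-contained Java sketch. It only illustrates the
general technique on a skewed word-count-style input; the class and method
names (LocalAggregationSketch, preAggregate, combine) are hypothetical and
are not the API proposed in FLIP-44.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of two-phase (local) aggregation, not FLIP-44's API.
public class LocalAggregationSketch {

    // Phase 1: each sender folds its own elements into per-key partial
    // counts, so it emits at most one record per distinct key it has seen.
    static Map<String, Long> preAggregate(List<String> senderElements) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : senderElements) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Phase 2: a receiver combines the partial results routed to it by key.
    // For any key it sees at most one record per sender, regardless of skew.
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> result.merge(key, count, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Skewed input: "hot" dominates, as under a power-law key distribution.
        Map<String, Long> s1 = preAggregate(Arrays.asList("hot", "hot", "hot", "cold"));
        Map<String, Long> s2 = preAggregate(Arrays.asList("hot", "hot", "warm"));

        // Senders ship partial sums instead of raw elements:
        // s1 = {hot=3, cold=1}, s2 = {hot=2, warm=1}.
        // Prints counts hot=5, cold=1, warm=1 (HashMap order may vary).
        System.out.println(combine(Arrays.asList(s1, s2)));
    }
}

The point to notice is that after phase 1, the hot key reaches the receiver
as one partial result per sender rather than as a flood of raw elements, so
a single hot key can no longer overload one task on its own.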