A proposal about skew data handling in Flink

Li, Chengxiang Thu, 15 Oct 2015 03:27:14 -0700

Dear all,
In many real-world use cases, data is naturally skewed. For example, in a 
social network, famous people get many more "follows" than others, and a hot 
tweet may be forwarded millions of times. Likewise, the purchase records of an 
ordinary product can never compare to those of a hot product. At the same 
time, the Flink runtime assumes that all tasks consume the same amount of 
resources, which is not always true. Skew data handling tries to make skewed 
data fit into Flink's runtime.
I have written a proposal about skew data handling in Flink; you can read it at 
https://docs.google.com/document/d/1ma060BUlhXDqeFmviEO7Io4CXLKgrAXIfeDYldvZsKI/edit?usp=sharing.
Any comments and feedback are welcome. You can comment on the Google doc or 
reply to this email thread directly.


Thanks
Chengxiang
