[jira] [Commented] (FLINK-8532) RebalancePartitioner should use Random value for its first partition

ASF GitHub Bot (JIRA) Sun, 26 Aug 2018 07:35:16 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592910#comment-16592910
 ]


ASF GitHub Bot commented on FLINK-8532:
---------------------------------------

StephanEwen commented on issue #6544: [FLINK-8532] [Streaming] modify 
RebalancePartitioner to use a random partition as its first partition
URL: https://github.com/apache/flink/pull/6544#issuecomment-416043267
 
 
   Thanks for taking a deeper look. Unfortunately, divisions (modulo) are even 
more expensive, so it would be good to avoid them.
   
   I think the solution can actually be a bit simpler. It would probably be 
sufficient to initialize the array to `INT_MAX - 1` and replace the 
`this.returnArray[0] = 0;` in the original code with 
`this.returnArray[0] = resetValue()`. Inside `resetValue()` you can do the 
initialization.
   
   That way, common cases have no additional check, and the overflow/reset case 
gets one additional branch, which is already a good improvement.
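
   A minimal sketch of the suggestion above, written as a standalone class 
rather than against Flink's actual partitioner. The class name 
`RoundRobinSelectorSketch`, the simplified `selectChannels(int)` signature, 
the parameters passed to `resetValue`, and the use of `ThreadLocalRandom` are 
assumptions for illustration; `INT_MAX` is taken to mean Java's 
`Integer.MAX_VALUE`.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for the partitioner; not Flink's actual class. */
class RoundRobinSelectorSketch {

    // Sentinel start value: the first increment yields Integer.MAX_VALUE,
    // which is >= any realistic channel count, so the very first call
    // falls into the reset branch below.
    private final int[] returnArray = new int[] {Integer.MAX_VALUE - 1};

    int[] selectChannels(int numberOfOutputChannels) {
        int newChannel = ++this.returnArray[0];
        if (newChannel >= numberOfOutputChannels) {
            // Only the overflow/reset case pays for the extra check.
            this.returnArray[0] = resetValue(numberOfOutputChannels, newChannel);
        }
        return this.returnArray;
    }

    // Assumed signature: the comment only names resetValue(), not its parameters.
    private static int resetValue(int numberOfOutputChannels, int newChannel) {
        if (newChannel == Integer.MAX_VALUE) {
            // First call: pick a random starting channel.
            return ThreadLocalRandom.current().nextInt(numberOfOutputChannels);
        }
        // Regular wrap-around.
        return 0;
    }
}
```

   With this shape, the common path is still one increment and one 
comparison; the random initialization is only reachable from the branch that 
already existed for the wrap-around.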
   
   We could possibly do a follow-up optimization, where outputs that have only 
one channel swap in a special selector that always returns just `0`. The 
one-channel-only case is probably the one that would be affected most by this 
change, because it overflows every time.
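
   The follow-up idea could look roughly like the sketch below. All names 
(`ChannelPicker`, `SingleChannelPicker`, `RoundRobinPicker`, 
`ChannelPickers.forChannelCount`) are hypothetical and do not correspond to 
existing Flink classes; the point is only that the one-channel case can skip 
the increment and the reset branch entirely.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical selector interface, simplified to take only the channel count. */
interface ChannelPicker {
    int[] selectChannels(int numberOfOutputChannels);
}

/** Special case for a single output channel: always returns channel 0. */
final class SingleChannelPicker implements ChannelPicker {
    private final int[] returnArray = new int[] {0};

    @Override
    public int[] selectChannels(int numberOfOutputChannels) {
        return returnArray; // no increment, no comparison, no reset
    }
}

/** General case: round robin with a random first channel. */
final class RoundRobinPicker implements ChannelPicker {
    private final int[] returnArray = new int[] {Integer.MAX_VALUE - 1};

    @Override
    public int[] selectChannels(int numberOfOutputChannels) {
        int newChannel = ++returnArray[0];
        if (newChannel >= numberOfOutputChannels) {
            returnArray[0] = (newChannel == Integer.MAX_VALUE)
                    ? ThreadLocalRandom.current().nextInt(numberOfOutputChannels)
                    : 0;
        }
        return returnArray;
    }
}

/** Hypothetical factory that swaps in the special selector when possible. */
final class ChannelPickers {
    static ChannelPicker forChannelCount(int numberOfOutputChannels) {
        return numberOfOutputChannels == 1
                ? new SingleChannelPicker()
                : new RoundRobinPicker();
    }
}
```

   The choice is made once, when the output is set up, so the per-record path 
carries no extra check for the single-channel case.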



> RebalancePartitioner should use Random value for its first partition
> --------------------------------------------------------------------
>
>                 Key: FLINK-8532
>                 URL: https://issues.apache.org/jira/browse/FLINK-8532
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>            Reporter: Yuta Morisawa
>            Assignee: Guibo Pan
>            Priority: Major
>              Labels: pull-request-available
>
> In some conditions, RebalancePartitioner doesn't balance data correctly 
> because it uses the same value for selecting the next operators.
> RebalancePartitioner initializes its partition id using the same value in 
> every thread, so it does balance data overall, but at any given moment the 
> amount of data in each operator is skewed.
> In particular, when the data rates of the former operators are equal, the 
> data skew becomes severe.
>  
>  
> Example:
> Consider a simple operator chain.
> -> map1 -> rebalance -> map2 ->
> Each map operator (map1, map2) contains three subtasks (subtasks 1-3 in 
> map1, subtasks 4-6 in map2).
> map1          map2
>  st1              st4
>  st2              st5
>  st3              st6
>  
> At the beginning, every subtask in map1 sends data to st4 in map2 because 
> they all use the same initial partition id.
> The next time map1 receives data, st1, st2 and st3 all send data to st5 
> because they incremented their partition ids when they processed the former 
> data.
> In my environment, it takes twice as long to process data when I use 
> RebalancePartitioner as when I use other partitioners (rescale, keyby).
>  
> To solve this problem, in my opinion, RebalancePartitioner should use its own 
> operator id for the initial value.
>  
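
The skew described in the quoted example can be illustrated with a small 
simulation. This is a simplified sketch, not Flink code: the class 
`RebalanceSkewDemo` and the method `maxPerRound` are hypothetical, and the 
senders are assumed to emit exactly one record per round in lockstep, which 
corresponds to the equal-data-rate case the reporter mentions.

```java
/** Hypothetical lockstep simulation of round-robin senders. */
final class RebalanceSkewDemo {

    /**
     * Returns the largest number of records any single receiver gets
     * within one round, given each sender's starting channel.
     */
    static int maxPerRound(int[] startChannels, int numReceivers, int rounds) {
        int[] next = startChannels.clone();
        int worst = 0;
        for (int r = 0; r < rounds; r++) {
            int[] received = new int[numReceivers];
            for (int s = 0; s < next.length; s++) {
                received[next[s]]++;                    // sender s emits one record
                next[s] = (next[s] + 1) % numReceivers; // then moves to the next channel
            }
            for (int count : received) {
                worst = Math.max(worst, count);
            }
        }
        return worst;
    }
}
```

With three senders that all start at channel 0, 
`maxPerRound(new int[] {0, 0, 0}, 3, 6)` returns 3: in every round one 
receiver gets all three records while the other two get none. With distinct 
starting channels, `maxPerRound(new int[] {0, 1, 2}, 3, 6)` returns 1, so 
each receiver gets exactly one record per round.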



