[ 
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805234#comment-16805234
 ] 

Jozef Vilcek commented on BEAM-5865:
------------------------------------

Hello, I wanted to post an update earlier but got swamped with other things. I 
put together a git commit to illustrate the changes required for auto-balancing 
keys from WriteFiles across Flink workers. This should guarantee an even spread 
of keys among workers. I did not create a PR because this is nowhere near the 
finish line; it is really just an illustration of the areas that will need to 
be touched somehow.

[https://github.com/JozoVilcek/beam/commit/afc7fe949b543604cead529171774153b6caa433]
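To make the idea of an even spread of keys concrete, here is a minimal, 
self-contained sketch. It is not the code from the commit above. The class 
BalancedShardKeys, the "shard-N" key format, and the workerFor function are all 
hypothetical, and workerFor is only a stand-in: it assumes a plain hash modulo 
parallelism, whereas Flink actually routes keys through key groups. The sketch 
searches for one key per shard so that shards are assigned round-robin to 
workers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BalancedShardKeys {

  // Stand-in for the runner's key-to-worker routing. ASSUMPTION: a plain
  // hash modulo parallelism; real Flink uses key groups, not this function.
  static int workerFor(String key, int parallelism) {
    return Math.floorMod(key.hashCode(), parallelism);
  }

  // For each shard, search candidate keys until one routes to the target
  // worker (shard % parallelism), so every worker gets an equal share.
  static List<String> balancedKeys(int numShards, int parallelism) {
    List<String> keys = new ArrayList<>();
    int candidate = 0;
    for (int shard = 0; shard < numShards; shard++) {
      int target = shard % parallelism;
      while (workerFor("shard-" + candidate, parallelism) != target) {
        candidate++;
      }
      keys.add("shard-" + candidate);
      candidate++; // never reuse a key for two shards
    }
    return keys;
  }

  public static void main(String[] args) {
    int parallelism = 4;
    List<String> keys = balancedKeys(8, parallelism);
    int[] load = new int[parallelism];
    for (String key : keys) {
      load[workerFor(key, parallelism)]++;
    }
    // Print the chosen keys and how many shards each worker received.
    System.out.println(keys + " -> " + Arrays.toString(load));
  }
}
```

The point of the sketch is only that the chosen keys depend on how the runner 
routes them, which is why a runner-specific replacement for the default shard 
key assignment comes up in the questions below.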

My main question and concern is about the changes required at the SDK level 
around WriteFiles and ShardedKey. I am not sure whether this is possible to do 
in a backward-compatible manner. I would prefer Flink to replace ShardedKey 
with its own alternative, but I am not sure what that means at the level of 
operators and coders (not just swapping logic inside a DoFn).

What I would like to get from this is:

* How does such a change feel conceptually? Does it still make sense, and can 
we continue?
* How should we incorporate it into the SDK and FlinkRunner?

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
