Thanks! I found this PR already but seemed to be completely outdated :) Maybe it's worth restarting this discussion.
Gyula Chesnay Schepler <ches...@apache.org> ezt írta (időpont: 2016. jún. 13., H, 14:58): > FLINK-1003 may be related. > > On 13.06.2016 12:46, Gyula Fóra wrote: > > Hey, > > > > The Flink scheduling mechanism has become quite a bit of a pain lately > for > > us when trying to schedule IO heavy streaming jobs. And by IO heavy I > mean > > it has a fairly large state that is being continuously updated/read. > > > > The main problem is that the scheduled task slots are not evenly > > distributed among the different task managers but usually the first TM > > takes as much slots as possibles and the other TMs get much fewer. And > > since the job is RocksDB IO bound the uneven load causes a significant > > performance penalty. > > > > This is further accentuated during historical runs when we are trying to > > "fast-forward" the application. The difference can be quite substantial > in > > a 3-4 node cluster: with even task distribution the history might run 3 > > times faster compared to an uneven one. > > > > I was wondering if there was a simple way to modify the scheduler so it > > allocates resources in a round-robin fashion. Probably someone has a lot > of > > experience with this already :) (I'm running 1.0.3 for this job btw) > > > > Cheers, > > Gyula > > > >