Thanks Stephan, So I took a quick look at the ChannelSelectors the batch api uses and I see that for Forward strategy uses round-robin. My question was aimed exactly to avoid having to do this. Isn’t this sub-optimal? Maybe we could pass the channel info to the channel selector, so it can make “smarter” decision.
Gyula > On 27 Nov 2014, at 10:45, Stephan Ewen <se...@apache.org> wrote: > > This is a bit tricky, since the new scheduling is more flexible... > > Assume we have a PointWise connection with two receiving tasks per sending > task: outgoing channels 0 and 1. > > When scheduling in the most basic mode, the receivers can go anywhere, but > the schedule will try to give them a slot on the same instance, if > possible. Both could be in-memory, but both could be remote. > > When using a SlotSharingGroup (we do that by default right now), one of the > receivers can share the slot of the sender, but not both. Which one does > depends wich one stays first. Currently that is the one which gets data > first, but this is going to change soon with the new channels and > deployment. > > When you use a CoLocationGroup, you are guaranteed that subtasks n of the > sender is Co-located with subtask n of the receiver. But in the above > PointWise model, you would rather want subtask n to be co-located with > subtask 2*n. > > I don't think there is a reliable way to guarantee that, other than having > a slightly modified version of the co-location constraint. > > Stephan > Am 27.11.2014 00:45 schrieb "Gyula Fora" <gyula.f...@gmail.com>: > >> Hey, >> >> I was hoping that someone can answer this right away without me having to >> dig through all the code :) >> >> How does the channel indexing go when more then one consumer subtask is >> connected to an intermediate dataset in the pointwise pattern? >> I am trying to figure out which one is the “in-memory” channel to set up >> proper partitioning for streams in this case. The idea would be to push the >> majority to the in memory channel while still push some messages to the >> network channel to leverage operator parallelism, but to implement this I >> need to figure out the index of the in-memory channel. >> >> Thank you, >> Gyula