Our implication so far was that forwarding means evenly scattering over successors - a balanced load being the important goal.
If you find different requirements in streaming, you could define a new type of selector. On Thu, Nov 27, 2014 at 11:02 AM, Gyula Fora <gyula.f...@gmail.com> wrote: > Thanks Stephan, > > So I took a quick look at the ChannelSelectors the batch api uses and I > see that for Forward strategy uses round-robin. My question was aimed > exactly to avoid having to do this. Isn’t this sub-optimal? > Maybe we could pass the channel info to the channel selector, so it can > make “smarter” decision. > > Gyula > > > On 27 Nov 2014, at 10:45, Stephan Ewen <se...@apache.org> wrote: > > > > This is a bit tricky, since the new scheduling is more flexible... > > > > Assume we have a PointWise connection with two receiving tasks per > sending > > task: outgoing channels 0 and 1. > > > > When scheduling in the most basic mode, the receivers can go anywhere, > but > > the schedule will try to give them a slot on the same instance, if > > possible. Both could be in-memory, but both could be remote. > > > > When using a SlotSharingGroup (we do that by default right now), one of > the > > receivers can share the slot of the sender, but not both. Which one does > > depends wich one stays first. Currently that is the one which gets data > > first, but this is going to change soon with the new channels and > > deployment. > > > > When you use a CoLocationGroup, you are guaranteed that subtasks n of the > > sender is Co-located with subtask n of the receiver. But in the above > > PointWise model, you would rather want subtask n to be co-located with > > subtask 2*n. > > > > I don't think there is a reliable way to guarantee that, other than > having > > a slightly modified version of the co-location constraint. > > > > Stephan > > Am 27.11.2014 00:45 schrieb "Gyula Fora" <gyula.f...@gmail.com>: > > > >> Hey, > >> > >> I was hoping that someone can answer this right away without me having > to > >> dig through all the code :) > >> > >> How does the channel indexing go when more then one consumer subtask is > >> connected to an intermediate dataset in the pointwise pattern? > >> I am trying to figure out which one is the “in-memory” channel to set up > >> proper partitioning for streams in this case. The idea would be to push > the > >> majority to the in memory channel while still push some messages to the > >> network channel to leverage operator parallelism, but to implement this > I > >> need to figure out the index of the in-memory channel. > >> > >> Thank you, > >> Gyula > >