Re: Channel indexing with pointwise connection pattern

Stephan Ewen Thu, 27 Nov 2014 02:08:43 -0800

Our implication so far was that forwarding means evenly scattering over
successors - a balanced load being the important goal.


If you find different requirements in streaming, you could define a new
type of selector.

On Thu, Nov 27, 2014 at 11:02 AM, Gyula Fora <gyula.f...@gmail.com> wrote:

> Thanks Stephan,
>
> So I took a quick look at the ChannelSelectors the batch api uses and I
> see that for Forward strategy uses round-robin. My question was aimed
> exactly to avoid having to do this. Isn’t this sub-optimal?
> Maybe we could pass the channel info to the channel selector, so it can
> make “smarter” decision.
>
> Gyula
>
> > On 27 Nov 2014, at 10:45, Stephan Ewen <se...@apache.org> wrote:
> >
> > This is a bit tricky, since the new scheduling is more flexible...
> >
> > Assume we have a PointWise connection with two receiving tasks per
> sending
> > task: outgoing channels 0 and 1.
> >
> > When scheduling in the most basic mode, the receivers can go anywhere,
> but
> > the schedule will try to give them a slot on the same instance, if
> > possible. Both could be in-memory, but both could be remote.
> >
> > When using a SlotSharingGroup (we do that by default right now), one of
> the
> > receivers can share the slot of the sender, but not both. Which one does
> > depends wich one stays first. Currently that is the one which gets data
> > first, but this is going to change soon with the new channels and
> > deployment.
> >
> > When you use a CoLocationGroup, you are guaranteed that subtasks n of the
> > sender is Co-located with subtask n of the receiver. But in the above
> > PointWise model, you would rather want subtask n to be co-located with
> > subtask 2*n.
> >
> > I don't think there is a reliable way to guarantee that, other than
> having
> > a slightly modified version of the co-location constraint.
> >
> > Stephan
> > Am 27.11.2014 00:45 schrieb "Gyula Fora" <gyula.f...@gmail.com>:
> >
> >> Hey,
> >>
> >> I was hoping that someone can answer this right away without me having
> to
> >> dig through all the code :)
> >>
> >> How does the channel indexing go when more then one consumer subtask is
> >> connected to an intermediate dataset in the pointwise pattern?
> >> I am trying to figure out which one is the “in-memory” channel to set up
> >> proper partitioning for streams in this case. The idea would be to push
> the
> >> majority to the in memory channel while still push some messages to the
> >> network channel to leverage operator parallelism, but to implement this
> I
> >> need to figure out the index of the in-memory channel.
> >>
> >> Thank you,
> >> Gyula
>
>

Re: Channel indexing with pointwise connection pattern

Reply via email to