Thanks Stephan,

So I took a quick look at the ChannelSelectors the batch api uses and I see 
that for Forward strategy uses round-robin. My question was aimed exactly to 
avoid having to do this. Isn’t this sub-optimal? 
Maybe we could pass the channel info to the channel selector, so it can make 
“smarter” decision.

Gyula

> On 27 Nov 2014, at 10:45, Stephan Ewen <se...@apache.org> wrote:
> 
> This is a bit tricky, since the new scheduling is more flexible...
> 
> Assume we have a PointWise connection with two receiving tasks per sending
> task: outgoing channels 0 and 1.
> 
> When scheduling in the most basic mode, the receivers can go anywhere, but
> the schedule will try to give them a slot on the same instance, if
> possible. Both could be in-memory, but both could be remote.
> 
> When using a SlotSharingGroup (we do that by default right now), one of the
> receivers can share the slot of the sender, but not both. Which one does
> depends wich one stays first. Currently that is the one which gets data
> first, but this is going to change soon with the new channels and
> deployment.
> 
> When you use a CoLocationGroup, you are guaranteed that subtasks n of the
> sender is Co-located with subtask n of the receiver. But in the above
> PointWise model, you would rather want subtask n to be co-located with
> subtask 2*n.
> 
> I don't think there is a reliable way to guarantee that, other than having
> a slightly modified version of the co-location constraint.
> 
> Stephan
> Am 27.11.2014 00:45 schrieb "Gyula Fora" <gyula.f...@gmail.com>:
> 
>> Hey,
>> 
>> I was hoping that someone can answer this right away without me having to
>> dig through all the code :)
>> 
>> How does the channel indexing go when more then one consumer subtask is
>> connected to an intermediate dataset in the pointwise pattern?
>> I am trying to figure out which one is the “in-memory” channel to set up
>> proper partitioning for streams in this case. The idea would be to push the
>> majority to the in memory channel while still push some messages to the
>> network channel to leverage operator parallelism, but to implement this I
>> need to figure out the index of the in-memory channel.
>> 
>> Thank you,
>> Gyula

Reply via email to