This is a bit tricky, since the new scheduling is more flexible...

Assume we have a PointWise connection with two receiving tasks per sending
task: outgoing channels 0 and 1.

When scheduling in the most basic mode, the receivers can go anywhere, but
the schedule will try to give them a slot on the same instance, if
possible. Both could be in-memory, but both could be remote.

When using a SlotSharingGroup (we do that by default right now), one of the
receivers can share the slot of the sender, but not both. Which one does
depends wich one stays first. Currently that is the one which gets data
first, but this is going to change soon with the new channels and
deployment.

When you use a CoLocationGroup, you are guaranteed that subtasks n of the
sender is Co-located with subtask n of the receiver. But in the above
PointWise model, you would rather want subtask n to be co-located with
subtask 2*n.

I don't think there is a reliable way to guarantee that, other than having
a slightly modified version of the co-location constraint.

Stephan
Am 27.11.2014 00:45 schrieb "Gyula Fora" <gyula.f...@gmail.com>:

> Hey,
>
> I was hoping that someone can answer this right away without me having to
> dig through all the code :)
>
> How does the channel indexing go when more then one consumer subtask is
> connected to an intermediate dataset in the pointwise pattern?
> I am trying to figure out which one is the “in-memory” channel to set up
> proper partitioning for streams in this case. The idea would be to push the
> majority to the in memory channel while still push some messages to the
> network channel to leverage operator parallelism, but to implement this I
> need to figure out the index of the in-memory channel.
>
> Thank you,
> Gyula

Reply via email to