On 22 Jan 2015, at 11:37, Till Rohrmann <trohrm...@apache.org> wrote:
> I'm not sure whether it is currently possible to schedule first the
> receiver and then the sender. Recently, I had to fix the
> TaskManagerTest.testRunWithForwardChannel test case where this was exactly
> the case. Due to first scheduling the receiver, it happened sometimes that
> an IllegalQueueIteratorRequestException in the method
> IntermediateResultPartitionManager.getIntermediateResultPartitionIterator
> was thrown. The partition manager complained that the producer execution ID
> was unknown. I assume that this has to be fixed first in order to schedule
> all tasks immediately. But Ufuk will probably know it better.

On 21 Jan 2015, at 20:58, Stephan Ewen <se...@apache.org> wrote:
> - The queues would still send notifications to the JobManager that data is
> available, but the JM will see that the target task is already deployed (or
> currently being deployed). Then the info where to grab a channel from would
> need to be sent to the task. That mechanism also exists already.

The only minor thing that needs to be adjusted would be this mechanism. It is indeed in place already (e.g. UNKNOWN input channels are updated at runtime to LOCAL or REMOTE input channels depending on the producer location), but currently the consumer tasks assume that the consumed intermediate result partition has already been created when they (the consumer tasks) are deployed and request the partition.

When we schedule all tasks at once, we might end up in situations like the test case Till described, where we know that it is a LOCAL or REMOTE channel, but the intermediate result has not been created yet and the request fails.

tl;dr: channels can be updated at runtime, but requests need to arrive after the producer created the partition.
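
To make the ordering problem concrete, here is a minimal sketch of the failure mode. It is not the actual Flink code: PartitionRegistry, registerPartition, requestPartition and tryRequest are hypothetical names made up for illustration, and a plain IllegalStateException stands in for the real exception. It only shows that a request for a producer that has not registered its partition yet fails, while the same request succeeds afterwards.

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionRequestSketch {

    /** Hypothetical stand-in for the partition manager; not Flink's actual class. */
    static final class PartitionRegistry {
        // Maps a producer execution ID to the partition it has created.
        private final Map<String, String> partitionsByProducer = new HashMap<>();

        /** Called by the producer once its intermediate result partition exists. */
        void registerPartition(String producerExecutionId, String partition) {
            partitionsByProducer.put(producerExecutionId, partition);
        }

        /** Fails if the producer is still unknown, i.e. the request arrived too early. */
        String requestPartition(String producerExecutionId) {
            String partition = partitionsByProducer.get(producerExecutionId);
            if (partition == null) {
                throw new IllegalStateException(
                        "Unknown producer execution ID: " + producerExecutionId);
            }
            return partition;
        }
    }

    /** Returns true if the consumer's request succeeded, false if it was rejected. */
    static boolean tryRequest(PartitionRegistry registry, String producerExecutionId) {
        try {
            registry.requestPartition(producerExecutionId);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        PartitionRegistry registry = new PartitionRegistry();

        // Consumer is deployed first: the location is known, the partition is not there yet.
        System.out.println("request before producer: " + tryRequest(registry, "producer-1"));

        // Producer creates and registers its partition.
        registry.registerPartition("producer-1", "partition-0");

        // The same request now succeeds.
        System.out.println("request after producer: " + tryRequest(registry, "producer-1"));
    }
}
```

In this sketch the first request is rejected and the second one goes through. Whether the real fix should defer or retry the early request is left open here.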