Ufuk Celebi created FLINK-2119:
----------------------------------
Summary: Add ExecutionGraph support for leg-wise scheduling
Key: FLINK-2119
URL: https://issues.apache.org/jira/browse/FLINK-2119
Project: Flink
Issue Type: Improvement
Components: Scheduler
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Scheduling currently happens by lazily unrolling the ExecutionGraph from the
sources.
1. All sources are scheduled for execution.
2. Their results trigger scheduling and deployment of the receiving tasks
(either on the first available buffer or when all are produced [pipelined vs.
blocking exchange]).
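The trigger rule in step 2 can be sketched as a small predicate. This is only an illustration under stated assumptions: `ConsumerTriggerSketch`, `ExchangeMode`, and `shouldScheduleConsumers` are hypothetical names, not Flink classes, and "all are produced" is read here as "every buffer of the result has been produced".

```java
// Illustrative sketch only; none of these names exist in Flink.
final class ConsumerTriggerSketch {

    /** Hypothetical stand-in for the pipelined vs. blocking exchange distinction. */
    enum ExchangeMode { PIPELINED, BLOCKING }

    /**
     * Returns true if the receiving tasks would be scheduled.
     * Assumption: "all are produced" means producedBuffers has reached totalBuffers.
     */
    static boolean shouldScheduleConsumers(ExchangeMode mode, int producedBuffers, int totalBuffers) {
        if (mode == ExchangeMode.PIPELINED) {
            // Pipelined: the first available buffer is enough.
            return producedBuffers >= 1;
        }
        // Blocking: wait until the whole result has been produced.
        return producedBuffers >= totalBuffers;
    }

    public static void main(String[] args) {
        System.out.println(shouldScheduleConsumers(ExchangeMode.PIPELINED, 1, 10));
        System.out.println(shouldScheduleConsumers(ExchangeMode.BLOCKING, 1, 10));
    }
}
```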
For certain batch jobs this can be problematic, as many tasks will be running at
the same time and consume task manager resources like execution slots and
memory. For these jobs, it is desirable to schedule the ExecutionGraph with
different strategies.
With respect to the ExecutionGraph, the current limitation is that data
availability for a result always triggers scheduling of the consuming tasks.
This needs to be more general to allow different scheduling strategies.
Consider the following example:
{code}
        [ union ]
         /     \
        /       \
[ source 1 ]  [ source 2 ]
{code}
Currently, both sources are scheduled concurrently and the "faster" one
triggers scheduling of the union. It is desirable to first allow source 1 to
completely produce its result, then trigger scheduling of source 2, and only
then schedule the union.
The required changes in the ExecutionGraph are conceptually straightforward:
instead of going through the list of result consumers and scheduling them, we
need to be able to run a more general action. For normal operation, this will
still schedule the consumer tasks, but we can also configure it to kick off the
next source, etc.
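A minimal sketch of what such a general action could look like, applied to the union example. All names here (`LegWiseSchedulingSketch`, `ResultAvailableAction`, `scheduleConsumers`, `scheduleNext`) are hypothetical and not part of the ExecutionGraph API. Tasks are modeled as plain strings, and each task's result is assumed to become available as soon as the task is scheduled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these names exist in Flink.
final class LegWiseSchedulingSketch {

    /** Hypothetical action run when a task's result becomes available. */
    interface ResultAvailableAction {
        void onResultAvailable(List<String> toSchedule);
    }

    /** Normal operation: schedule the consumers of the result. */
    static ResultAvailableAction scheduleConsumers(List<String> consumers) {
        return toSchedule -> toSchedule.addAll(consumers);
    }

    /** Leg-wise operation: kick off another task, e.g. the next source. */
    static ResultAvailableAction scheduleNext(String nextTask) {
        return toSchedule -> toSchedule.add(nextTask);
    }

    /** Returns the order in which tasks get scheduled for the union example. */
    static List<String> run() {
        Map<String, ResultAvailableAction> actions = new HashMap<>();
        actions.put("source 1", scheduleNext("source 2"));
        actions.put("source 2", scheduleConsumers(Arrays.asList("union")));

        List<String> order = new ArrayList<>();
        order.add("source 1"); // only the first source is scheduled up front

        // Assumption: a task's result is fully produced once the task is scheduled.
        for (int i = 0; i < order.size(); i++) {
            ResultAvailableAction action = actions.get(order.get(i));
            if (action != null) {
                action.onResultAvailable(order);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Registering `scheduleConsumers` for both sources would correspond to the normal case, where data availability schedules the consuming task; swapping in `scheduleNext` for source 1 is what yields the leg-wise order.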