Gopal V created TEZ-2103:
----------------------------

             Summary: Implement a Partial completion VertexManagerPlugin
                 Key: TEZ-2103
                 URL: https://issues.apache.org/jira/browse/TEZ-2103
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Gopal V


Currently, there is no sibling communication between tasks. This means that a 
vertex's work can be satisfied by the first task in a wave of tasks, but the 
entire wave of tasks still has to complete before success can be reported.

This occurs in limit + filter query patterns that are common across data 
access engines.

{code}
select * from data where x > 1 limit 10;
{code}

will run a full-table scan's worth of tasks, each generating up to 10 rows, 
and then aggregate those rows to produce the final 10-row result.

The VertexManager receives counters/events early enough to short-circuit the 
rest of the vertex's tasks: once the limit condition has been satisfied by an 
initial subset of the tasks, it can prevent the remaining tasks from being 
scheduled.

This is a specialization of the VertexManagerPlugin for this common-case 
scheduling pattern.
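
The short-circuit described above could be sketched roughly as follows. This 
is a self-contained illustration only and does not use the Tez API: the class 
PartialCompletionSketch and its methods onVertexStarted, onTaskCompleted, 
limitSatisfied and tasksScheduled are hypothetical names, and the assumption 
that each completed task reports a single output-row counter is made purely 
for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, NOT the Tez VertexManagerPlugin API.
 * Schedules a first wave of tasks, then stops handing out new tasks
 * as soon as the rows reported by completed tasks reach the limit.
 */
final class PartialCompletionSketch {
    private final int totalTasks;   // tasks a full-table scan would need
    private final long rowLimit;    // the query's LIMIT
    private int nextTask = 0;       // index of the next unscheduled task
    private long rowsSeen = 0;      // rows reported by completed tasks so far

    PartialCompletionSketch(int totalTasks, long rowLimit) {
        this.totalTasks = totalTasks;
        this.rowLimit = rowLimit;
    }

    /** Called once when the vertex starts: schedule the first wave. */
    List<Integer> onVertexStarted(int waveSize) {
        return take(waveSize);
    }

    /**
     * Called for every completed task with its (assumed) output-row counter.
     * Returns the task indexes to schedule next; empty once the limit is met.
     */
    List<Integer> onTaskCompleted(long rowsProduced) {
        rowsSeen += rowsProduced;
        if (limitSatisfied()) {
            return new ArrayList<>();   // short-circuit: schedule nothing more
        }
        return take(1);                 // refill the slot that was just freed
    }

    boolean limitSatisfied() {
        return rowsSeen >= rowLimit;
    }

    int tasksScheduled() {
        return nextTask;
    }

    /** Hands out up to n not-yet-scheduled task indexes. */
    private List<Integer> take(int n) {
        List<Integer> ids = new ArrayList<>();
        while (n-- > 0 && nextTask < totalTasks) {
            ids.add(nextTask++);
        }
        return ids;
    }

    public static void main(String[] args) {
        PartialCompletionSketch vm = new PartialCompletionSketch(1000, 10);
        vm.onVertexStarted(50);   // first wave
        vm.onTaskCompleted(4);    // limit not yet met, one more task scheduled
        vm.onTaskCompleted(6);    // limit met, remaining tasks never scheduled
        System.out.println(vm.tasksScheduled() + " of 1000 tasks scheduled");
    }
}
```

Note that the sketch only stops scheduling new tasks; tasks from the first 
wave that are already running are left to finish, and whether they should be 
killed instead is a separate design choice.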



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
