Gopal V created TEZ-2103:
----------------------------

             Summary: Implement a Partial completion VertexManagerPlugin
                 Key: TEZ-2103
                 URL: https://issues.apache.org/jira/browse/TEZ-2103
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Gopal V

Currently, there is no sibling communication between tasks. This means that the work of a vertex can effectively be completed by the first task in a wave of tasks, but the entire wave of tasks has to complete before success can be reported.

This occurs in the limit + filter query pattern, which is common across the data access engines.

{code}
select * from data where x > 1 limit 10;
{code}

will run a full-table scan's worth of tasks, each generating up to 10 rows, and then aggregate those rows to produce the final 10-row result.

The VertexManager receives counters/events early enough to short-circuit the rest of the vertex's tasks, so that the remaining tasks are not scheduled once the limit condition has been satisfied by an initial subset of the tasks.

This is a specialization of the VertexManagerPlugin for this common scheduling pattern.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
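
The short-circuit described in this issue can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use the Tez VertexManagerPlugin API, and every name in it (PartialCompletionSketch, LimitAwareManager, nextWave, onTaskCompleted, simulate) is hypothetical. It assumes tasks are released in fixed-size waves, that each completed task reports how many rows it produced, and that the limit is only checked between waves.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialCompletionSketch {

    /** Hypothetical wave scheduler; NOT the Tez VertexManagerPlugin API. */
    static final class LimitAwareManager {
        private final int totalTasks;
        private final int waveSize;
        private final long rowLimit;
        private long rowsSeen = 0;   // rows reported by completed tasks so far
        private int nextTask = 0;    // index of the next unscheduled task

        LimitAwareManager(int totalTasks, int waveSize, long rowLimit) {
            this.totalTasks = totalTasks;
            this.waveSize = waveSize;
            this.rowLimit = rowLimit;
        }

        /** Returns the next wave of task indices; empty once the limit is met or no tasks remain. */
        List<Integer> nextWave() {
            List<Integer> wave = new ArrayList<>();
            if (limitReached()) {
                return wave; // short-circuit: leave the remaining tasks unscheduled
            }
            while (wave.size() < waveSize && nextTask < totalTasks) {
                wave.add(nextTask++);
            }
            return wave;
        }

        /** Stands in for the counter/event a completed task would send. */
        void onTaskCompleted(long rowsProduced) {
            rowsSeen += rowsProduced;
        }

        boolean limitReached() {
            return rowsSeen >= rowLimit;
        }

        int tasksScheduled() {
            return nextTask;
        }
    }

    /**
     * Simulates a vertex whose task i finds rowsPerTask[i] matching rows.
     * Each task emits at most rowLimit rows. Returns how many tasks were scheduled.
     */
    public static int simulate(long[] rowsPerTask, int waveSize, long rowLimit) {
        LimitAwareManager manager =
                new LimitAwareManager(rowsPerTask.length, waveSize, rowLimit);
        List<Integer> wave;
        while (!(wave = manager.nextWave()).isEmpty()) {
            for (int task : wave) {
                manager.onTaskCompleted(Math.min(rowsPerTask[task], rowLimit));
            }
        }
        return manager.tasksScheduled();
    }

    public static void main(String[] args) {
        long[] rowsPerTask = {4, 3, 5, 7, 2, 9, 1, 6};
        int scheduled = simulate(rowsPerTask, 2, 10);
        // prints "scheduled 4 of 8 tasks"
        System.out.println("scheduled " + scheduled + " of " + rowsPerTask.length + " tasks");
    }
}
```

With a wave size of 2 and a limit of 10, the first wave reports 7 rows and the second brings the total to 19, so the remaining four tasks are never scheduled. A real plugin would presumably also have to decide what to do with tasks already running when the limit is reached; this sketch does not model that.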