Bikas Saha created TEZ-2431:
-------------------------------

             Summary: Recovery of task events (eg. datamovement events) should not depend on ordering of task attempt events
                 Key: TEZ-2431
                 URL: https://issues.apache.org/jira/browse/TEZ-2431
             Project: Apache Tez
          Issue Type: Sub-task
            Reporter: Bikas Saha

Today, task attempt events must pass through VertexImpl before reaching the task, in order to maintain the ordering guarantees that recovery relies on. As a result, each of these events is routed through the dispatcher twice, which can add noticeable overhead and delay in large jobs. This design also bakes in assumptions about event ordering that make the system fragile. Recovery should work independently of other system interactions, so that other components can evolve without being constrained by recovery unless a change affects recovery logically.
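
To make the double-routing cost concrete, here is a minimal toy model, not Tez code. The names DispatchHops, Event and countHops are hypothetical and exist only for illustration; the single queue stands in for the dispatcher. It counts how many times events pass through the queue when they are first sent to the vertex and then forwarded to the task, compared with being sent to the task directly.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class DispatchHops {
    // Hypothetical event type for illustration; not a Tez class.
    static final class Event {
        final String target;   // "vertex" or "task"
        final int attemptId;

        Event(String target, int attemptId) {
            this.target = target;
            this.attemptId = attemptId;
        }
    }

    /**
     * Simulates a single dispatcher queue and returns how many times
     * events were taken off it before every event reached its task.
     */
    public static int countHops(boolean routeViaVertex, int numEvents) {
        Queue<Event> dispatcher = new ArrayDeque<>();
        for (int i = 0; i < numEvents; i++) {
            dispatcher.add(new Event(routeViaVertex ? "vertex" : "task", i));
        }

        int hops = 0;
        int delivered = 0;
        while (!dispatcher.isEmpty()) {
            Event e = dispatcher.remove();
            hops++;
            if (e.target.equals("vertex")) {
                // The vertex forwards the event to the task, which puts it
                // back on the dispatcher queue for a second pass.
                dispatcher.add(new Event("task", e.attemptId));
            } else {
                delivered++;
            }
        }

        if (delivered != numEvents) {
            throw new IllegalStateException("not all events were delivered");
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println("via vertex: " + countHops(true, 1000) + " dispatches");
        System.out.println("direct:     " + countHops(false, 1000) + " dispatches");
    }
}
```

In this model, routing through the vertex doubles the number of dispatcher passes per event. The model deliberately ignores how ordering for recovery would be preserved once the vertex hop is removed, which is the actual subject of this issue.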