Yi Zhang created TEZ-4513:
-----------------------------
Summary: Add feature to fail DAG when too many re-runs
Key: TEZ-4513
URL: https://issues.apache.org/jira/browse/TEZ-4513
Project: Apache Tez
Issue Type: Improvement
Affects Versions: 0.10.2
Reporter: Yi Zhang
Sometimes when node failures happen, shuffle data is lost and the producer
tasks are re-run. Those tasks' ancestors may in turn need to re-run, but the
cluster may not have enough resources to re-run them quickly. In this
scenario, it may be desirable to fail the DAG.
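
One way to picture the requested behavior is a counter that tracks task
re-runs for a DAG and signals failure once a configured limit is exceeded.
The sketch below is only an illustration under stated assumptions: the class
RerunLimiter, its methods, and the maxReruns threshold are hypothetical names
and are not part of the Tez API or of any proposed patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Tez class): counts task re-runs for one DAG
 * and reports when an assumed limit has been exceeded.
 */
final class RerunLimiter {
    // Assumed threshold: maximum number of re-runs tolerated for the DAG.
    private final int maxReruns;
    // Re-run count per task id, kept for diagnostics.
    private final Map<String, Integer> rerunsPerTask = new HashMap<>();
    private int totalReruns = 0;

    RerunLimiter(int maxReruns) {
        if (maxReruns < 0) {
            throw new IllegalArgumentException("maxReruns must be >= 0");
        }
        this.maxReruns = maxReruns;
    }

    /**
     * Records one re-run of the given task.
     *
     * @return true if the total number of re-runs now exceeds the limit,
     *         meaning the caller may choose to fail the DAG
     */
    boolean recordRerun(String taskId) {
        rerunsPerTask.merge(taskId, 1, Integer::sum);
        totalReruns++;
        return totalReruns > maxReruns;
    }

    int getTotalReruns() {
        return totalReruns;
    }

    int getReruns(String taskId) {
        return rerunsPerTask.getOrDefault(taskId, 0);
    }
}
```

A caller would invoke recordRerun each time a producer task is rescheduled
after lost shuffle output and fail the DAG when it returns true. Whether the
limit should be a total count, a per-vertex count, or a fraction of tasks is
left open here.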
--
This message was sent by Atlassian Jira
(v8.20.10#820010)