[ https://issues.apache.org/jira/browse/TEZ-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reassigned TEZ-4183:
-------------------------------------

    Assignee: Panagiotis Garefalakis

> Time- and threshold-batched FetchFailure event propagation to AM
> ----------------------------------------------------------------
>
>                 Key: TEZ-4183
>                 URL: https://issues.apache.org/jira/browse/TEZ-4183
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>         Attachments: TEZ-4183.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fetcher currently sends failure events to AM as soon as they are discovered:
> https://github.com/apache/tez/blob/354c2a4177fe8c3cf6b8a4c6009d4068a19d81f1/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/shuffle/impl/ShuffleManager.java#L930
> To reduce AM pressure we can: 1) Batch fetch failure events to be sent
> periodically (every BATCH_WAIT) and 2) if we see disk errors more than a
> Threshold send the message immediately to AM (instead of waiting)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
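
For readers following the issue, the two rules in the quoted description (flush every BATCH_WAIT, flush at once when disk errors exceed a threshold) could be sketched roughly as below. This is not the attached TEZ-4183.01.patch and does not use any Tez API: the class `FetchFailureBatcher`, its methods, the `sender` callback standing in for "send to AM", and the millisecond clock passed in by the caller are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of time- and threshold-based batching; not Tez code. */
final class FetchFailureBatcher<E> {
    private final long batchWaitMillis;     // assumed meaning of BATCH_WAIT
    private final int diskErrorThreshold;   // assumed meaning of "Threshold"
    private final Consumer<List<E>> sender; // stands in for sending events to the AM
    private final List<E> pending = new ArrayList<>();
    private int diskErrors = 0;
    private long lastFlushMillis;

    FetchFailureBatcher(long batchWaitMillis, int diskErrorThreshold,
                        long nowMillis, Consumer<List<E>> sender) {
        this.batchWaitMillis = batchWaitMillis;
        this.diskErrorThreshold = diskErrorThreshold;
        this.lastFlushMillis = nowMillis;
        this.sender = sender;
    }

    /** Queue a failure; flush immediately once disk errors exceed the threshold. */
    synchronized void onFailure(E event, boolean isDiskError, long nowMillis) {
        pending.add(event);
        if (isDiskError) {
            diskErrors++;
        }
        if (diskErrors > diskErrorThreshold) {
            flush(nowMillis);
        }
    }

    /** Call periodically; flushes queued events once the batch wait has elapsed. */
    synchronized void tick(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - lastFlushMillis >= batchWaitMillis) {
            flush(nowMillis);
        }
    }

    // Sends a copy of the queued events as one batch and resets the counters.
    private void flush(long nowMillis) {
        if (!pending.isEmpty()) {
            sender.accept(new ArrayList<>(pending));
        }
        pending.clear();
        diskErrors = 0;
        lastFlushMillis = nowMillis;
    }
}
```

In this sketch the caller supplies the current time, so `tick` could be driven by whatever periodic thread the fetcher already has; whether the real patch counts disk errors per batch or per host is not stated in the issue text, and the per-batch counter here is only an assumption.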