[ https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Eagles updated TEZ-3914:
---------------------------------
    Description:
A large message fails to parse, and the failure is treated as recovery file EOF.

{noformat}
2018-04-16 15:33:59,807 WARN [Thread-2] app.RecoveryParser (RecoveryParser.java:parseRecoveryData(771)) - Corrupt data found when trying to read next event
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
{noformat}

  was:
Any failure to parse a recovery event is ignored and treated as EOF. The job can hang, since some task completions may be missed and shuffle will hang.

> Recovering a large DAG fails to due
> -----------------------------------
>
>                 Key: TEZ-3914
>                 URL: https://issues.apache.org/jira/browse/TEZ-3914
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch, TEZ-3914.003.patch
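
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the Tez RecoveryParser code and does not use protobuf; it assumes a hypothetical length-prefixed record format, and every name in it (RecoveryReadSketch, readRecord, RecordTooLargeException, countRecords) is invented for illustration. The point is only that a reader should return "end of file" when the stream is truly exhausted and raise a distinct error when a record exceeds the size limit, so that an oversized record is not silently mistaken for EOF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: none of these names come from Tez or protobuf.
class RecoveryReadSketch {

    /** Raised when a record is larger than the reader is willing to accept. */
    static class RecordTooLargeException extends IOException {
        RecordTooLargeException(int length, int sizeLimit) {
            super("record of " + length + " bytes exceeds limit of " + sizeLimit);
        }
    }

    /** Writes one record as a 4-byte big-endian length followed by the payload. */
    static void writeRecord(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
    }

    /**
     * Reads one record. Returns null only on a clean end of file, that is,
     * when no bytes remain before the next length prefix. An oversized record
     * throws RecordTooLargeException, and a truncated record throws
     * EOFException (from readFully), so neither is confused with EOF.
     */
    static byte[] readRecord(DataInputStream in, int sizeLimit) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null; // clean EOF: the stream ended on a record boundary
        }
        byte[] rest = new byte[3];
        in.readFully(rest); // a partial length prefix is an error, not EOF
        int length = (first << 24) | ((rest[0] & 0xFF) << 16)
                | ((rest[1] & 0xFF) << 8) | (rest[2] & 0xFF);
        if (length < 0 || length > sizeLimit) {
            throw new RecordTooLargeException(length, sizeLimit);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    /** Counts the records in an in-memory "recovery file". */
    static int countRecords(byte[] file, int sizeLimit) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int count = 0;
        while (readRecord(in, sizeLimit) != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeRecord(out, new byte[8]);    // small event
        writeRecord(out, new byte[1024]); // "large" event
        byte[] file = buf.toByteArray();

        // With a generous limit both records are read.
        System.out.println(countRecords(file, 4096)); // prints 2

        // With a small limit the reader fails loudly instead of stopping early.
        try {
            countRecords(file, 64);
        } catch (RecordTooLargeException e) {
            System.out.println("failed loudly: " + e.getMessage());
        }
    }
}
```

A reader that caught every exception and returned null would report one record for the second call and stop, which corresponds to the behaviour the old description calls "ignored and treated as EOF". In the real parser, the log message itself points to CodedInputStream.setSizeLimit() as the way to raise protobuf's limit; how the attached patches actually address the problem is not shown in this message.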