[ https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442949#comment-16442949 ]
Jason Lowe commented on TEZ-3914: --------------------------------- Thanks for updating the patch and providing an overview of the approach! Rather than wrapping debug calls with isDebugEnabled, we should leverage slf4j's positional parameter syntax to make the debug call cheap enough to just call unconditionally, e.g.: {code} LOG.debug("Read HistoryEvent, eventType={}, event={}", historyEvent.getEventType(), historyEvent); {code} RecoveryStream should be marked VisibleForTesting. I think a close() method on RecoveryStream could clean up the following code and make it more maintainable for future usages: {code} entry.getValue().codedOutputStream.flush(); entry.getValue().outputStream.hflush(); entry.getValue().outputStream.close(); {code} Speaking of a close() method for RecoveryStream to help correctness when the stream needs to be shut down, aren't we missing a codedOutputStream.flush() call here? {code} if (outputStreamMap.containsKey(dagId)) { try { outputStreamMap.get(dagId).outputStream.close(); {code} Similar to the RecoveryStream close method, I could see moving doFlush to a flush method for RecoveryStream Suggestion: RecoveryStream should just take an output stream and create the coded output stream in the constructor rather than requiring the caller to create it and pass it. The caller doesn't care about holding onto that coded stream in practice. > Recovering a large DAG fails to size limit exceeded > --------------------------------------------------- > > Key: TEZ-3914 > URL: https://issues.apache.org/jira/browse/TEZ-3914 > Project: Apache Tez > Issue Type: Bug > Reporter: Jonathan Eagles > Assignee: Jonathan Eagles > Priority: Major > Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch, > TEZ-3914.003.patch > > > A large message will be failed to parse and will be treated as recovery file > EOF. > {noformat} > 2018-04-16 15:33:59,807 WARN [Thread-2] app.RecoveryParser > (RecoveryParser.java:parseRecoveryData(771)) - Corrupt data found when trying > to read next event > com.google.protobuf.InvalidProtocolBufferException: Protocol message was too > large. May be malicious. Use CodedInputStream.setSizeLimit() to increase > the size limit. > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)