[ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442949#comment-16442949
 ] 

Jason Lowe commented on TEZ-3914:
---------------------------------

Thanks for updating the patch and providing an overview of the approach!

Rather than wrapping debug calls with isDebugEnabled, we should leverage 
slf4j's positional parameter syntax to make the debug call cheap enough to just 
call unconditionally, e.g.:
{code}
    LOG.debug("Read HistoryEvent, eventType={}, event={}", 
historyEvent.getEventType(), historyEvent);
{code}

RecoveryStream should be marked VisibleForTesting.

I think a close() method on RecoveryStream could clean up the following code 
and make it more maintainable for future usages:
{code}
          entry.getValue().codedOutputStream.flush();
          entry.getValue().outputStream.hflush();
          entry.getValue().outputStream.close();
{code}

Speaking of a close() method for RecoveryStream to help correctness when the 
stream needs to be shut down, aren't we missing a codedOutputStream.flush() 
call here?
{code}
            if (outputStreamMap.containsKey(dagId)) {
              try {
                outputStreamMap.get(dagId).outputStream.close();
{code}

Similar to the RecoveryStream close method, I could see moving doFlush to a 
flush method for RecoveryStream

Suggestion: RecoveryStream should just take an output stream and create the 
coded output stream in the constructor rather than requiring the caller to 
create it and pass it.  The caller doesn't care about holding onto that coded 
stream in practice.


> Recovering a large DAG fails to size limit exceeded
> ---------------------------------------------------
>
>                 Key: TEZ-3914
>                 URL: https://issues.apache.org/jira/browse/TEZ-3914
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch, 
> TEZ-3914.003.patch
>
>
> A large message will be failed to parse and will be treated as recovery file 
> EOF.
> {noformat}
> 2018-04-16 15:33:59,807 WARN  [Thread-2] app.RecoveryParser 
> (RecoveryParser.java:parseRecoveryData(771)) - Corrupt data found when trying 
> to read next event
> com.google.protobuf.InvalidProtocolBufferException: Protocol message was too 
> large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase 
> the size limit.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to