[ 
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387888#comment-14387888
 ] 

Jeff Zhang commented on TEZ-1909:
---------------------------------

[~hitesh] I uploaded a new patch; please help review it.

* Regarding the corrupted summary record, I added the following 2 checks, 
because SummaryEventProto.parseDelimitedFrom(summaryStream) only reads the size 
of the protobuf, so an exception may still be thrown later when parsing the 
fields. There is one unit test for this in TestRecoveryParser.
{code}
        TezDAGID dagId = TezDAGID.fromString(proto.getDagId());
        if (dagId == null) {
          throw new IOException("null dagId, summary records may be corrupted");
        }
{code}
{code}
        try {
          dagSummaryDataMap.get(dagId).handleSummaryEvent(proto);
        } catch (Exception e) {
          // any exception when parsing protobuf
          throw new IOException("Error when parsing summary event proto", e);
        }
{code}
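
For illustration only, here is a minimal JDK-only sketch of the pattern behind the two checks. It is not the Tez code: SummaryReadSketch, parseDagId, record and readSummary are hypothetical stand-ins, and a 4-byte length prefix is assumed in place of the protobuf delimited format. The point is that a null id or any failure while decoding a record surfaces as an IOException, so the caller can treat it as a corrupted record.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SummaryReadSketch {

  // Hypothetical stand-in for an id parser that returns null on malformed input.
  static String parseDagId(String raw) {
    return raw.startsWith("dag_") ? raw : null;
  }

  // Encodes one record as a 4-byte big-endian length followed by the UTF-8 payload.
  static byte[] record(String payload) {
    byte[] body = payload.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
  }

  // Reads every record; any decoding problem is reported as an IOException.
  static List<String> readSummary(byte[] data) throws IOException {
    List<String> dagIds = new ArrayList<>();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    while (in.available() > 0) {
      String dagId;
      try {
        int length = in.readInt();
        byte[] body = new byte[length];
        in.readFully(body); // throws EOFException if the record is truncated
        dagId = parseDagId(new String(body, StandardCharsets.UTF_8));
      } catch (Exception e) {
        // check 2: wrap any failure while decoding the record
        throw new IOException("Error when parsing summary record", e);
      }
      if (dagId == null) {
        // check 1: a record that decodes but carries no usable id
        throw new IOException("null dagId, summary records may be corrupted");
      }
      dagIds.add(dagId);
    }
    return dagIds;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readSummary(record("dag_1_1")));

    // Simulate a partially written last record by cutting off its final bytes.
    byte[] full = record("dag_1_2");
    byte[] truncated = Arrays.copyOf(full, full.length - 2);
    try {
      readSummary(truncated);
    } catch (IOException e) {
      System.out.println("corrupt record detected: " + e.getMessage());
    }
  }
}
```

In this sketch the caller only has to handle IOException, whether the record was cut short or decoded into an unusable id.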

> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
>                 Key: TEZ-1909
>                 URL: https://issues.apache.org/jira/browse/TEZ-1909
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch, 
> TEZ-1909-4.patch
>
>
> Use of file versions should prevent the need for copying over data into a 
> second attempt dir. Care needs to be taken with "last corrupt record" 
> handling. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
