[ https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387888#comment-14387888 ]
Jeff Zhang commented on TEZ-1909:
---------------------------------

[~hitesh] I uploaded a new patch; please help review it.

* Regarding corrupted summary records, I added the following two checks. Because SummaryEventProto.parseDelimitedFrom(summaryStream) only reads the size of the protobuf record, an exception may still be thrown when its fields are parsed. There is one unit test for this in TestRecoveryParser.

{code}
// Reject a record whose DAG ID cannot be parsed
TezDAGID dagId = TezDAGID.fromString(proto.getDagId());
if (dagId == null) {
  throw new IOException("null dagId, summary records may be corrupted");
}
{code}

{code}
try {
  dagSummaryDataMap.get(dagId).handleSummaryEvent(proto);
} catch (Exception e) {
  // any exception when parsing protobuf
  throw new IOException("Error when parsing summary event proto", e);
}
{code}

> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
>                 Key: TEZ-1909
>                 URL: https://issues.apache.org/jira/browse/TEZ-1909
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch,
> TEZ-1909-4.patch
>
>
> Use of file versions should prevent the need for copying over data into a
> second attempt dir. Care needs to be taken with "last corrupt record"
> handling.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
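Protobuf's delimited format (used by parseDelimitedFrom) is a varint size prefix followed by the message bytes. As a result, a final summary record that was only partially written, for example if a previous AM attempt died mid-write, can show up as a truncated payload or as fields that fail to parse. The self-contained Java 11+ sketch below shows one possible way to do "last corrupt record" handling on such a length-delimited stream without the protobuf dependency. It tolerates corruption only when the bad record is the last one in the stream and rethrows otherwise. Everything in it is hypothetical: `SummaryRecordSketch`, its methods, and the rule that a valid payload is a UTF-8 string starting with `dag_` merely stand in for Tez's RecoveryParser and the real protobuf field parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of "last corrupt record" handling; not Tez's RecoveryParser. */
public class SummaryRecordSketch {

  /** Reads a base-128 varint length prefix; returns -1 on a clean end of stream. */
  public static long readLength(InputStream in) throws IOException {
    long result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      int b = in.read();
      if (b == -1) {
        if (shift == 0) {
          return -1; // no bytes left: no more records
        }
        throw new EOFException("truncated length prefix");
      }
      result |= (long) (b & 0x7f) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IOException("malformed length prefix");
  }

  /** Hypothetical field check standing in for protobuf parsing: payload must start with "dag_". */
  public static String parseDagId(byte[] payload) throws IOException {
    String s = new String(payload, StandardCharsets.UTF_8);
    if (!s.startsWith("dag_")) {
      throw new IOException("invalid dagId, summary records may be corrupted");
    }
    return s;
  }

  /** Reads every record, ignoring a corrupt record only if it is the last one in the stream. */
  public static List<String> readAll(InputStream in) throws IOException {
    List<String> dagIds = new ArrayList<>();
    while (true) {
      try {
        long size = readLength(in);
        if (size == -1) {
          return dagIds; // clean end of stream
        }
        if (size > Integer.MAX_VALUE) {
          throw new IOException("record size too large: " + size);
        }
        byte[] payload = in.readNBytes((int) size);
        if (payload.length < size) {
          throw new EOFException("truncated record payload");
        }
        dagIds.add(parseDagId(payload));
      } catch (IOException e) {
        if (in.read() == -1) {
          return dagIds; // corrupt trailing record (e.g. partial write): drop it
        }
        throw new IOException("corrupt record before end of stream", e);
      }
    }
  }

  /** Writes one record as a varint length prefix followed by UTF-8 bytes. */
  public static void writeRecord(ByteArrayOutputStream out, String value) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    int size = bytes.length;
    while ((size & ~0x7f) != 0) {
      out.write((size & 0x7f) | 0x80);
      size >>>= 7;
    }
    out.write(size);
    out.write(bytes, 0, bytes.length);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeRecord(out, "dag_1");
    writeRecord(out, "dag_2");
    // Simulate a crash mid-write by dropping the last 3 bytes of the second record
    byte[] truncated = Arrays.copyOf(out.toByteArray(), out.size() - 3);
    System.out.println(readAll(new ByteArrayInputStream(truncated))); // prints [dag_1]
  }
}
```

The sketch checks `in.read() == -1` after a failure to tell a corrupt trailing record (plausibly a partial write) apart from corruption in the middle of the file, which it treats as fatal rather than silently skipping.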