[ https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390296#comment-14390296 ]
Jeff Zhang commented on TEZ-2192:
---------------------------------

* Is firstContainerSignature still necessary? It looks like it is only used in the isNew() method; could lastTaskInfo or lastAssignedContainerSignature achieve the same result?
* We may need to add one more debug log statement in ContainerContext reporting that a local resource is incompatible because its type is ARCHIVE or PATTERN:
{code}
if (EnumSet.of(LocalResourceType.ARCHIVE, LocalResourceType.PATTERN).contains(lr.getType())) {
  return false;
}
{code}

> Relocalization does not check for source
> ----------------------------------------
>
>                 Key: TEZ-2192
>                 URL: https://issues.apache.org/jira/browse/TEZ-2192
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.5.2
>            Reporter: Rohini Palaniswamy
>            Assignee: Hitesh Shah
>            Priority: Blocker
>         Attachments: TEZ-2192.1.patch
>
>
> PIG-4443 spills the input splits to disk if the serialized split size is greater
> than some threshold. It runs into problems with relocalization when more than one
> vertex has a job.split file. If a job.split file is already present on container
> reuse, it is reused, causing the wrong data to be read.
> Either a way to turn off relocalization is needed, or relocalization needs to check
> the source + timestamp and redownload the file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
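The debug logging suggested in the comment could look roughly like the sketch below. It is not the Tez code: the enum is a hypothetical stand-in mirroring YARN's LocalResourceType values (ARCHIVE, FILE, PATTERN) so the example compiles without Hadoop, `isTypeCompatible` is a hypothetical helper, and java.util.logging's FINE level stands in for the DEBUG level a Tez logger would use.

```java
import java.util.EnumSet;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ResourceCompatibilityCheck {
    // Hypothetical stand-in for YARN's LocalResourceType, so this compiles without Hadoop.
    public enum LocalResourceType { ARCHIVE, FILE, PATTERN }

    private static final Logger LOG =
        Logger.getLogger(ResourceCompatibilityCheck.class.getName());

    // Types treated as incompatible, matching the condition quoted in the comment.
    private static final EnumSet<LocalResourceType> INCOMPATIBLE_TYPES =
        EnumSet.of(LocalResourceType.ARCHIVE, LocalResourceType.PATTERN);

    /** Hypothetical helper: returns false, and logs the reason, for ARCHIVE/PATTERN. */
    public static boolean isTypeCompatible(String resourceName, LocalResourceType type) {
        if (INCOMPATIBLE_TYPES.contains(type)) {
            // Guard the string concatenation so it only runs when FINE (debug) is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("Local resource " + resourceName
                    + " is incompatible because its type is " + type);
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isTypeCompatible("job.split", LocalResourceType.FILE));    // true
        System.out.println(isTypeCompatible("libs.tgz", LocalResourceType.ARCHIVE)); // false
    }
}
```

Including the resource name and its type in the message is what would let someone reading the logs tell why a container was not reused.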
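The second option in the issue description (check the source + timestamp during relocalization and redownload when they differ) can be sketched as follows. This is an illustration only, not the approach taken in TEZ-2192.1.patch. `RelocalizationCheck`, `ResourceSource`, `Action`, `relocalize`, and the hdfs:/// paths are all hypothetical names. The key point is that a matching file name such as job.split is not enough to reuse a file; the source URI and timestamp must match too.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RelocalizationCheck {
    /** Hypothetical record of where a localized file came from. */
    public static final class ResourceSource {
        final String uri;     // source URI the file was downloaded from
        final long timestamp; // source modification time when it was localized

        public ResourceSource(String uri, long timestamp) {
            this.uri = uri;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ResourceSource)) return false;
            ResourceSource other = (ResourceSource) o;
            return timestamp == other.timestamp && uri.equals(other.uri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(uri, timestamp);
        }
    }

    public enum Action { DOWNLOAD, REUSE, REDOWNLOAD }

    // Files already localized in this (reused) container, keyed by local file name.
    private final Map<String, ResourceSource> localized = new HashMap<>();

    /** Decides what to do when a new task asks for a resource under {@code name}. */
    public Action relocalize(String name, ResourceSource requested) {
        ResourceSource existing = localized.get(name);
        if (existing == null) {
            localized.put(name, requested);
            return Action.DOWNLOAD;
        }
        if (existing.equals(requested)) {
            return Action.REUSE; // same name, same source and timestamp
        }
        // Same name but a different source or timestamp: the local copy is stale.
        localized.put(name, requested);
        return Action.REDOWNLOAD;
    }

    public static void main(String[] args) {
        RelocalizationCheck container = new RelocalizationCheck();
        ResourceSource v1 = new ResourceSource("hdfs:///tmp/vertex1/job.split", 100L);
        ResourceSource v2 = new ResourceSource("hdfs:///tmp/vertex2/job.split", 200L);
        System.out.println(container.relocalize("job.split", v1)); // DOWNLOAD
        System.out.println(container.relocalize("job.split", v1)); // REUSE
        System.out.println(container.relocalize("job.split", v2)); // REDOWNLOAD
    }
}
```

In a real container, overwriting a file that an earlier task may still hold open is not always safe. The other option in the description, not reusing the container (or turning relocalization off) when such a conflict is detected, avoids that problem.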