[ https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392898#comment-14392898 ]
Hitesh Shah commented on TEZ-2192:
----------------------------------

[~zjffdu] The sample job was mainly for manual testing. I logged the file names and the file sizes (not the md5) just to eyeball the info. In any case, it was not something that I planned to commit to the codebase. Do either you or [~sseth] see a need for it to be part of the tez-tests (as compared to just having the unit test)?

> Relocalization does not check for source
> ----------------------------------------
>
>                 Key: TEZ-2192
>                 URL: https://issues.apache.org/jira/browse/TEZ-2192
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.5.2
>            Reporter: Rohini Palaniswamy
>            Assignee: Hitesh Shah
>            Priority: Blocker
>         Attachments: TEZ-2192.1.patch, TEZ-2192.2.patch, TEZ-2192.3.patch,
> test-job-2192.patch
>
>
> PIG-4443 spills the input splits to disk if the serialized split size is greater
> than some threshold. It faces issues with relocalization when more than one
> vertex has a job.split file. If a job.split file is already there on container
> reuse, it is reused, causing wrong data to be read.
> We either need a way to turn off relocalization or to check the source+timestamp
> and redownload the file during relocalization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
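For readers following the thread, the second option in the issue description (compare source plus timestamp before reusing an already localized file) might look roughly like the sketch below. This is not the code from the attached patches and not Tez's API: `ResourceKey`, `LocalizedCache`, and the HDFS paths are hypothetical names chosen only for illustration, and the sketch is in Python rather than Java to keep it short.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ResourceKey:
    """Identity of a localized file (hypothetical type, not a Tez class)."""
    source: str     # where the file was fetched from, e.g. an HDFS path
    timestamp: int  # modification time of the source file


class LocalizedCache:
    """Remembers which source produced each local file name in a reused container."""

    def __init__(self) -> None:
        self._by_name: Dict[str, ResourceKey] = {}

    def needs_download(self, local_name: str, key: ResourceKey) -> bool:
        # Reuse the local copy only if the same name maps to the same
        # source and timestamp; a name match alone is not sufficient.
        return self._by_name.get(local_name) != key

    def record(self, local_name: str, key: ResourceKey) -> None:
        # Called after a successful download to remember what is on disk.
        self._by_name[local_name] = key


# Two vertices ship different files under the same local name (paths are made up).
cache = LocalizedCache()
v1 = ResourceKey("hdfs:///staging/vertex1/job.split", 1000)
v2 = ResourceKey("hdfs:///staging/vertex2/job.split", 1000)

first_check = cache.needs_download("job.split", v1)   # nothing localized yet
cache.record("job.split", v1)
same_source = cache.needs_download("job.split", v1)   # identical source and timestamp
other_source = cache.needs_download("job.split", v2)  # same name, different source
```

Keying the check on the local file name alone is what lets a stale job.split be reused; including the source and timestamp in the comparison forces a redownload when a different vertex's file arrives under the same name.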