[ 
https://issues.apache.org/jira/browse/TEZ-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356259#comment-14356259
 ] 

Rohini Palaniswamy commented on TEZ-2192:
-----------------------------------------

Thanks [~hitesh] for increasing the priority. [~sseth] suggested a hacky 
workaround, which of course he did not like recommending. For the short term, 
though, we are going with that workaround in Pig to unblock reading from big 
tables with HCatLoader, as there is no other alternative until this is fixed 
in Tez. The hack is to create a job.split file for all vertices if we create 
one for any vertex, so that there is a conflict right from the start and 
containers are not reused across vertices. 
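
A minimal sketch of the decision behind the hack, not the actual Pig change. 
The class, the method and the vertex names are hypothetical; the only thing 
taken from the comment above is the rule "if any vertex gets a job.split 
file, every vertex gets one".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitFileWorkaround {

    /**
     * Hypothetical helper. The input maps each vertex name to whether its
     * input splits were spilled to disk. The result maps each vertex name to
     * whether it should carry a job.split file under the workaround.
     */
    static Map<String, Boolean> verticesNeedingSplitFile(Map<String, Boolean> spilledByVertex) {
        // If at least one vertex spilled its splits, all vertices get a file.
        boolean anySpilled = spilledByVertex.containsValue(Boolean.TRUE);
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String vertex : spilledByVertex.keySet()) {
            result.put(vertex, anySpilled);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Boolean> spilled = new LinkedHashMap<>();
        spilled.put("vertex-1", true);   // splits spilled to disk
        spilled.put("vertex-2", false);  // splits small enough to stay in memory
        System.out.println(verticesNeedingSplitFile(spilled));
    }
}
```

With every vertex carrying its own job.split file, the name conflict shows up 
for every vertex, which is what keeps containers from being reused across 
vertices.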

> Relocalization does not check for source
> ----------------------------------------
>
>                 Key: TEZ-2192
>                 URL: https://issues.apache.org/jira/browse/TEZ-2192
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.5.2
>            Reporter: Rohini Palaniswamy
>            Priority: Blocker
>
> PIG-4443 spills the input splits to disk if the serialized split size is 
> greater than some threshold. It faces issues with relocalization when more 
> than one vertex has a job.split file. If a job.split file is already present 
> when a container is reused, that file is reused, causing the wrong data to 
> be read.
> We need either a way to turn off relocalization or a check of the 
> source+timestamp that re-downloads the file during relocalization. 
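
The source+timestamp check requested in the description could look roughly 
like the sketch below. This is an illustration only and not Tez code: 
ResourceInfo, needsDownload and the example paths are all hypothetical, and 
the sketch assumes a container tracks its already-localized files by name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class RelocalizationCheck {

    /** Hypothetical record of where a localized file came from and when it was last modified. */
    static final class ResourceInfo {
        final String sourceUri;
        final long timestamp;

        ResourceInfo(String sourceUri, long timestamp) {
            this.sourceUri = sourceUri;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceInfo)) {
                return false;
            }
            ResourceInfo other = (ResourceInfo) o;
            return timestamp == other.timestamp && sourceUri.equals(other.sourceUri);
        }

        @Override
        public int hashCode() {
            return Objects.hash(sourceUri, timestamp);
        }
    }

    /**
     * Returns true when the requested file must be downloaded again: either
     * no file with that name is present, or the present one came from a
     * different source or carries a different timestamp.
     */
    static boolean needsDownload(Map<String, ResourceInfo> localized,
                                 String name, ResourceInfo requested) {
        ResourceInfo existing = localized.get(name);
        return existing == null || !existing.equals(requested);
    }

    public static void main(String[] args) {
        Map<String, ResourceInfo> localized = new HashMap<>();
        // A container that already holds the job.split file of one vertex.
        localized.put("job.split", new ResourceInfo("hdfs:///tmp/vertex-1/job.split", 1000L));

        // A second vertex asks for a file with the same name but another source.
        ResourceInfo wanted = new ResourceInfo("hdfs:///tmp/vertex-2/job.split", 2000L);
        System.out.println(needsDownload(localized, "job.split", wanted));
    }
}
```

Comparing on the name alone is what lets the stale job.split file be picked 
up on container reuse; adding the source and timestamp to the comparison is 
the second option listed in the description.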



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)