[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860119#action_12860119 ]

Hemanth Yamijala commented on MAPREDUCE-1288:
---------------------------------------------

bq. What happens in the case that the archive file changes in flight. For 
example, I submit a job using that archive. While my job is running, I notice a 
bug, remove the old cache file, push a new one to hdfs, and then launch a new 
invocation of my job. Would the new job get the old cache file because the old 
job is still running? 

Allen, the key that identifies a cache file on a tasktracker node is a 
combination of the URL and the DFS timestamp, which is determined when the job 
is submitted. The new job would therefore get a new key, and the file would be 
localized afresh, irrespective of whether the old file was ever localized on 
the same node. I am assuming here that uploading a file to the same DFS URL 
modifies its timestamp.
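To illustrate this keying, here is a minimal Java sketch. CacheKey and 
LocalCacheIndex are hypothetical names, not the actual TaskTracker classes, and 
the local path layout is invented; the point is only that a re-uploaded file 
with a new DFS timestamp yields a different key and is therefore localized 
again, while a repeated (URL, timestamp) pair reuses the existing copy.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: a cache file is identified by its URL plus the DFS
// timestamp recorded at job submission (illustrative, not Hadoop's class).
record CacheKey(URI uri, long dfsTimestamp) {}

// Hypothetical per-node index of localized files.
class LocalCacheIndex {
    private final Map<CacheKey, String> localized = new HashMap<>();

    // Returns the local path for the file, "localizing" it (here: just
    // computing an invented local path) only if this exact key is new.
    String getOrLocalize(URI uri, long dfsTimestamp) {
        return localized.computeIfAbsent(new CacheKey(uri, dfsTimestamp),
                k -> "/local/cache/" + k.dfsTimestamp() + k.uri().getPath());
    }

    int size() {
        return localized.size();
    }
}
```

With this scheme, two jobs that reference the same URL but were submitted 
before and after a re-upload simply occupy two different entries.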

Further, when this happens, new tasks of the old job that run on nodes where 
the localization of the invalid file has already happened will fail, because 
the localization process for those new tasks will detect that the file has 
changed in flight.
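A minimal sketch of such an in-flight check, assuming the task compares the 
timestamp recorded at job submission with the file's current modification time 
on DFS. The DfsStat interface is a hypothetical stand-in for a real filesystem 
call, and the exception message is illustrative, not Hadoop's actual wording.

```java
import java.io.IOException;
import java.net.URI;

// Hypothetical sketch of detecting that a cache file changed after submission.
class InFlightCheck {
    // Stand-in for a DFS lookup of the file's current modification time.
    interface DfsStat {
        long modificationTime(URI uri) throws IOException;
    }

    // Fails if the file on DFS no longer matches the timestamp recorded
    // when the job was submitted.
    static void verifyUnchanged(URI uri, long submitTimestamp, DfsStat dfs)
            throws IOException {
        long current = dfs.modificationTime(uri);
        if (current != submitTimestamp) {
            throw new IOException("File " + uri + " changed on DFS since job "
                    + "submission: expected timestamp " + submitTimestamp
                    + ", found " + current);
        }
    }
}
```

Because the recorded timestamp travels with the old job, its tasks keep 
checking against the old value even after the new upload.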

Hope this is correct.

> DistributedCache localizes only once per cache URI
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1288
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> As part of file localization, the distributed cache localizer creates a 
> copy of the file in the corresponding user's private directory. The 
> localization in DistributedCache uses the URI of the cache file as the key, 
> and if that key already exists in the map, the file is not localized again. 
> This means that another user cannot access the same distributed cache file. 
> We should change the key to include the username so that localization is 
> done for every user.
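The proposed fix can be sketched by adding the user to the key. This is a 
hypothetical illustration, not the committed patch: UserCacheKey and 
PerUserCacheIndex are made-up names, the per-user path layout is invented, and 
the timestamp component follows Hemanth's description of the key rather than 
the issue text.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key that includes the user, so each user gets a private copy.
record UserCacheKey(String user, URI uri, long dfsTimestamp) {}

// Hypothetical per-node index keyed by (user, URI, timestamp).
class PerUserCacheIndex {
    private final Map<UserCacheKey, String> localized = new HashMap<>();

    // Localizes once per user; a second user with the same URI no longer
    // hits the first user's entry.
    String getOrLocalize(String user, URI uri, long dfsTimestamp) {
        UserCacheKey key = new UserCacheKey(user, uri, dfsTimestamp);
        return localized.computeIfAbsent(key,
                k -> "/local/" + k.user() + "/distcache/"
                        + k.dfsTimestamp() + k.uri().getPath());
    }

    int size() {
        return localized.size();
    }
}
```

The trade-off is duplicated disk usage when many users share the same file, in 
exchange for each copy living in a directory its owner can access.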
