[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926107#action_12926107 ] Hudson commented on MAPREDUCE-1288: --- Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See [https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/]) > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Assignee: Devaraj Das >Priority: Critical > Fix For: 0.22.0 > > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch, MR-1288-trunk-1.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894620#action_12894620 ] Owen O'Malley commented on MAPREDUCE-1288: -- Looks good. +1 > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Assignee: Devaraj Das >Priority: Critical > Fix For: 0.22.0 > > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch, MR-1288-trunk-1.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892971#action_12892971 ] Owen O'Malley commented on MAPREDUCE-1288: -- {quote} (2) introduce the concept of group sharing of distributed cache files so as to avoid repetitive downloads for group shared files also. This may be a complex solution after all. {quote} This would be quite complex to get right. In particular, it is difficult to determine which group should have access. If we want to improve it, I'd suggest that we use hardlinks to give each user access to a single copy of the file.. Of course you need to ensure that they do in fact have read access to the original file. *smile* > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892948#action_12892948 ] Devaraj Das commented on MAPREDUCE-1288: bq. Devaraj, this corner case is exactly what Hemanth was trying to explain earlier on this ticket, starting with comment #4 above Yeah.. i realized that.. That's the reason i stuck to this jira rather than opening a new one :-) bq. As for the approach, we have two options: (1) (this seems to be what the patch is doing) for group shared files, localize them separately for each user. This is a simple solution, but sacrifices the optimization ( may not be too bad?) Yes, I am going with this for now. If needed (after we deploy this patch on our clusters and observe), we can look at proposal (2) in your comment.. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892916#action_12892916 ] Owen O'Malley commented on MAPREDUCE-1288: -- It looks good. I'd suggest: 1. change DistributedChache.releaseCache to pass in the current user to TrackerDistributedCacheManager.releaseCache rather than creating a new method. 2. it looks like the constructor for CacheFile can easily throw IOException instead of putting it in a RuntimeException. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892724#action_12892724 ] Vinod K V commented on MAPREDUCE-1288: -- Devaraj, this corner case is exactly what Hemanth was trying to explain earlier on this ticket, starting with comment #4 above :) As for the approach, we have two options: (1) (this seems to be what the patch is doing) for group shared files, localize them separately for each user. This is a simple solution, but sacrifices the optimization ( may not be too bad?) (2) introduce the concept of group sharing of distributed cache files so as to avoid repetitive downloads for group shared files also. This may be a complex solution after all. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861369#action_12861369 ] Vinod K V commented on MAPREDUCE-1288: -- The only corner case that isn't handled yet is when a file on DFS is originally a private file, gets localized in private dirs on TT and then subsequently becomes a public file on DFS itself. Though it is still a possibility, it is pretty corner-case. I am +1 for moving this one out of the blocker queue for 0.21. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861363#action_12861363 ] Hemanth Yamijala commented on MAPREDUCE-1288: - The original intent of this JIRA is certainly different from what we ended up discussing in the end. So, I request we move discussion about the in-flight change to MAPREDUCE-1729. And I ask back the original question: With the availability of public and private distributed cache options, can we assume this issue is not a blocker any more ? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861231#action_12861231 ] Amareshwari Sriramadasu commented on MAPREDUCE-1288: bq. I raised MAPREDUCE-1288 for the discussion on "in flight" archive. Sorry, I meant MAPREDUCE-1729. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861230#action_12861230 ] Amareshwari Sriramadasu commented on MAPREDUCE-1288: I raised MAPREDUCE-1288 for the discussion on "in flight" archive. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861226#action_12861226 ] Amareshwari Sriramadasu commented on MAPREDUCE-1288: Allen, I understand your use-case and I agree that it should be tunable from the job. But, I think this is not a blocker anymore, because distributed cache always had this behavior to fail the job if cache file gets modified on the fly. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861028#action_12861028 ] Allen Wittenauer commented on MAPREDUCE-1288: - Why would a task from an already running job not be able to find version-0? Why is the task tracker removing content from the cache of a running job? If the content moved/is different, shouldn't the job tracker be able to reschedule tasks onto a task tracker that has a copy? Why can't the task tracker copy the dcache from another task tracker that does have a copy? That said, I'm not convinced that in the majority of cases that version-0 vs. version-1 is undefined. From what I've seen, most of the time different versions of a dcache are downwardly compatible. As much as folks hate tunables, perhaps that is the answer here: mapred.job.dcache.failonupdate. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860880#action_12860880 ] Amareshwari Sriramadasu commented on MAPREDUCE-1288: bq. Why should an old job fail because of what is, essentially, an external event? The current behavior is that Job will fail if a distributed cache file (in use) gets modified on the DFS. Even if the task is localized on a new tracker we should fail the task, is being done through MAPREDUCE-1225. Allen, What should be the behavior for the following use-case? For the case : "some tasks have downloaded a version-0 file and ran successfully; some other tasks cannot find version-0 and they can find only version-1 file, and they use version-1file and run successfully", final Job output would be undefined right? The output generated from the tasks which used version-0 file would be entirely different from output generated from the tasks which used version-1 file. So, I think the current behavior to fail the job, if a file added to distributed-cache gets changed after job submission is correct. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860786#action_12860786 ] Hemanth Yamijala commented on MAPREDUCE-1288: - bq. Why should an old job fail because of what is, essentially, an external event? The job failing is unlikely, right ? Please note that I said tasks fail. I hope someone can clarify (given the two statuses we have - i.e. tasks failed vs killed), whether this condition can lead Hadoop to abort after sufficient number of failures. Even if it does, it should happen that at least one task''s attempts should get scheduled on 4 such nodes, and fail on all four. I am thinking this is unlikely. But let's hope someone (alias Amarsri *smile*) can clarify this. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860339#action_12860339 ] Allen Wittenauer commented on MAPREDUCE-1288: - That sounds like really bad behavior. Why should an old job fail because of what is, essentially, an external event? This still sounds like a blocker to me. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860119#action_12860119 ] Hemanth Yamijala commented on MAPREDUCE-1288: - bq. What happens in the case that the archive file changes in flight. For example, I submit a job using that archive. While my job is running, I notice a bug, remove the old cache file, push a new one to hdfs, and then launch a new invocation of my job. Would the new job get the old cache file because the old job is still running? Allen, the key that identifies a cache file on a tasktracker node is a combination of the URL and the DFS timestamp that is determined when the job is submitted. Hence, the new job would get a new key and hence be localized afresh. This is irrespective of whether the old file was ever localized on the same node or not. I am assuming here that a file upload to DFS to the same URL would modify the timestamp. Further, when this happens, new tasks of the old job that are running on nodes where the localization of the invalid file has already happened, will fail because the localization process for the new tasks will detect the file has changed in-flight. Hope this is correct. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860040#action_12860040 ] Devaraj Das commented on MAPREDUCE-1288: In my earlier analysis, i missed a code path. What I said is not true. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860007#action_12860007 ] Allen Wittenauer commented on MAPREDUCE-1288: - I think the 'in flight' archive is still a problem without this fixed. Correct? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859695#action_12859695 ] Hemanth Yamijala commented on MAPREDUCE-1288: - I am trying to revisit if this bug continues to be a blocker, in light of the decision we have taken to rebase trunk as the next new release. For the default configuration (DefaultTaskController), even though the files are localized only once irrespective of the number of users accessing them, since all the files are localized as the TT user and can be accessed by the TT user, I suppose this is not a problem. For the LinuxTaskController, it seems like a problem. However, since we are rebasing from trunk, maybe the feature of public distributed cache files in MAPREDUCE-744 will cover cases that are otherwise not covered due to this bug. To explain more, either users want to share files, or they don't. If they do, they can use the public distributed cache feature which works fine. If they don't want to share, this bug becomes a non issue. Based on this, I am tending to think this is not a blocker. Thoughts ? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789696#action_12789696 ] Devaraj Das commented on MAPREDUCE-1288: If i am reading the code right, the tasks of the new job would fail on those nodes that localized the old archive and still has a copy of that (the TaskTracker would detect the archive has changed and assuming that the change happened while the new job was running would fail the tasks). This will continue until the archive is purged from the cache and re-localized. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789390#action_12789390 ] Allen Wittenauer commented on MAPREDUCE-1288: - What happens in the case that the archive file changes in flight. For example, I submit a job using that archive. While my job is running, I notice a bug, remove the old cache file, push a new one to hdfs, and then launch a new invocation of my job. Would the new job get the old cache file because the old job is still running? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789160#action_12789160 ] Hemanth Yamijala commented on MAPREDUCE-1288: - Just to be clear, I am *not* disagreeing at all that there's a bug. Or with the assessment that this is a blocker. +1 on both. *smile* > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789153#action_12789153 ] Devaraj Das commented on MAPREDUCE-1288: All I am saying is that irrespective of the file being public or not, in the current codebase, we localize the file exactly once per TaskTracker. On a given tasktracker, users cannot share the same hdfs file as a distributed cache file.. What I thought earlier was that the same file would be localized twice in such a case (in their respective private directories). > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789145#action_12789145 ] Hemanth Yamijala commented on MAPREDUCE-1288: - bq. Even if the entire path were accessible to everyone, If the entire path were accessible to everyone on DFS, there's really no great security for that file. I was just trying to point out that such a case may not even be valid in the context of how MAPREDUCE-856 was approached (i.e we wanted to secure localized files for users). But I am concurring that one could theoretically construct a case where the URI was accessible to a group of users on DFS and since there's no way to securely localize that per group on the TT, this bug is still valid. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789134#action_12789134 ] Devaraj Das commented on MAPREDUCE-1288: Look at this scenario - the URI is hdfs://:/foo/bar/file.txt. Even if the entire path were accessible to everyone, the TaskTracker would localize it exactly once, and in a user's private directory. A second job wishing to access the same file wouldn't be able to do so since the TT wouldn't localize it again.. Does that make sense ? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789128#action_12789128 ] Hemanth Yamijala commented on MAPREDUCE-1288: - I suppose one could argue that if two different users can access the same set of files on the DFS for localization, they are 'public'. But then, you could theoretically construct a use case where there's a 'group' access for some files on DFS and these are localized per user on the task tracker. Is that what we're trying to address ? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789122#action_12789122 ] Vinod K V commented on MAPREDUCE-1288: -- +1 for putting the username also as part of the key. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789115#action_12789115 ] Devaraj Das commented on MAPREDUCE-1288: This issue is talking about how localization happens in 0.21/trunk. There is no "public" or "private" cache files currently. That is getting introduced In MAPREDUCE-744, > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789095#action_12789095 ] Arun C Murthy commented on MAPREDUCE-1288: -- I'm assuming this is done only for 'private' cache files? i.e. public cache files should probably use the 'username' of the TT itself? > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Blocker > Fix For: 0.21.0 > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.