[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456134#comment-13456134 ]
Ivan Mitic commented on HADOOP-8731:
------------------------------------

Thanks Vinod, these are great comments! Some answers below.

{quote}In the case of a real cluster, and with HDFS, the definition of a public dist-cache file is one which is accessible to all users; and HDFS also has POSIX-style permissions. The method isPublic() is eventually used by the JobClient to figure out which of the user-needed artifacts are public and which are not. So in the distributed-cluster case with DFS, this definition of the public cache doesn't need to change, irrespective of whether you have Windows or Linux underneath.{quote}

I agree, the JobClient will evaluate whether a file should be public or private. Now, if I understood things correctly, depending on whether the file is marked public or private on the JobClient side, it will later be downloaded from DFS to the public or private LFS location on the TT machine. What we are proposing with this change is to change the logic on the JobClient side that determines whether the file is public or private. Given that all files are private by default on Windows, it would be a real challenge for users to upload a file to the public distributed cache if we keep the old model (see my previous comments). Does this make sense? Please do comment; maybe I just didn't understand correctly how DC works.

bq. I believe TT absolutely needs to set ugo+rx for dirs containing expanded archives. This is needed to address some of the artifacts which retain permissions from the original bits that a user uploads. So let's not move/change that code out of the archives code block.

Ah, I didn't think of this. Will revert to the original chmod.

bq. And for files, can you tell me why the 2nd line in the code-fragment shown below doesn't already do it correctly on Windows? It may in fact be because of some other bug, so asking - is it not enough to set correct permissions on the file itself in case of Windows?
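To illustrate what "set ugo+rx for dirs containing expanded archives" means, here is a minimal sketch of a `chmod -R ugo+rx` equivalent written with java.nio. It is not the TaskTracker's actual code: the class `ArchivePermissions` and method `addUgoRx` are hypothetical, and the sketch assumes a POSIX local file system (java.nio POSIX attributes are not available on NTFS).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not Hadoop code): a java.nio equivalent of `chmod -R ugo+rx`. */
public final class ArchivePermissions {
    // Read and execute for user (owner), group and others.
    private static final Set<PosixFilePermission> UGO_RX = EnumSet.of(
            PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_EXECUTE,
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_EXECUTE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_EXECUTE);

    private ArchivePermissions() {}

    /** Adds r and x for u, g and o to path and everything under it, keeping existing bits. */
    public static void addUgoRx(Path path) throws IOException {
        // Skip symbolic links: setPosixFilePermissions would follow them to their target.
        if (Files.isSymbolicLink(path)) {
            return;
        }
        Set<PosixFilePermission> perms = new HashSet<>(Files.getPosixFilePermissions(path));
        perms.addAll(UGO_RX); // "+" semantics: add bits, never remove (e.g. owner w is kept)
        Files.setPosixFilePermissions(path, perms);
        // Recurse only after fixing the directory itself, so that a directory unpacked
        // with restrictive permissions (e.g. 000) can still be listed.
        if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    addUgoRx(child);
                }
            }
        }
    }
}
```

For example, an archive entry extracted as a `---------` directory ends up `r-xr-xr-x`, and a `rw-------` file inside it ends up `rwxr-xr-x`. This is the "artifacts which retain permissions from the original bits" case: fixing only the top-level file or directory would leave such nested entries unreadable to other users.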
You're right; let me debug to see what the problem was. I made this fix a while back.

> Public distributed cache support for Windows
> --------------------------------------------
>
> Key: HADOOP-8731
> URL: https://issues.apache.org/jira/browse/HADOOP-8731
> Project: Hadoop Common
> Issue Type: Bug
> Components: filecache
> Reporter: Ivan Mitic
> Assignee: Ivan Mitic
> Attachments: HADOOP-8731-PublicCache.patch
>
> A distributed cache file is considered public (sharable between MR jobs) if
> OTHER has read permissions on the file and +x permissions all the way up in
> the folder hierarchy. By default, Windows permissions are mapped to "700" all
> the way up to the drive letter, and it is unreasonable to ask users to change
> the permission on the whole drive to make the file public. IOW, it is hardly
> possible to have public distributed cache on Windows.
>
> To enable the scenario and make it more "Windows friendly", the criteria on
> when a file is considered public should be relaxed. One proposal is to check
> whether the user has given EVERYONE group permission on the file only (and
> discard the +x check on parent folders).
>
> Security considerations for the proposal: Default permissions on Unix
> platforms are usually "775" or "755", meaning that OTHER users can read and
> list folders by default. What this also means is that Hadoop users have to
> explicitly make the files private in order to make them private in the
> cluster (please correct me if this is not the case in real life!). On
> Windows, default permissions are "700". This means that by default all files
> are private. In the new model, if users want to make them public, they have
> to explicitly add EVERYONE group permissions on the file.
>
> TestTrackerDistributedCacheManager fails because of this issue.
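The difference between the current rule and the proposed one can be sketched as follows. This is a minimal, hypothetical illustration (`PublicCacheCheck`, `isPublicStrict` and `isPublicRelaxed` are not Hadoop names), assuming a POSIX local file system accessed through java.nio as a stand-in for the file system API the real check uses; `OTHERS_READ` stands in for the EVERYONE permission the proposal would check on Windows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;

/** Hypothetical sketch (not Hadoop code) of the two "is this cache file public?" rules. */
public final class PublicCacheCheck {
    private PublicCacheCheck() {}

    /** Current rule: OTHER can read the file AND OTHER has +x on every ancestor directory. */
    public static boolean isPublicStrict(Path file) throws IOException {
        Path abs = file.toAbsolutePath();
        if (!Files.getPosixFilePermissions(abs).contains(PosixFilePermission.OTHERS_READ)) {
            return false;
        }
        // Walk up to the root; a single ancestor without o+x makes the file private.
        for (Path dir = abs.getParent(); dir != null; dir = dir.getParent()) {
            if (!Files.getPosixFilePermissions(dir).contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return false;
            }
        }
        return true;
    }

    /** Proposed rule: only OTHER (EVERYONE on Windows) read access on the file itself. */
    public static boolean isPublicRelaxed(Path file) throws IOException {
        return Files.getPosixFilePermissions(file.toAbsolutePath())
                .contains(PosixFilePermission.OTHERS_READ);
    }
}
```

With a world-readable file (`rw-r--r--`) inside a `rwx------` directory, which mirrors the Windows default of "700" up to the drive letter, the strict rule reports the file as private while the relaxed rule reports it as public. The trade-off is the one described in the security considerations: the relaxed rule no longer requires every parent folder to be traversable by OTHER.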