[ https://issues.apache.org/jira/browse/HADOOP-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456057#comment-13456057 ]

Vinod Kumar Vavilapalli commented on HADOOP-8731:
-------------------------------------------------

Apologies for repeating the questions; I overlooked your answers.

There are two cases:
 - In the case of a real cluster with HDFS, a public dist-cache file is defined 
as one that is accessible to all users, and HDFS also has POSIX-style 
permissions. The isPublic() method is eventually used by the JobClient to 
figure out which of the user-needed artifacts are public and which are not 
(see the sketch after this list). So in the distributed-cluster case with DFS, 
this definition of public-cache doesn't need to change irrespective of whether 
Windows or Linux is underneath.
 - If you are talking about a distributed MR cluster running on a local 
filesystem, then yes, your changes will be needed, but that mode is not a 
supported setup anyway and will most likely need many more changes besides 
yours.
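
For reference, here is a minimal sketch of the check described in the first 
bullet, written against the public FileSystem/FsPermission APIs. The class and 
method names are illustrative; this is not the actual 
TrackerDistributedCacheManager code.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: a file counts as public when OTHER can read it and every
// ancestor directory grants OTHER execute.
public class PublicCacheCheckSketch {

  static boolean isPublicSketch(FileSystem fs, Path file) throws IOException {
    FsPermission perm = fs.getFileStatus(file).getPermission();
    if (!perm.getOtherAction().implies(FsAction.READ)) {
      return false;     // OTHER must be able to read the file itself
    }
    return ancestorsHaveExecute(fs, file.getParent());
  }

  static boolean ancestorsHaveExecute(FileSystem fs, Path dir) throws IOException {
    // Walk up to the root; a single closed ancestor makes the file private.
    for (Path p = dir; p != null; p = p.getParent()) {
      FsPermission perm = fs.getFileStatus(p).getPermission();
      if (!perm.getOtherAction().implies(FsAction.EXECUTE)) {
        return false;
      }
    }
    return true;
  }
}
{code}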

Regarding the permissions-related changes:
 - I believe the TT absolutely needs to set ugo+rx on dirs containing expanded 
archives. This is needed because some artifacts retain the permissions of the 
original bits that a user uploads. So let's not move/change that code out of 
the archives code block (see the sketch after the fragment below).
 - And for files, can you tell me why the 2nd line in the code fragment shown 
below doesn't already do the right thing on Windows? It may in fact be because 
of some other bug, so asking: is it not enough to set the correct permissions 
on the file itself in the case of Windows?
{code}
 ...
 sourceFs.copyToLocalFile(sourcePath, workFile);
 localFs.setPermission(workFile, permission);
 if (isArchive) {
 ...
{code}
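
For the archive case in the first bullet above, a hedged sketch of what the 
fix-up could look like, assuming illustrative names localFs (the local 
FileSystem), workDir (the unpacked directory), workFile and isArchive; this is 
not the actual TaskTracker localization code:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Sketch only: expanded archives get a recursive ugo+rx so contained
// artifacts stay readable regardless of the permissions packed into the
// archive, while a plain file only needs its own permission set.
public class LocalizePermissionsSketch {

  static void fixPermissions(FileSystem localFs, Path workFile, Path workDir,
      boolean isArchive, FsPermission permission) throws IOException {
    if (isArchive) {
      // Force ugo+rx on the whole unpacked tree (recursive chmod).
      FileUtil.chmod(workDir.toString(), "ugo+rx", true);
    } else {
      // Single file: setting the permission on the file itself is enough.
      localFs.setPermission(workFile, permission);
    }
  }
}
{code}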
                
> Public distributed cache support for Windows
> --------------------------------------------
>
>                 Key: HADOOP-8731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8731
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: filecache
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>         Attachments: HADOOP-8731-PublicCache.patch
>
>
> A distributed cache file is considered public (sharable between MR jobs) if 
> OTHER has read permissions on the file and +x permissions all the way up in 
> the folder hierarchy. By default, Windows permissions are mapped to "700" all 
> the way up to the drive letter, and it is unreasonable to ask users to change 
> the permission on the whole drive to make the file public. In other words, it 
> is hardly possible to have a public distributed cache on Windows. 
> To enable the scenario and make it more "Windows friendly", the criteria for 
> when a file is considered public should be relaxed. One proposal is to check 
> only whether the user has granted the EVERYONE group read permission on the 
> file itself (and discard the +x check on parent folders).
> Security considerations for the proposal: Default permissions on Unix 
> platforms are usually "775" or "755", meaning that OTHER users can read and 
> list folders by default. What this also means is that Hadoop users have to 
> explicitly make the files private in order to make them private in the 
> cluster (please correct me if this is not the case in real life!). On 
> Windows, default permissions are "700". This means that by default all files 
> are private. In the new model, if users want to make them public, they have 
> to explicitly add EVERYONE group permissions on the file. 
> TestTrackerDistributedCacheManager fails because of this issue.
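
For comparison with the current check, a minimal sketch of the relaxed 
criterion proposed in the description, assuming Windows ACLs are surfaced 
through the usual OTHER bits of FsPermission (names are illustrative and this 
is not part of the attached patch):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

// Sketch only: consult the file's own OTHER/EVERYONE read bit and skip the
// ancestor +x walk entirely.
public class RelaxedPublicCheckSketch {

  static boolean isPublicRelaxedSketch(FileSystem fs, Path file) throws IOException {
    return fs.getFileStatus(file).getPermission()
             .getOtherAction().implies(FsAction.READ);
  }
}
{code}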
