[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779236#comment-13779236 ]

Jason Lowe commented on PIG-2672:
---------------------------------

bq. I'm deliberately avoiding permission checks in this code path. In terms 
of security, I feel that this is no worse than what we have right now.

A shared cache where anyone can write is indeed worse.  Today, jars are 
uploaded to a private staging directory in HDFS where no other normal user can 
interfere.  If the staging directory were to become publicly writeable, it 
would be trivial to compromise every user running the same Pig jar, using a 
scheme like the one [~knoguchi] pointed out.  I don't see how one can cause the 
same level of havoc today.  Even if there's a window in the local filesystem 
where one can hijack a jar, exploiting it requires access to the same node 
where the user is launching the job.  In the publicly-writeable shared cache 
scheme, one only needs HDFS access from any node, and clients on all nodes 
using the shared cache can be compromised.

Besides malicious users, clients can also accidentally make the shared cache 
ineffective.  For example, a user with a restrictive umask (e.g. 077) uploads a 
jar to the shared cache, and all the directories and files are created so that 
others can't read them.  Because the permissions are wrong, no other user can 
share that file, and no other user's file whose hash happens to start with the 
same digit(s) can be uploaded to the shared cache.  And then there's the client 
that deletes files in use by other clients, breaking their jobs.
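The umask problem above comes down to simple bit arithmetic.  The sketch below is illustrative only: it assumes a hypothetical cache layout of <root>/<first hex digit(s) of the SHA-256>/<digest>/<jar name> (the real patch may lay things out differently), and it assumes HDFS masks new files (0666) and directories (0777) the same way POSIX does, with the mask taken from fs.permissions.umask-mode.  It contrasts a wide-open umask of 000 (anyone can write, the security risk) with 077 (no one else can even read, the usability risk).  The helper names are made up for this example.

```python
import hashlib
import stat


# Hypothetical layout, for illustration only:
#   <root>/<first hex digit(s) of SHA-256>/<full digest>/<jar name>
# Jars whose digests share a prefix share the same prefix directory.
def cache_path(root: str, jar_name: str, jar_bytes: bytes, prefix_len: int = 1) -> str:
    digest = hashlib.sha256(jar_bytes).hexdigest()
    return f"{root}/{digest[:prefix_len]}/{digest}/{jar_name}"


def effective_mode(requested: int, umask: int) -> int:
    # POSIX-style umask rule (assumed to match HDFS): bits set in the
    # umask are cleared from the requested mode.
    return requested & ~umask


def others_can_traverse_and_read(dir_mode: int) -> bool:
    # Another user needs r (list) and x (traverse) on a directory.
    return bool(dir_mode & stat.S_IROTH) and bool(dir_mode & stat.S_IXOTH)


def others_can_create_in(dir_mode: int) -> bool:
    # Creating an entry inside a directory needs both w and x on it.
    return bool(dir_mode & stat.S_IWOTH) and bool(dir_mode & stat.S_IXOTH)


if __name__ == "__main__":
    print(cache_path("/shared", "pig.jar", b"example jar contents"))
    for umask in (0o000, 0o077):
        dir_mode = effective_mode(0o777, umask)
        file_mode = effective_mode(0o666, umask)
        print(
            f"umask {umask:03o}: dir {dir_mode:03o}, file {file_mode:03o}, "
            f"others can read dir: {others_can_traverse_and_read(dir_mode)}, "
            f"others can create in dir: {others_can_create_in(dir_mode)}"
        )
```

With umask 077 the prefix directory ends up 0700, so every later jar whose digest starts with the same digit is blocked, not just the one that was uploaded.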

In short, publicly writeable shared caches are going to be problematic, 
especially in secure setups.  As such, I think there should at least be some 
documentation describing the risks of enabling it and how it could be used in 
a read-only manner for secure sharing, i.e. the shared cache is publicly 
readable but only writeable by admins who manually maintain its entries.
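One way a client could enforce the read-only mode suggested above is to use a shared-cache entry only if every path component from the cache root down to the jar is owned by an admin, is not writable by group or others, and is readable (and, for directories, traversable) by others.  This is a minimal sketch under stated assumptions: PathStatus and is_trusted_entry are hypothetical names, the admin account "hdfs" is an example, group write is rejected outright (a site whose group is admins-only might relax that), and ancestors above the cache root are not checked.

```python
import stat
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class PathStatus:
    # Hypothetical stand-in for the owner and permission bits a client
    # would learn from a file status call on one path component.
    path: str
    owner: str
    mode: int  # permission bits only, e.g. 0o755


def is_trusted_entry(chain: Sequence[PathStatus], admins: FrozenSet[str]) -> bool:
    """Accept a shared-cache entry only if every component, from the cache
    root down to the jar itself, is admin-owned, not writable by group or
    others, and readable by others (directories must also be traversable)."""
    if not chain:
        return False
    last = len(chain) - 1
    for i, st in enumerate(chain):
        if st.owner not in admins:
            return False
        if st.mode & (stat.S_IWGRP | stat.S_IWOTH):
            return False  # a non-admin could replace or delete entries
        if not st.mode & stat.S_IROTH:
            return False  # other users could not read it (the umask 077 case)
        if i < last and not st.mode & stat.S_IXOTH:
            return False  # other users could not traverse this directory
    return True


if __name__ == "__main__":
    admins = frozenset({"hdfs"})
    entry = [
        PathStatus("/shared", "hdfs", 0o755),
        PathStatus("/shared/b", "hdfs", 0o755),
        PathStatus("/shared/b/ba78/pig.jar", "hdfs", 0o644),
    ]
    print(is_trusted_entry(entry, admins))
```

If the check fails, the client would fall back to today's behavior of uploading the jar to its private staging directory.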
                
> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Aniket Mokashi
>             Fix For: 0.12.0
>
>         Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to the DistributedCache for each job launched. This is inefficient in 
> terms of:
>    * Space - The jars are distributed to the task trackers for every job, 
> taking up a lot of local temporary space on the task trackers.
>    * Performance - The jar distribution impacts the job launch time.
