[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779085#comment-13779085 ]
Aniket Mokashi commented on PIG-2672:
-------------------------------------
[~rohini], from the current code, we have:
{code}
Path dst = new Path(FileLocalizer.getTemporaryPath(pigContext).toUri().getPath(), suffix);
{code}
Hence, by default, files are copied to /tmp/temp-<random>/. I do not see a way
to configure this to use a relative path, but I might be wrong.
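To make the consequence concrete, here is a plain-Java sketch that mimics the quoted path construction without Hadoop's `Path` or Pig's `FileLocalizer`. `TempJarPath`, `temporaryDir`, `destination`, and `myudfs.jar` are hypothetical names, and the `temp-<id>` naming is assumed from the /tmp/temp-<random>/ layout described above rather than copied from Pig's source.

```java
import java.util.Random;

// Hypothetical stand-in for the quoted path construction; not Pig's actual code.
public class TempJarPath {
    // Assumed default base directory, matching the /tmp location observed above.
    static final String DEFAULT_TEMP_BASE = "/tmp";

    // Names a per-call temporary directory, e.g. /tmp/temp-42.
    static String temporaryDir(String base, int id) {
        return base + "/temp-" + id;
    }

    // Joins the temporary directory and the jar's file name (the "suffix").
    static String destination(String tempDir, String suffix) {
        return tempDir + "/" + suffix;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Each job draws a fresh random directory, so the same jar is
        // uploaded to a new location (and localized again) for every job.
        for (int job = 1; job <= 2; job++) {
            String dir = temporaryDir(DEFAULT_TEMP_BASE, rng.nextInt(Integer.MAX_VALUE));
            System.out.println("job " + job + " -> " + destination(dir, "myudfs.jar"));
        }
    }
}
```

Because the directory name is random per invocation, nothing about the destination lets a later job discover that an identical jar was already shipped.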
bq. UserEvil can figure out what the shared hdfs path is since he has access to the local file.
This is true even today: UserEvil can look into the jobconf to find the
location of the jars and replace any of them. Even if the jars are
protected, as Rohini explained earlier, that protection comes from
HDFS and not from Pig.
I'm deliberately avoiding permission checks in this code path. In terms of
security, I feel that this is no worse than what we have right now.
Next steps:
1. Address the code review comments from RB and submit a fresh patch.
2. Run this for several jobs in practice and ensure there are no bad side
effects.
3. [~cheolsoo], can you please help me with e2e tests for this?
4. Open a documentation JIRA and explain how this works in the Pig docs.
Anything else I missed?
> Optimize the use of DistributedCache
> ------------------------------------
>
> Key: PIG-2672
> URL: https://issues.apache.org/jira/browse/PIG-2672
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Aniket Mokashi
> Fix For: 0.12.0
>
> Attachments: PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds
> them to the DistributedCache for each job launched. This is inefficient in terms of:
> * Space - The jars are distributed to task trackers for every job, taking
> up a lot of local temporary space on the tasktrackers.
> * Performance - The jar distribution impacts the job launch time.
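One way to avoid re-shipping identical jars for every job is a content-addressed shared location: hash each jar's bytes and store it under a path derived from that hash, so a later job with the same jar finds it already present. The sketch below is a hypothetical illustration of that idea, not necessarily what the attached patch does. `SharedJarCache`, `/user/pig/.jarcache`, and the in-memory set standing in for "already in HDFS" are all assumptions; a real version would check and write the file system instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical content-addressed jar cache; not Pig's API.
public class SharedJarCache {
    private final String cacheRoot;
    // Stands in for "files already present in HDFS" (assumption for the sketch).
    private final Set<String> uploaded = new HashSet<>();

    SharedJarCache(String cacheRoot) {
        this.cacheRoot = cacheRoot;
    }

    // Hex-encoded SHA-1 of the jar's bytes; identical jars get identical keys.
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Content-addressed destination: <cacheRoot>/<sha1>/<fileName>.
    String pathFor(String fileName, byte[] contents) {
        return cacheRoot + "/" + sha1Hex(contents) + "/" + fileName;
    }

    // Returns true if the jar had to be uploaded, false if an earlier job already shipped it.
    boolean ship(String fileName, byte[] contents) {
        return uploaded.add(pathFor(fileName, contents));
    }

    public static void main(String[] args) {
        SharedJarCache cache = new SharedJarCache("/user/pig/.jarcache");
        byte[] jar = "pretend jar bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println("job 1 uploads: " + cache.ship("myudfs.jar", jar)); // true
        System.out.println("job 2 uploads: " + cache.ship("myudfs.jar", jar)); // false
        System.out.println(cache.pathFor("myudfs.jar", jar));
    }
}
```

Keying on content rather than on the file name means a rebuilt jar with the same name lands at a new path instead of silently reusing a stale copy.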