[ 
https://issues.apache.org/jira/browse/PIG-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574217#action_12574217
 ] 

Amir Youssefi commented on PIG-129:
-----------------------------------

I think it's a good idea to have multiple tmp dirs. Having several physical 
drives is common these days. I brought up the same idea earlier this week as 
the next logical step.

A new feature in Hadoop 0.16.1 will partially address the tmp dir issue, but it 
will take a while to go through the pipeline and reach users. The tmp directory 
is currently a hot issue for us, so we plan to address this in Pig.

I will probably do this in two stages:

1) Use a ./tmp directory under the working directory. This gets cleaned up 
automatically.
2) Open a discussion on the details of using multiple tmp directories (possibly 
over multiple physical drives). We need to take cleanup scenarios into account 
as well.

-Amir

> need to create temp files in the task's working directory
> ---------------------------------------------------------
>
>                 Key: PIG-129
>                 URL: https://issues.apache.org/jira/browse/PIG-129
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Amir Youssefi
>
> Currently, Pig creates temp data such as spilled bags in the directory 
> specified by java.io.tmpdir. The problem is that this directory is usually 
> shared by all tasks and can easily run out of space.
> A better approach would be to create these files in the temp dir inside the 
> task's working directory, as these locations usually have much more space and 
> can also be hosted on different disks, so performance could be better.
> There are 2 parts to this fix:
> (1) In org.apache.pig.data.DataBag, check whether the temp directory exists 
> and create it if not before trying to create the temp file. This is somewhere 
> around line 390 in the code.
> (2) Change mapred.child.java.opts in hadoop-site.xml to include a new value 
> for the tmpdir property that points to ./tmp. For instance: 
> <property>
>         <name>mapred.child.java.opts</name>
>         <value>-Xmx1024M -Djava.io.tmpdir="./tmp"</value>
>         <description>arguments passed to child jvms</description>
> </property>
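
Part (1) of the fix above could look roughly like the following. This is only a minimal sketch, not the actual DataBag code: the class name TempFileSketch, the helper createSpillFile, and the "pigbag" prefix are hypothetical names chosen for illustration. It assumes the temp location comes from java.io.tmpdir (for example ./tmp, as in the property above) and may not exist yet when the first file is spilled.

```java
import java.io.File;
import java.io.IOException;

public class TempFileSketch {

    // Hypothetical helper: resolve the temp dir from java.io.tmpdir.
    static File createSpillFile(String prefix) throws IOException {
        File tmpDir = new File(System.getProperty("java.io.tmpdir"));
        return createSpillFile(tmpDir, prefix);
    }

    // Hypothetical helper: make sure tmpDir exists, then create the temp file in it.
    static File createSpillFile(File tmpDir, String prefix) throws IOException {
        // mkdirs() returns false when the directory already exists (for example,
        // another task created it first), so re-check isDirectory() before failing.
        if (!tmpDir.isDirectory() && !tmpDir.mkdirs() && !tmpDir.isDirectory()) {
            throw new IOException("Could not create temp directory: " + tmpDir);
        }
        // The prefix must be at least three characters long.
        File f = File.createTempFile(prefix, ".tmp", tmpDir);
        f.deleteOnExit(); // best-effort cleanup when the JVM exits normally
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = createSpillFile("pigbag");
        System.out.println("Created " + f.getAbsolutePath());
    }
}
```

With -Djava.io.tmpdir="./tmp" set for the child JVM, the first call would create ./tmp relative to the task's working directory if it is missing, and later calls would reuse it.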

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
