[ https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861444#action_12861444 ]

Michael Klatt commented on HIVE-1326:
-------------------------------------


The parentFile.delete() call is there because the File.createTempFile method 
actually creates a file on disk. The code, as it currently stands, creates a 
temporary directory to hold the RowContainer file, and I made the smallest 
change possible to preserve that behavior.
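
For context, here is a minimal sketch of that idiom, outside the actual 
RowContainer source; the class name and temp-file prefixes below just mirror 
this comment and the log output, and are not copied from the patch:

    import java.io.File;
    import java.io.IOException;

    public class TempDirIdiom {
        public static void main(String[] args) throws IOException {
            // File.createTempFile creates a real zero-length file, so to reuse
            // its unique name as a directory the placeholder file has to be
            // deleted first and the same path recreated with mkdir().
            File parentFile = File.createTempFile("hive-rowcontainer-", "");
            if (!parentFile.delete() || !parentFile.mkdir()) {
                throw new IOException("Could not create temp directory " + parentFile);
            }

            // The spill file is then created inside that directory, e.g.
            // /tmp/hive-rowcontainer-.../RowContainer....tmp as in the log below.
            File tmpFile = File.createTempFile("RowContainer", ".tmp", parentFile);
            System.out.println("temp file: " + tmpFile);
        }
    }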

Looking at the code, it appears that the createTempFile mechanism is used 
several lines down to actually create the temporary file (within the new 
temporary directory). I'm not sure why a temporary directory is created first, 
but I'll submit a new patch which doesn't try to create a temporary directory 
at all.
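
Roughly, the new patch would go in the direction of the sketch below: create 
the spill file directly under a configurable scratch directory instead of 
first building a parent directory under a hard-coded /tmp/. The helper name 
and the hadoop.tmp.dir lookup are illustrative assumptions only, not the 
actual patch:

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.mapred.JobConf;

    public class RowContainerTmpSketch {
        // Hypothetical helper: resolve the temp directory from the job
        // configuration (falling back to java.io.tmpdir) and create the
        // spill file directly in it, with no intermediate directory.
        static File createSpillFile(JobConf jc) throws IOException {
            File tmpDir = new File(
                jc.get("hadoop.tmp.dir", System.getProperty("java.io.tmpdir")));
            return File.createTempFile("RowContainer", ".tmp", tmpDir);
        }
    }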


> RowContainer uses hard-coded '/tmp/' path for temporary files
> -------------------------------------------------------------
>
>                 Key: HIVE-1326
>                 URL: https://issues.apache.org/jira/browse/HIVE-1326
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>            Reporter: Michael Klatt
>         Attachments: rowcontainer.patch
>
>
> In our production Hadoop environment, the "/tmp/" partition is actually pretty 
> small, and we encountered a problem when a query used the RowContainer class 
> and filled up the /tmp/ partition.  I tracked down the cause: the RowContainer 
> class puts its temporary files under the hard-coded '/tmp/' path instead of 
> using the configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 10000000 rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 11000000 rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 12000000 rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 13000000 rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 14000000 rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 15000000 rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 16000000 rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 17000000 rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 18000000 rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 19000000 rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 20000000 rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>       at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>       at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>       at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>       at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>       at org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>       at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>       at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
>       at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
>       at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>       at org.apache.hadoop.mapred.Child.main(Child.java:158)
> Caused by: java.io.IOException: No space left on device
>       at java.io.FileOutputStream.writeBytes(Native Method)
>       at java.io.FileOutputStream.write(FileOutputStream.java:260)
>       at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
>       ... 22 more

