[jira] Updated: (HIVE-1317) CombineHiveInputFormat throws exception when partition name contains special characters to URI

2010-04-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1317:
-

Attachment: HIVE-1317.2.patch

Uploading a new patch, HIVE-1317.2.patch, which fixes the test.

Also, after talking with Namit offline: CombineHiveInputFormat ends up with all 
paths under a partition directory because of CombineFileInputFormat.createPool(job, 
CombineFilter), where CombineFilter accepts every file under the partition 
directory. That is why CombineHiveInputFormat gets the full path names from 
FileInputFormat.getInputPaths().
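
For readers not familiar with that code path, here is a minimal, hypothetical sketch 
of the pattern described above (not the actual Hive shim code): one pool is 
registered per partition directory via CombineFileInputFormat.createPool(), with a 
filter that accepts every file underneath that directory. Everything except 
createPool() and PathFilter is a made-up name for illustration.

{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;

public abstract class PoolPerPartitionSketch<K, V> extends CombineFileInputFormat<K, V> {

  // Accepts any path that lives under the pool's root partition directory.
  static class CombineFilter implements PathFilter {
    private final String prefix;

    CombineFilter(Path root) {
      this.prefix = root.toString();
    }

    public boolean accept(Path path) {
      // Plain prefix check: every file under the partition directory matches,
      // which is why getInputPaths() later reports the full file paths.
      return path.toString().startsWith(prefix);
    }
  }

  // One pool per partition directory, mirroring the behaviour described above.
  protected void poolPartitions(JobConf job, Path[] partitionDirs) {
    for (Path dir : partitionDirs) {
      createPool(job, new CombineFilter(dir));
    }
  }
}
{noformat}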


> CombineHiveInputFormat throws exception when partition name contains special 
> characters to URI
> --
>
> Key: HIVE-1317
> URL: https://issues.apache.org/jira/browse/HIVE-1317
> Project: Hadoop Hive
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ning Zhang
>Assignee: Ning Zhang
> Fix For: 0.6.0
>
> Attachments: HIVE-1317.2.patch, HIVE-1317.patch
>
>
> If a partition name contains characters such as ':' and '|', which have 
> special meaning in a URI (HDFS uses URI internally for Path), 
> CombineHiveInputFormat throws an exception. A URI was created in 
> CombineHiveInputFormat to check whether a path belongs to a partition in 
> partitionToPathInfo. We should bypass URI creation and use plain string 
> comparisons instead. 
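
As an illustration of the proposed fix, here is a hypothetical helper (not the 
committed patch): partition membership is decided by comparing the Path strings 
directly, so no URI object is ever constructed and special characters in partition 
names cannot trigger a URI syntax error.

{noformat}
import org.apache.hadoop.fs.Path;

public final class PartitionPathMatcher {

  private PartitionPathMatcher() {}

  // Returns true if 'filePath' lies under 'partitionDir'. No URI is created,
  // so ':' or '|' in a partition name cannot cause an exception here.
  public static boolean belongsTo(Path filePath, Path partitionDir) {
    String file = filePath.toString();
    String dir = partitionDir.toString();
    // Exact match, or match up to a path separator, so that /warehouse/p=a
    // does not claim files under /warehouse/p=ab.
    return file.equals(dir) || file.startsWith(dir + Path.SEPARATOR);
  }
}
{noformat}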

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread Ning Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang resolved HIVE-1326.
--

Fix Version/s: 0.6.0
   Resolution: Fixed

Committed. Thanks Michael!

> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Fix For: 0.6.0
>
> Attachments: rowcontainer.patch, rowcontainer_v2.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>   at org.apache.hadoop.mapred.Child.main(Child.java:158)
> Caused by: java.io.IOException: No space left on device
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:260)
>   at 
> org.apache.hadoop.fs.Raw

[jira] Resolved: (HIVE-377) Some ANT jars should be included into hive

2010-04-27 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-377.
--

Resolution: Won't Fix

With newer releases of Hadoop and Hive, Jetty and Ant are packaged differently, and 
this is no longer an issue. 

> Some ANT jars should be included into hive
> --
>
> Key: HIVE-377
> URL: https://issues.apache.org/jira/browse/HIVE-377
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 0.3.0, 0.6.0
>Reporter: Edward Capriolo
> Fix For: 0.4.2
>
>
> The WEB UI requires
> HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant.jar
> HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant-launcher.jar
> Right now the start script does this.
> {noformat}
>  #hwi requires ant jars
>  # if [ "$ANT_LIB" = "" ] ; then
>  #   ANT_LIB=/opt/ant/libs
>  # fi
>  # for f in ${ANT_LIB}/*.jar; do
>  #   if [[ ! -f $f ]]; then
>  # continue;
>  #   fi
>  #   HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
>  # done
> {noformat}
> Can we add these jars? This will add 1.4 MB to the Hive distribution. If we do not 
> want to add these, I would like to make the startup script fail if the environment 
> variable is not set correctly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-04-27 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861514#action_12861514
 ] 

Paul Yang commented on HIVE-1115:
-

Not quite sure if this will be completed this quarter - I believe 
MAPREDUCE-1501 was helpful in handling a similar use case?

> optimize combinehiveinputformat in presence of many partitions
> --
>
> Key: HIVE-1115
> URL: https://issues.apache.org/jira/browse/HIVE-1115
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
>
> A query like:
> select ..  from T where ...
> where T contains a very large number of partitions, does not work very well 
> with CombineHiveInputFormat.
> A pool is created per directory, which leads to a high number of mappers.
> If all partitions share the same operator tree and the same partition 
> description, only a single pool should be created.
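
A rough sketch of the idea, with hypothetical names (the "plan signature" key and 
the classes below are illustrative, not Hive's actual data structures): partition 
directories whose plans are identical are grouped together, and one combine pool is 
built per group instead of one per directory, so small files can be combined across 
partitions.

{noformat}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public final class PoolGroupingSketch {

  private PoolGroupingSketch() {}

  // Accepts any path under any of the grouped partition directories.
  static class MultiDirFilter implements PathFilter {
    private final List<String> prefixes = new ArrayList<String>();

    MultiDirFilter(List<Path> dirs) {
      for (Path d : dirs) {
        prefixes.add(d.toString() + Path.SEPARATOR);
      }
    }

    public boolean accept(Path path) {
      String p = path.toString() + Path.SEPARATOR;
      for (String prefix : prefixes) {
        if (p.startsWith(prefix)) {
          return true;
        }
      }
      return false;
    }
  }

  // Groups partition directories by an opaque "plan signature" (a hypothetical
  // key derived from the operator tree and partition descriptor) and returns one
  // filter per group; each filter would back a single combine pool.
  public static List<PathFilter> buildPools(Map<Path, String> dirToPlanSignature) {
    Map<String, List<Path>> byPlan = new HashMap<String, List<Path>>();
    for (Map.Entry<Path, String> e : dirToPlanSignature.entrySet()) {
      List<Path> dirs = byPlan.get(e.getValue());
      if (dirs == null) {
        dirs = new ArrayList<Path>();
        byPlan.put(e.getValue(), dirs);
      }
      dirs.add(e.getKey());
    }
    List<PathFilter> pools = new ArrayList<PathFilter>();
    for (List<Path> dirs : byPlan.values()) {
      pools.add(new MultiDirFilter(dirs));
    }
    return pools;
  }
}
{noformat}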

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1115) optimize combinehiveinputformat in presence of many partitions

2010-04-27 Thread Matt Pestritto (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861508#action_12861508
 ] 

Matt Pestritto commented on HIVE-1115:
--

Any ETA on resolving this issue? There hasn't been any activity in a while, and it 
would be a significant performance increase in our environment.  
Thanks

> optimize combinehiveinputformat in presence of many partitions
> --
>
> Key: HIVE-1115
> URL: https://issues.apache.org/jira/browse/HIVE-1115
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Paul Yang
>
> A query like:
> select ..  from T where ...
> where T contains a very large number of partitions, does not work very well 
> with CombineHiveInputFormat.
> A pool is created per directory, which leads to a high number of mappers.
> If all partitions share the same operator tree and the same partition 
> description, only a single pool should be created.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861467#action_12861467
 ] 

Ning Zhang commented on HIVE-1326:
--

Michael, the temporary directory is needed for a conditional task to handle 
skewed joins (that's why the JobConf is cloned but with a different path). The 
idea is that whenever there are too many rows for a particular join key to be held 
in memory, they are first written to local disk via RowContainer. If there are 
still too many rows for local disk, they are written to a DFS location and a 
conditional task is triggered to handle the skewed key. So if we create a temp 
directory on local disk and let all mappers write their temp files into that 
directory, then later, when we want to move the data to HDFS, we just move the 
whole directory instead of moving individual files.
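
To illustrate the "move the whole directory" point, here is a minimal sketch using 
Hadoop's FileSystem API; the helper name and the assumption that all spill files sit 
under one job-local directory are illustrative, not the actual RowContainer or 
skew-join code.

{noformat}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class SkewSpillSketch {

  private SkewSpillSketch() {}

  // Ships the entire local spill directory to the given DFS target.
  // 'localSpillDir' is assumed to be the per-job temp directory that every
  // mapper's RowContainer wrote its spill files into.
  public static void promoteSpillDirToDfs(Configuration conf,
                                          Path localSpillDir,
                                          Path dfsTarget) throws IOException {
    FileSystem dfs = dfsTarget.getFileSystem(conf);
    // copyFromLocalFile(delSrc=true, ...) copies the directory recursively and
    // removes the local copy, i.e. one move for the whole directory.
    dfs.copyFromLocalFile(true, localSpillDir, dfsTarget);
  }
}
{noformat}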

+1 on v2. Will commit if tests pass. 


> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Attachments: rowcontainer.patch, rowcontainer_v2.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hado

[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread Michael Klatt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861449#action_12861449
 ] 

Michael Klatt commented on HIVE-1326:
-


Hmm, on second thought, the parentFile (directory) is referenced in several 
places.  I'm not familiar enough with Hive to know what these lines do:

  HiveConf.setVar(jobCloneUsingLocalFs,
      HiveConf.ConfVars.HADOOPMAPREDINPUTDIR,
      org.apache.hadoop.util.StringUtils.escapeString(parentFile.getAbsolutePath()));

If it weren't for these lines, I could remove the parentFile variable 
altogether.



> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Attachments: rowcontainer.patch, rowcontainer_v2.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
>   at 
> org.apache.hadoop.hive.ql.exec.Exe

[jira] Updated: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread Michael Klatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Klatt updated HIVE-1326:


Attachment: rowcontainer_v2.patch

> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Attachments: rowcontainer.patch, rowcontainer_v2.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>   at org.apache.hadoop.mapred.Child.main(Child.java:158)
> Caused by: java.io.IOException: No space left on device
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:260)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.ja

[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread Michael Klatt (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861444#action_12861444
 ] 

Michael Klatt commented on HIVE-1326:
-


The reason the parentFile.delete() call is there is that the File.createTempFile 
method actually creates a file. The code, as it currently stands, creates a 
temporary directory to hold the RowContainer file, and I made the smallest change 
possible to preserve this behavior.

Looking at the code, it appears that the createTempFile mechanism is used 
several lines down to create the actual temporary file (within the new 
temporary directory). I'm not sure why a temporary directory is created first, 
but I'll submit a new patch that doesn't try to create a temporary directory 
at all.
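
For reference, the pattern being described is the usual pre-Java-7 idiom for 
creating a uniquely named temporary directory. The sketch below is a hypothetical 
helper, not the RowContainer code itself.

{noformat}
import java.io.File;
import java.io.IOException;

public final class TempDirSketch {

  private TempDirSketch() {}

  // Creates a uniquely named temporary directory under the given parent.
  public static File createTempDir(String prefix, File parent) throws IOException {
    File placeholder = File.createTempFile(prefix, "", parent); // creates a file
    if (!placeholder.delete()) {                                // remove the file...
      throw new IOException("Could not delete " + placeholder);
    }
    if (!placeholder.mkdir()) {                 // ...and reuse the name as a directory
      throw new IOException("Could not create directory " + placeholder);
    }
    return placeholder;
  }
}
{noformat}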


> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Attachments: rowcontainer.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.ja

[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-27 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861370#action_12861370
 ] 

He Yongqiang commented on HIVE-1326:


Agreed with Ning, looks fine to me.

> RowContainer uses hard-coded '/tmp/' path for temporary files
> -
>
> Key: HIVE-1326
> URL: https://issues.apache.org/jira/browse/HIVE-1326
> Project: Hadoop Hive
>  Issue Type: Bug
> Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
> but that doesn't seem relevant.
>Reporter: Michael Klatt
> Attachments: rowcontainer.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, 
> and we encountered a problem when a query used the RowContainer class and 
> filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
> class putting temporary files in the '/tmp/' path instead of using the 
> configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
> temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
> rows: used memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
> rows: used memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
> rows: used memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
> rows: used memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
> rows: used memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
> rows: used memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
> rows: used memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
> rows: used memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
> rows: used memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
> rows: used memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
> rows: used memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
> java.io.IOException: No space left on device
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   at 
> org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
>   at 
> org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
>   at 
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
>   at 
> org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
>   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
>   at org.apache.hadoop.mapred.Child.main(Child.java:158)
> Caused by: java.io.IOException: No space left on device
>   at java.io.FileOutputStream.writeBytes(Native Method)
>   at java.io.FileOutputStream.write(FileOutputStream.java:260)
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStrea