[ https://issues.apache.org/jira/browse/HIVE-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk updated HIVE-4627:
-------------------------------
Attachment: 02_hfiles.hql
01_sample.hql
00_tables.ddl
These are my steps to reproduce:
{noformat}
## load the input data
$ wget http://dumps.wikimedia.org/other/pagecounts-raw/2008/2008-10/pagecounts-20081001-000000.gz
$ hadoop fs -mkdir /tmp/wikistats
$ hadoop fs -put pagecounts-20081001-000000.gz /tmp/wikistats/
## create the necessary tables.
$ hcat -f /tmp/00_tables.ddl
OK
Time taken: 1.886 seconds
OK
Time taken: 0.654 seconds
OK
Time taken: 0.047 seconds
OK
Time taken: 0.115 seconds
## verify
$ hive -e "select * from pagecounts limit 10;"
...
OK
aa Main_Page 4 41431
aa Special:ListUsers 1 5555
aa Special:Listusers 1 1052
...
$ hive -e "select * from pgc limit 10;"
...
OK
aa/Main_Page/20081001-000000 4 41431
aa/Special:ListUsers/20081001-000000 1 5555
aa/Special:Listusers/20081001-000000 1 1052
...
## produce the hfile splits file
$ hive -f /tmp/01_sample.hql
...
OK
Time taken: 54.681 seconds
[hrt_qa] $ hadoop fs -ls /tmp/hbase_splits
Found 1 items
-rwx------ 3 hrt_qa hdfs 270 2013-05-17 19:05 /tmp/hbase_splits
## verify
$ hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-1.2.0.1.3.0.0-104.jar \
    -libjars /usr/lib/hive/lib/hive-exec-0.11.0.1.3.0.0-104.jar \
    -input /tmp/hbase_splits -output /tmp/hbase_splits_txt \
    -inputformat SequenceFileAsTextInputFormat
...
13/05/17 19:08:38 INFO streaming.StreamJob: Output: /tmp/hbase_splits_txt
$ hadoop fs -cat /tmp/hbase_splits_txt/*
01 61 66 2e 71 2f 4d 61 69 6e 5f 50 61 67 65 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00 (null)
01 61 66 2f 31 35 35 30 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00 (null)
01 61 66 2f 32 38 5f 4d 61 61 72 74 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00 (null)
01 61 66 2f 42 65 65 6c 64 3a 31 30 30 5f 31 38 33 30 2e 4a 50 47 2f 32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00 (null)
## decoding the first line from UTF-8 bytes to String yields
## "af.q/Main_Page/20081001-000000", which is correct
## generate the hfiles
$ HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.94.6.1.3.0.0-104-security.jar \
    hive -f /tmp/02_hfiles.hql
{noformat}
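As a sanity check, the first hex entry above can be decoded programmatically. A minimal sketch, assuming the leading 0x01 and trailing 0x00 bytes are record framing around the UTF-8 key (the middle bytes decode to the row key quoted in the transcript):

```python
# Decode the first split-point entry from the hex dump above.
hex_dump = (
    "01 61 66 2e 71 2f 4d 61 69 6e 5f 50 61 67 65 2f "
    "32 30 30 38 31 30 30 31 2d 30 30 30 30 30 30 00"
)
raw = bytes(int(b, 16) for b in hex_dump.split())

# Strip the assumed one-byte framing on each end, then decode as UTF-8.
key = raw[1:-1].decode("utf-8")
print(key)  # af.q/Main_Page/20081001-000000
```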
> Total ordering of Hive output
> -----------------------------
>
> Key: HIVE-4627
> URL: https://issues.apache.org/jira/browse/HIVE-4627
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.11.0
> Reporter: Nick Dimiduk
> Attachments: 00_tables.ddl, 01_sample.hql, 02_hfiles.hql,
> hive-partitioner.patch
>
>
> I'd like to use Hive to generate HFiles for HBase. I started off by following
> the instructions on the
> [wiki|https://cwiki.apache.org/Hive/hbasebulkload.html], but that only took me
> so far: TotalOrderPartitioner didn't work. That led me to this
> [post|http://stackoverflow.com/questions/13715044/hive-cluster-by-vs-order-by-vs-sort-by],
> which points out that Hive partitions on value instead of key. A patched
> TotalOrderPartitioner brings me to this error:
> {noformat}
> 2013-05-17 21:00:47,781 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:317)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:532)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:183)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:865)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:597)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:309)
>     ... 7 more
> Caused by: java.io.IOException: No files found in hdfs://ip-10-191-3-134.ec2.internal:8020/tmp/hive-hrt_qa/hive_2013-05-17_20-58-58_357_6896546413926013201/_task_tmp.-ext-10000/_tmp.000000_0
>     at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:142)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:180)
>     ... 11 more
> {noformat}
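For context on the partitioning issue in the quoted description: TotalOrderPartitioner routes each record to the reducer whose key range contains it, by searching the sampled split points. A minimal sketch of that idea (hypothetical Python with made-up boundary keys, not Hadoop's actual implementation, which also supports a trie over byte keys):

```python
import bisect

def total_order_partition(key, split_points):
    """Return the reducer index for `key`.

    split_points: sorted list of (num_reducers - 1) boundary keys,
    analogous to the split file the sampling step writes to
    /tmp/hbase_splits above. bisect_right finds the first range
    whose upper boundary exceeds the key, giving a total order
    across all reducer outputs.
    """
    return bisect.bisect_right(split_points, key)

# Hypothetical boundary keys for 4 reducers:
splits = [b"af/", b"am/", b"ar/"]
print(total_order_partition(b"aa/Main_Page/20081001-000000", splits))  # 0
print(total_order_partition(b"af/1550/20081001-000000", splits))       # 1
```

The StackOverflow post referenced above notes that Hive hands the partitioner the row *value* rather than the key, which is why a stock TotalOrderPartitioner produces wrong partitions without a patch.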
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira