Re: Setting s3 credentials in cloudera
With the same credentials I am able to download the S3 file to my local filesystem.

On Tue, Apr 22, 2014 at 11:17 AM, Kishore kumar kish...@techdigita.in wrote:

No, I am running in the CLI.

On Mon, Apr 21, 2014 at 8:43 PM, j.barrett Strausser j.barrett.straus...@gmail.com wrote:

You mention Cloudera -- are you trying to execute the query from HUE? That requires altering the setting for HUE, not Hive.

On Mon, Apr 21, 2014 at 11:12 AM, j.barrett Strausser j.barrett.straus...@gmail.com wrote:

Hope those aren't your actual credentials.

On Mon, Apr 21, 2014 at 11:05 AM, Kishore kumar kish...@techdigita.in wrote:

I edited the cluster-wide Configuration Safety Valve for core-site.xml in CM and specified the following, but the problem is the same:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>AKIAJNIM5P2SASWJPHSA</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>BN1hkKD7JY4LGGNbjxmnFE0ehs12vXmP44GCKV2N</value>
</property>

FAILED: Error in metadata: MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Thanks, Kishore.

On Mon, Apr 21, 2014 at 8:17 PM, Kishore kumar kish...@techdigita.in wrote:

I set the credentials from the Hive command line and I am still getting the error. Please help me.

hive> set fs.s3.awsAccessKeyId=x;
hive> set fs.s3.awsSecretAccessKey=xxx;

FAILED: Error in metadata: MetaException(message:java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Thanks, Kishore.
On Mon, Apr 21, 2014 at 7:33 PM, Kishore kumar kish...@techdigita.in wrote:

Hi Experts, I am trying to create a table against my S3 file and ran into the issue below. Where do I set these credentials in Cloudera Manager 4.8? After some research I found this link (http://community.cloudera.com/t5/Cloudera-Manager-Installation/AWS-Access-Key-ID-and-Secret-Access-Key-must-be-specified-as-the/td-p/495), but please explain clearly how to specify the values after editing the cluster-wide Configuration Safety Valve for core-site.xml.

--
Thanks,
Kishore

--
https://github.com/bearrito

--
Kishore Kumar
ITIM
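A hedged aside on narrowing this down: since the same credentials work for a plain download, one way to check whether the cluster-side configuration (rather than Hive itself) is at fault is to exercise the Hadoop S3 filesystem directly from a shell on a cluster node. The bucket name and key values below are placeholders, and note that `s3://` URLs read the `fs.s3.*` keys while `s3n://` URLs read the separate `fs.s3n.*` keys, so the property names must match the URL scheme used in the table location:

```shell
# Sketch only -- requires a node with the Hadoop client configured;
# bucket and key values are placeholders, not from the thread.
hadoop fs -D fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
          -D fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY \
          -ls s3n://your-bucket/
```

If this listing works but the Hive DDL still fails, the safety-valve change most likely has not been deployed to the client configuration the Hive CLI reads.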
RE: Kerberized Hive | Remote Access using Keytab
Hi All, can someone provide some information on the problem below?

Kind Regards,
Keshav C Savant

From: Savant, Keshav [mailto:keshav.c.sav...@fisglobal.com]
Sent: Friday, April 18, 2014 3:52 PM
To: user@hive.apache.org
Subject: Kerberized Hive | Remote Access using Keytab

Hi All, I have successfully Kerberized CDH5 Hive. Now I can do a kinit and then issue Hive queries. Next I wanted to access Hive remotely from a standalone Java client using a keytab file, so that kinit (or a credential prompt) can be avoided. I wrote the following lines (based on input from a cdh-user Google group thread: https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/S7nPFx0w90U) to solve the above problem, but I am getting a "GSS initiate failed" exception:

Configuration conf = new Configuration();
conf.addResource(new java.io.FileInputStream("/installer/hive_jdbc/core-site.xml")); // file placed at this path
SecurityUtil.login(conf, "/path/to/my/keytab/file/user.keytab", "user@domain");

I have also posted the same problem on that URL; sample code and logs are posted there.
As per the Apache Hive wiki page on this (https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSetupforaSecureCluster), a valid ticket needs to be in the ticket cache before hitting a Kerberized Hive. Can I bypass this and use a keytab for hitting Kerberized Hive from a standalone Java program? Kindly provide some input/pointers/examples to solve this.

Kind regards,
Keshav C Savant

_ The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you. _
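A hedged sketch of what keytab-based login for a standalone client typically looks like against a Kerberized HiveServer2. The principal names, paths, and host below are placeholders (not taken from the thread), and it assumes the Hadoop and Hive JDBC client jars are on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabHiveClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Log in from the keytab so no kinit / ticket cache is needed
        UserGroupInformation.loginUserFromKeytab(
                "user@EXAMPLE.COM", "/path/to/user.keytab");
        // The JDBC URL names the HiveServer2 service principal, not the client's
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://hs2-host:10000/default;principal=hive/hs2-host@EXAMPLE.COM")) {
            System.out.println("connected");
        }
    }
}
```

UserGroupInformation.loginUserFromKeytab is the usual replacement for SecurityUtil.login in client code; whether it also resolves the GSS error depends on the client's krb5.conf and on the JDK crypto policy supporting the ticket's encryption type.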
Re: Hive 0.13.0 - IndexOutOfBounds Exception
Prasanth,

The error seems to occur with just about any table. I mocked up a very simple table to illustrate the problem (including input data, etc.) to make this easy to repeat.

hive> create table loading_data_0 (A smallint, B smallint)
      partitioned by (range int)
      row format delimited fields terminated by '|'
      stored as textfile;

hive> create table data (A smallint, B smallint)
      partitioned by (range int)
      clustered by (A) sorted by (A, B) into 8 buckets
      stored as orc
      tblproperties ("orc.compress" = "SNAPPY", "orc.index" = "true");

[root@server ~]# cat test.input
123|436
423|426
223|456
923|486
023|406

hive> load data inpath '/test.input' into table loading_data_0 partition (range=123);

[root@server scripts]# hive -e "describe data"
Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
OK
Time taken: 0.508 seconds
OK
a        smallint
b        smallint
range    int

# Partition Information
# col_name    data_type    comment
range    int
Time taken: 0.422 seconds, Fetched: 8 row(s)

[root@server scripts]# hive -e "describe loading_data_0"
Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
OK
Time taken: 0.511 seconds
OK
a        smallint
b        smallint
range    int

# Partition Information
# col_name    data_type    comment
range    int
Time taken: 0.37 seconds, Fetched: 8 row(s)

[root@server scripts]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting=true; set mapred.job.queue.name=orc_queue; explain insert into table data partition (range) select * from loading_data_0;"
Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
OK
Time taken: 0.564 seconds
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: loading_data_0
            Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: a (type: smallint), b (type: smallint), range (type: int)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col2 (type: int), -1 (type: int), _col0 (type: smallint), _col1 (type: smallint)
                sort order:
                Map-reduce partition columns: _col2 (type: int)
                Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: smallint), _col1 (type: smallint), _col2 (type: int)
      Reduce Operator Tree:
        Extract
          Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 5 Data size: 40 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
                output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
                serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
                name: data

  Stage: Stage-0
    Move Operator
      tables:
          partition:
            range
          replace: false
          table:
              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
              name: data

Time taken: 0.913 seconds, Fetched: 45 row(s)

[root@server]# hive -e "set hive.exec.dynamic.partition.mode=nonstrict; set hive.enforce.sorting=true; set mapred.job.queue.name=orc_queue; insert into table data partition (range) select * from loading_data_0;"
Logging initialized using configuration in /opt/hadoop/latest-hive/conf/hive.log4j
OK
Time taken: 0.513 seconds
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1398130933303_1467, Tracking URL = http://server:8088/proxy/application_1398130933303_1467/
Kill Command = /opt/hadoop/latest-hadoop/bin/hadoop job -kill job_1398130933303_1467
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-04-22 11:33:26,984 Stage-1 map = 0%, reduce = 0%
2014-04-22 11:33:51,833 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1398130933303_1467 with
How to get hdfs zip files in localfilesystem
Hi Experts, a Hive query result is stored in HDFS as 75 zip files. I merged them and pulled them to the local filesystem with '-getmerge', but I am unable to see the data after I unzipped the file. Am I missing anything? Please help me.

--
Kishore
Re: How to get hdfs zip files in localfilesystem
I get the message below if I run this command to see the contents of the merged file. Please help me figure out how to do this; if I store the query result directly in the local filesystem, I am able to see the contents.

# less TheCombinedResultOfTheJob.txt
TheCombinedResultOfTheJob.txt may be a binary file. See it anyway?

When I enter y, the result is:

0^A0^A0^A11^A0^A12^A416^A0.0
0^A0^A0^A11^A38^A12^A87^A0.0
0^A0^A0^A12^A53^A11^A1^A0.0
0^A0^A0^A12^A72^A11^A30^A0.0
0^A0^A0^A12^A357^A11^A12^A0.0
0^A0^A0^A12^A395^A11^A2^A0.0
0^A0^A0^A12^A547^A11^A9^A0.0

On Tue, Apr 22, 2014 at 8:32 PM, Kishore kumar kish...@techdigita.in wrote:
[quoted text trimmed; see the original message in this thread]
Re: Hive 0.13.0 - IndexOutOfBounds Exception
Prasanth,

Was this additional information sufficient? This is a large roadblock to our adopting Hive 0.13.0.

Regards,
Bryan Jeffrey

On Tue, Apr 22, 2014 at 7:41 AM, Bryan Jeffrey bryan.jeff...@gmail.com wrote:
[quoted reproduction steps trimmed; see the earlier message in this thread]
Re: Hive 0.13.0 - IndexOutOfBounds Exception
Thanks Bryan. This is more than sufficient. As a workaround, can you try setting hive.optimize.sort.dynamic.partition=false and see if it helps? In the meantime, I will diagnose the issue.

Thanks
Prasanth Jayachandran

On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey bryan.jeff...@gmail.com wrote:
[quoted text trimmed; see the earlier messages in this thread]
Re: Hive 0.13.0 - IndexOutOfBounds Exception
Bryan,

This issue is related to https://issues.apache.org/jira/browse/HIVE-6883. The workaround is to disable the hive.optimize.sort.dynamic.partition optimization by setting it to false. We found this issue very late (towards the end of the 0.13 release), so the fix wasn't included in Hive 0.13. It will go into the next patch release/next release; I will request a backport to the Hive 0.13 source as well.

Thanks
Prasanth Jayachandran

On Apr 22, 2014, at 10:36 AM, Bryan Jeffrey bryan.jeff...@gmail.com wrote:
[quoted text trimmed; see the earlier messages in this thread]
Re: Hive 0.13.0 - IndexOutOfBounds Exception
Prasanth,

Thank you for the help. It would not have occurred to me to look at partition sort and order issues from that dump. I may just apply the patch to my copy of 0.13.

Regards,
Bryan Jeffrey

On Apr 22, 2014 2:41 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote:
[quoted text trimmed; see the earlier messages in this thread]
Re: How to get hdfs zip files in localfilesystem
This looks like the file has Ctrl-A (Hive's default field delimiter) as the separator. Try this:

CTRLA=$(echo -e "\x01")
sed 's/'${CTRLA}'/\t/g' TheCombinedResultOfTheJob.txt > TheCombinedResultOfTheJob.tsv
less TheCombinedResultOfTheJob.tsv

Thanks
Warm Regards
Sanjay

From: Kishore kumar kish...@techdigita.in
Reply-To: user@hive.apache.org
Date: Tuesday, April 22, 2014 at 9:38 AM
To: user@hive.apache.org
Subject: Re: How to get hdfs zip files in localfilesystem

[quoted text trimmed; see the earlier messages in this thread]
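The substitution can be sanity-checked on a one-line sample. A minimal sketch (file names and the sample record are made up for illustration; tr is used here as a portable alternative, since a \t escape in a sed replacement is a GNU extension):

```shell
# One record shaped like the merged output, with literal Ctrl-A (\001) separators
printf '0\0010\0010\00111\0010\00112\001416\0010.0\n' > sample.txt
# Translate every Ctrl-A byte to a tab
tr '\001' '\t' < sample.txt > sample.tsv
cat sample.tsv
```

The result is the same row with tab-separated fields, which less and most spreadsheet tools display cleanly.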
Finding Max of a column without using any Aggregation functions
Hey guys,

TABLE = STUDENT, COLUMN = SCORE. You want to find the max value in the column without using any aggregation functions. It's easy in an RDB context, but I was trying to get a solution in Hive (clearly I have some spare time on my hands - LOL):

select nfr.score
from student nfr
left outer join
  (select a.score as fra, b.score as frb
   from (select '1' as dummy, score from student) a
   join (select '1' as dummy, score from student) b
     on a.dummy = b.dummy
   where a.score < b.score) frab
on frab.fra = nfr.score
where frab.fra is null

Thanks
Warm Regards
Sanjay
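The logic above (a score is the max iff no other score is strictly larger, expressed via the self-join plus the null check on the left outer join) can be mimicked outside Hive for a quick sanity check. A rough shell sketch over a made-up one-column file (file name and data are placeholders):

```shell
# Toy "student.score" column
printf '40\n75\n12\n75\n' > scores.txt
# Emit each score for which no strictly larger score exists -- the anti-join condition
while read a; do
  larger=0
  while read b; do
    [ "$b" -gt "$a" ] && larger=1
  done < scores.txt
  [ "$larger" -eq 0 ] && echo "$a"
done < scores.txt | sort -u > max.txt
cat max.txt
```

Like the SQL version, this is quadratic in the number of rows, which is why it is a curiosity rather than a replacement for max().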
create table question
I use hadoop 2.2.0 and hive 0.13.0, and I want to create a table from an existing file. states.hql is as follows:

CREATE EXTERNAL TABLE states(abbreviation string, full_name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'tmp/states';

[hadoop@master ~]$ hadoop fs -ls
14/04/22 20:17:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2014-04-22 20:02 tmp
[hadoop@master ~]$ hadoop fs -put states.txt tmp/states
[hadoop@master ~]$ hadoop fs -ls tmp/states
14/04/22 20:17:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 2 hadoop supergroup 654 2014-04-22 20:02 tmp/states/states.txt

Then I execute states.hql:

[hadoop@master ~]$ hive -f states.hql
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/04/22 20:11:47 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
Logging initialized using configuration in jar:file:/home/software/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://master:9000./tmp/states)

It raises the error above. Why, and how do I correct it? The Hive log shows:

2014-04-22 20:12:03,907 INFO [main]: exec.DDLTask (DDLTask.java:createTable(4074)) - Default to LazySimpleSerDe for table states
2014-04-22 20:12:05,147 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(624)) - 0: create_table: Table(tableName:states, dbName:default, owner:hadoop, createTime:1398222724, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:abbreviation, type:string, comment:null), FieldSchema(name:full_name, type:string, comment:null)], location:tmp/states, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim= }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
2014-04-22 20:12:05,147 INFO [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(306)) - ugi=hadoop ip=unknown-ip-addr cmd=create_table: Table(tableName:states, dbName:default, owner:hadoop, createTime:1398222724, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:abbreviation, type:string, comment:null), FieldSchema(name:full_name, type:string, comment:null)], location:tmp/states, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format= , field.delim= }), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{EXTERNAL=TRUE}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
2014-04-22 20:12:05,196 ERROR [main]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(143)) - MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI:
Re: create table question
In the QL you set a relative path, tmp/states; according to the error message, you need to set an absolute path.

On Wed, Apr 23, 2014 at 11:23 AM, EdwardKing zhan...@neusoft.com wrote:

I use hadoop 2.2.0 and hive 0.13.0, and I want to create a table from an existing file. states.hql is as follows:

CREATE EXTERNAL TABLE states(abbreviation string, full_name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'tmp/states';

[...]

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://master:9000./tmp/states)
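To make the advice concrete: Hive stores the table location as a URI, and a bare relative path like tmp/states cannot be resolved against the hdfs://master:9000 authority, which is where the mangled hdfs://master:9000./tmp/states in the error comes from. A minimal Python sketch of the resolution you want, assuming fs.defaultFS is hdfs://master:9000 (an illustrative helper, not Hive's actual code):

```python
# Illustrative helper (not Hive's internal logic): build an absolute
# table LOCATION from fs.defaultFS and a user-supplied path.
def absolute_location(default_fs, location):
    if "://" in location:   # already a full URI, e.g. hdfs://host:port/path
        return location
    # Ensure exactly one slash between the authority and the path.
    return default_fs.rstrip("/") + "/" + location.lstrip("/")

print(absolute_location("hdfs://master:9000", "tmp/states"))
# hdfs://master:9000/tmp/states
```

With a location in this form, the URI parser sees an absolute path and the DDLTask error goes away.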
Re: create table question
For example, if your name node was hadoop_name_nodeIP:8020 (verify this through your browser at http://hadoop_name_nodeIP:50070), the modified create table would be:

CREATE EXTERNAL TABLE states(abbreviation string, full_name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'hdfs://hp8300one:8020/tmp/states';

From: Shengjun Xin s...@gopivotal.com
To: user@hive.apache.org
Sent: Tuesday, April 22, 2014 8:58 PM
Subject: Re: create table question

In the QL you set a relative path, tmp/states; according to the error message, you need to set an absolute path.

On Wed, Apr 23, 2014 at 11:23 AM, EdwardKing zhan...@neusoft.com wrote:

[...]
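One caveat worth noting (my reading of the transcript, not stated in the thread): `hadoop fs -put states.txt tmp/states` copies the file relative to the user's HDFS home directory, so the data most likely lives under /user/hadoop/tmp/states rather than /tmp/states, and the absolute LOCATION should point there. A small Python sketch of that resolution rule (illustrative, mirroring HDFS's documented relative-path behavior):

```python
# Relative HDFS paths resolve against /user/<username>, so a LOCATION
# built from a relative upload path should target the HDFS home dir.
def resolve_hdfs_path(path, user):
    if path.startswith("/"):
        return path                      # already absolute
    return "/user/%s/%s" % (user, path)  # relative -> under HDFS home dir

print(resolve_hdfs_path("tmp/states", "hadoop"))  # /user/hadoop/tmp/states
```

So for the transcript above, a LOCATION of 'hdfs://master:9000/user/hadoop/tmp/states' is the one that matches where states.txt was actually uploaded.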