Only one active reducer in YARN
Hi,

I just moved from MR1 to YARN (CDH 4.x to CDH 5.2). Since the move, all of my loading jobs, which mostly look like the following, run really slowly:

  insert overwrite table desttable partition (partname) select * from sourcetable;

Even if I set the number of reducers to 500, the job launches 500 reducers, 498 of which finish within a minute, all with the following log entry:

  Container killed by the ApplicationMaster.
  Container killed on request. Exit code is 143
  Container exited with a non-zero exit code 143

Then 1 or 2 reducers do all the work: the data for the whole table passes through these 1 or 2 reducers, which run forever. Also, the output data for each partition is merged into one huge file. MR1 used to write smaller files, and all the reducers used to share the work; now it is different. I tried setting

  set hive.merge.mapfiles=false;
  set hive.merge.mapredfiles=false;

but it doesn't seem to help. Here are my settings:

  set hive.exec.dynamic.partition.mode=nonstrict;
  set parquet.compression=gzip;
  SET mapred.output.compression.type=BLOCK;
  SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET hive.exec.compress.intermediate=true;
  SET hive.exec.compress.output=true;
  SET hive.exec.dynamic.partition=true;
  set io.sort.mb=128;
  set mapred.map.child.java.opts=-Xmx4096M;
  set dfs.block.size=1073741824;
  SET hive.exec.reducers.bytes.per.reducer=10;
  set hive.merge.mapfiles=false;
  set hive.merge.mapredfiles=false;
  SET hive.merge.size.per.task=1073741824;
  SET mapreduce.task.io.sort.mb=256;
  SET mapreduce.map.output.compress=true;
  SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapred.output.fileoutputformat.compress=true;
  set mapreduce.output.fileoutputformat.compress=true;
  SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
  SET io.seqfile.compression.type=BLOCK;
  SET mapred.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
  SET mapreduce.job.reduces=512;

I tried SnappyCodec too; the results are not much different. Please let me know if anyone has any ideas on how to handle this.

Regards,
Murali.
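[Editor's note] This symptom (498 trivially finishing reducers plus one or two doing all the work, and one big file per partition) matches the dynamic-partition sort optimization that Hive 0.13+ can apply. A minimal sketch of two things worth checking, assuming the destination table is dynamically partitioned as the query above suggests:

  -- If enabled, this optimization routes all rows of a partition through the
  -- reducer(s) that sort it, which can funnel the whole table through very few reducers:
  set hive.optimize.sort.dynamic.partition=false;

  -- Alternatively, spread the rows across reducers explicitly
  -- (desttable/sourcetable/partname are the placeholders from the original query):
  insert overwrite table desttable partition (partname)
  select * from sourcetable
  distribute by rand();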
load TPCH HBase tables through Hive
Hi folks,

I am using the HBase integration feature of Hive (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) to load the TPC-H tables into HBase, with Hive 0.13 and HBase 0.98.6. The load works well. However, as documented at https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-KeyUniqueness, the key-uniqueness requirement prevents me from loading all 'lineitem' rows: the 'lineitem' table uses (L_ORDERKEY, L_LINENUMBER) as a compound primary key, so if I map only L_ORDERKEY to the HBase key (i.e., the row key), many rows get overwritten. Any suggestions? Someone on this list must have gone through this already. :-) Thanks.

BTW, here is my Hive DDL:

  create table hbase_lineitem (
    l_orderkey bigint,
    l_partkey bigint,
    l_suppkey int,
    l_linenumber bigint,
    l_quantity double,
    l_extendedprice double,
    l_discount double,
    l_tax double,
    l_returnflag string,
    l_linestatus string,
    l_shipdate string,
    l_commitdate string,
    l_receiptdate string,
    l_shipinstruct string,
    l_shipmode string,
    l_comment string
  )
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,l_partkey:val,l_suppkey:val,l_linenumber:val,l_quantity:val,l_extendedprice:val,l_discount:val,l_tax:val,l_returnflag:val,l_linestatus:val,l_shipdate:val,l_commitdate:val,l_receiptdate:val,l_shipinstruct:val,l_shipmode:val,l_comment:val")
  TBLPROPERTIES ("hbase.table.name" = "lineitem");

  insert overwrite table hbase_lineitem select * from lineitem;

Demai
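[Editor's note] One common workaround, sketched here rather than taken from any reply in this thread, is to synthesize a composite HBase row key out of both primary-key columns, keeping them as regular columns too so they remain queryable (the table name, column family "f", and the trimmed column list are illustrative):

  create table hbase_lineitem2 (
    row_key string,
    l_orderkey bigint,
    l_linenumber bigint,
    l_comment string  -- remaining lineitem columns omitted for brevity
  )
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:l_orderkey,f:l_linenumber,f:l_comment")
  TBLPROPERTIES ("hbase.table.name" = "lineitem2");

  insert overwrite table hbase_lineitem2
  select concat(cast(l_orderkey as string), '_', cast(l_linenumber as string)),
         l_orderkey, l_linenumber, l_comment
  from lineitem;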
Re: error: Failed to create spark client. for hive on spark
yes, have placed spark-assembly jar in hive lib folder.

hive.log
---
bmit.2317151720491931059.properties --class org.apache.hive.spark.client.RemoteDriver /opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-exec-1.2.0-SNAPSHOT.jar --remote-host M151 --remote-port 56996 --conf hive.spark.client.connect.timeout=1 --conf hive.spark.client.server.connect.timeout=9 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
2015-03-02 20:33:39,893 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1
2015-03-02 20:33:39,894 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
2015-03-02 20:33:39,894 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
2015-03-02 20:33:39,894 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
2015-03-02 20:33:39,894 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=9
2015-03-02 20:33:40,002 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - 15/03/02 20:33:40 INFO client.RemoteDriver: Connecting to: M151:56996
2015-03-02 20:33:40,005 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT
2015-03-02 20:33:40,005 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.hive.spark.client.rpc.RpcConfiguration.<clinit>(RpcConfiguration.java:46)
2015-03-02 20:33:40,005 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:139)
2015-03-02 20:33:40,005 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:544)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at java.lang.reflect.Method.invoke(Method.java:601)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
2015-03-02 20:33:40,006 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(553)) - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2015-03-02 20:33:40,410 WARN [Driver]: client.SparkClientImpl (SparkClientImpl.java:run(411)) - Child process exited with code 1.
2015-03-02 20:35:08,950 WARN [main]: client.SparkClientImpl (SparkClientImpl.java:<init>(98)) - Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:96)
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:88)
    at
RE: PL/HQL - Procedural SQL-on-Hadoop
Is there a simple way to migrate from PL/SQL to PL/HQL?

Regards,
Venkat

From: Dmitry Tolpeko [mailto:dmtolp...@gmail.com]
Sent: Friday, February 27, 2015 1:36 PM
To: user@hive.apache.org
Subject: PL/HQL - Procedural SQL-on-Hadoop

Let me introduce PL/HQL, an open source tool that implements procedural SQL on Hadoop. It is going to support all major procedural syntaxes, and the tool can be used with any SQL-on-Hadoop solution.

Motivation:
* Writing driver code in well-known procedural SQL (not bash), which opens Hadoop to an even wider audience
* Allowing dynamic SQL, iterations, flow of control, and SQL exception handling
* Facilitating migration of RDBMS workloads to Hadoop

Plans (besides extending the syntax):
* Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
* Allowing connections to multiple databases (e.g., lookup tables in relational databases)
* On-the-fly SQL conversion (of SELECT statements, for example) and compatibility layers

More details can be found at http://www.plhql.org/

This is just the first release, PL/HQL 0.01, to show that such a project exists and to get initial feedback.

Thank you,
Dmitry Tolpeko
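[Editor's note] For readers who have not seen this style of code, below is a small illustration of the kind of procedural SQL such a tool targets. The syntax shown is generic PL/SQL-style and is only an assumption of what PL/HQL will accept, not taken from its documentation:

  -- Illustrative only: dynamic SQL, iteration, and exception handling
  -- wrapped around ordinary SQL statements.
  BEGIN
    FOR i IN 1 .. 3 LOOP
      EXECUTE IMMEDIATE 'INSERT INTO sales_stage SELECT * FROM sales WHERE region_id = ' || i;
    END LOOP;
  EXCEPTION
    WHEN OTHERS THEN
      NULL;  -- swallow the SQL error instead of aborting the whole script
  END;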
Any tutorial document about how to use the example data
Hi,

I noticed there's an "examples" folder that contains sample data and sample queries, but I didn't find any document about how to use these data and queries. Could anyone point me to one?

Thanks
Re: error: Failed to create spark client. for hive on spark
It seems that the remote Spark context failed to come up. I see you're using a Spark standalone cluster; please make sure the Spark cluster is up. You may try spark.master=local first.

On Mon, Mar 2, 2015 at 5:15 PM, scwf wangf...@huawei.com wrote:

  yes, have placed spark-assembly jar in hive lib folder. [quoted hive.log snipped; it is identical to the log in the message above]
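[Editor's note] For reference, the quick check suggested above looks like this in the Hive CLI (the src table and the query are the ones from the original session):

  hive> set hive.execution.engine=spark;
  hive> set spark.master=local;
  hive> select count(1) from src;
  -- if this succeeds, the Hive side is fine and the problem is in reaching
  -- the standalone cluster (master URL, ports, or mismatched jars)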
Re: PL/HQL - Procedural SQL-on-Hadoop
Venkat,

The goal of this project is to execute existing PL/SQL in Hive as-is as much as possible, not to migrate it. When some design restriction is hit, that part of the code has to be reworked, but hopefully most of the code can remain untouched; there is no need to convert everything to bash/Python, etc.

Dmitry

On Tue, Mar 3, 2015 at 4:39 AM, Venkat, Ankam ankam.ven...@centurylink.com wrote:

  Is there a simple way to migrate from PL/SQL to PL/HQL?

  Regards,
  Venkat

  [earlier quoted PL/HQL announcement snipped; it is identical to the message above]
Re: error: Failed to create spark client. for hive on spark
Could you check your hive.log and spark.log for a more detailed error message? Quick check though: do you have spark-assembly.jar in your hive lib folder?

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 5:14 AM, scwf wangf...@huawei.com wrote:

  Hi all, anyone met this error: HiveException(Failed to create spark client.) [quoted session snipped; it is identical to the original message below]
Re: Where does hive do sampling in order by ?
There is no sampling for ORDER BY in Hive; Hive uses a single reducer for ORDER BY (if you're talking about the MR execution engine). Hive on Spark is different in this regard, though.

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang zjf...@gmail.com wrote:

  Order by usually involves 2 steps (a sampling job and a repartition job), but Hive runs only one MR job for order by, so I am wondering when and where Hive does the sampling. Client side?

  --
  Best Regards

  Jeff Zhang
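[Editor's note] To make the distinction concrete, a small illustration (table and column names are hypothetical):

  -- Total order over the whole result set; on MR this runs through one reducer:
  select key, value from t order by key;

  -- Parallel alternative: rows are partitioned across reducers by key and each
  -- reducer sorts its own share, giving sorted output per reducer file only:
  select key, value from t distribute by key sort by key;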
Where does hive do sampling in order by ?
Order by usually involves 2 steps (a sampling job and a repartition job), but Hive runs only one MR job for order by, so I am wondering when and where Hive does the sampling. Client side?

--
Best Regards

Jeff Zhang
error: Failed to create spark client. for hive on spark
Hi all, has anyone met this error: HiveException(Failed to create spark client.)?

M151:/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin # bin/hive
Logging initialized using configuration in jar:file:/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-common-1.2.0-SNAPSHOT.jar!/hive-log4j.properties
[INFO] Unable to bind key for unsupported operation: backward-delete-word
[INFO] Unable to bind key for unsupported operation: backward-delete-word
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
hive> set spark.home=/opt/cluster/spark-1.3.0-bin-hadoop2-without-hive;
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://9.91.8.151:7070;
hive> select count(1) from src;
Query ID = root_2015030220_4bed4c2a-b9a5-4d99-a485-67570e2712b7
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

thanks
map-side join fails when a serialized table contains arrays
Hi,

I got the error below on a map-side join where a serialized table contains an array column. When I disable the optimized hashtable for map-side join by setting hive.mapjoin.optimized.hashtable=false, the exception does not occur. It seems that a wrong ObjectInspector is set in CommonJoinOperator#initializeOp. I am using Hive 1.0.0 (Tez 0.6) on Hadoop 2.6.0. I found a similar report at http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2 -- is this a known issue/bug?

Thanks,
Makoto

task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {gid:1,userid:4422,movieid:1213,rating:5}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {gid:1,userid:4422,movieid:1213,rating:5}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {gid:1,userid:4422,movieid:1213,rating:5}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast to [Ljava.lang.Object;
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:299)
    ... 26 more
Caused
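[Editor's note] For anyone trying to reproduce this, a sketch of the shape of the failing query, reconstructed from the row shown in the trace (the table names, array column, and join key are assumptions), together with the workaround mentioned above:

  -- Small (hash) side of the map join carries an array column, e.g.:
  --   movies(movieid bigint, genres array<string>)
  --   ratings(gid int, userid bigint, movieid bigint, rating int)

  set hive.mapjoin.optimized.hashtable=false;  -- reported workaround

  select r.userid, r.rating, m.genres
  from ratings r
  join movies m on r.movieid = m.movieid;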
Tez query
Hello Everyone,

I was able to look up a Hive query using hive.query.name from the job history server, but I wasn't able to find a similar parameter for Tez. Is there a way to find out all the queries that ran in a Tez session?

Thanks
Re: HS2 standalone JDBC jar not standalone
Yes, we even have a ticket for that: https://issues.apache.org/jira/browse/HIVE-9600

BTW, can anyone test the JDBC driver with Kerberos enabled? https://issues.apache.org/jira/browse/HIVE-9599

On Mon, Mar 2, 2015 at 10:01 AM, Nick Dimiduk ndimi...@gmail.com wrote:

  Heya, I'd like to use jmeter against HS2/JDBC and I'm finding the standalone jar isn't actually standalone. [quoted message snipped; it is identical to the original message below]
HS2 standalone JDBC jar not standalone
Heya,

I'd like to use jmeter against HS2/JDBC and I'm finding the standalone jar isn't actually standalone. It appears to include a number of dependencies, but not the Hadoop Common stuff. Is there a packaging of this jar that is actually standalone? Are there instructions for using this standalone jar as it is?

Thanks,
Nick

jmeter.JMeter: Uncaught exception: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
    at org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.<init>(JdbcConnectionFactory.java:138)
    at org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
    at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
    at org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
    at org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
    at org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
Re: how to access array type?
hive> create table test1 (c1 array<int>) row format delimited collection items terminated by ',';
OK
hive> insert into test1 select array(1,2,3) from dual;
OK
hive> select * from test1;
OK
[1,2,3]
hive> select c1[0] from test1;
OK
1

$ hadoop fs -cat /apps/hive/warehouse/test1/00_0
1,2,3

On Sun, Mar 1, 2015 at 11:53 PM, Jie Zhang jiezh2...@gmail.com wrote:

  Hi,

  I am trying to use a Hive complex data type on Hive 0.14.0, but I could not access the array type as the manual indicates. I have an array column, but I hit a SemanticException when accessing an individual item in the array. Any clue? Did I use the wrong syntax, or miss some property setting? Thanks!

  hive> create table test1 (c1 array<int>) row format delimited collection items terminated by ',';
  OK
  Time taken: 0.092 seconds
  hive> select * from test1;
  OK
  [1,2,3]
  Time taken: 0.065 seconds, Fetched: 1 row(s)
  hive> select c1[0] from test1;
  FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or column reference 'c1[0]': (possible column names are: c1)

  Jessica
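[Editor's note] For completeness, another common way to access array elements, using the same test1 table (the t/item aliases are arbitrary names):

  -- one output row per array element
  select item from test1 lateral view explode(c1) t as item;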
Conversion of one file format to another
Hello All,

I have a couple of SequenceFiles on HDFS that I now need to load into an ORC table. One option is to create an external table in SequenceFile format and then load it into the ORC table with an INSERT OVERWRITE. I am looking for an alternative that does not use an intermediate table. Is there a way of achieving this by writing a custom output format?

Thanks and Regards,
Varsha
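[Editor's note] For reference, a sketch of the two-step route described above (the schema, table names, and HDFS path are placeholders). Note that the external table only adds metadata over the existing files, with no extra copy of the SequenceFile data; the one rewrite that does happen is unavoidable, since ORC is a different physical format:

  create external table seq_src (id bigint, payload string)
  stored as sequencefile
  location '/data/seq/input';

  create table orc_dest (id bigint, payload string)
  stored as orc;

  insert overwrite table orc_dest select id, payload from seq_src;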
Re: HS2 standalone JDBC jar not standalone
Thanks Alexander!

On Mon, Mar 2, 2015 at 10:31 AM, Alexander Pivovarov apivova...@gmail.com wrote:

  Yes, we even have a ticket for that: https://issues.apache.org/jira/browse/HIVE-9600 [rest of quoted thread snipped; it is identical to the messages above]