Only one active reducer in YARN

2015-03-02 Thread Kumar V
Hi,  I just moved from MR1 to YARN (CDH 4.x to CDH 5.2). Since the move, all of 
my loading jobs, which mostly look like the following, run really 
slowly.
insert overwrite table desttable partition (partname) select * from sourcetable
From what I can see, even if I set the number of reducers to 500, it launches 500 
reducers, 498 of them finish within a minute, and all of them have the 
following log entry.
Container killed by the ApplicationMaster. Container killed on request. Exit 
code is 143 Container exited with a non-zero exit code 143

Then 1 or 2 reducers do all the work. The data for the whole 
table passes through these 1 or 2 reducers, which run forever. Also, the output 
data for each partition is merged into one huge file. MR1 used to write 
smaller files and all the reducers used to share the work; now it is different. I 
tried setting 
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
It doesn't seem to help.  
Here are my settings.  
set hive.exec.dynamic.partition.mode=nonstrict;
set parquet.compression=gzip;
SET mapred.output.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET hive.exec.compress.intermediate=true;
SET hive.exec.compress.output=true;
SET hive.exec.dynamic.partition=true;
set io.sort.mb=128;
set mapred.map.child.java.opts=-Xmx4096M;
set dfs.block.size=1073741824;
SET hive.exec.reducers.bytes.per.reducer=10;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
SET hive.merge.size.per.task=1073741824;
SET mapreduce.task.io.sort.mb=256;
SET mapreduce.map.output.compress=true;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapreduce.output.fileoutputformat.compress.type=BLOCK;
SET io.seqfile.compression.type=BLOCK;
SET mapred.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapreduce.job.reduces=512;
I tried SnappyCodec too. Results are not much different.
Please let me know if anyone has any ideas on how to handle this.
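A commonly suggested pattern for spreading a dynamic-partition insert across 
reducers (a hedged aside, not something tried in this thread) is to DISTRIBUTE BY 
the dynamic-partition column, so rows for different partitions land on different 
reducers; using the table and column names from the example insert above:

insert overwrite table desttable partition (partname)
select * from sourcetable
distribute by partname;  -- sketch only; partname is the dynamic-partition column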
Regards,
Murali.


load TPCH HBase tables through Hive

2015-03-02 Thread Demai Ni
hi, folks,

I am using the HBase integration feature of Hive (
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) to load
TPCH tables into HBase. Hive 0.13 and HBase 0.98.6.

The load works well. However, as documented here:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-KeyUniqueness.


The key-uniqueness requirement prevents me from loading all 'lineitem' rows, since
the 'lineitem' table uses (L_ORDERKEY, L_LINENUMBER) as a compound primary
key. If I map only L_ORDERKEY to the HBase key (aka the row key), many rows
get overwritten.

Any suggestions? Someone on this list must have gone through this already. :-)
Thanks

BTW, here is my hive ddl.

create table hbase_lineitem (l_orderkey bigint, l_partkey bigint,
l_suppkey int, l_linenumber bigint, l_quantity double, l_extendedprice
double, l_discount double, l_tax double, l_returnflag string,
l_linestatus string, l_shipdate string, l_commitdate string,
l_receiptdate string, l_shipinstruct string, l_shipmode string,
l_comment string) STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES
('hbase.columns.mapping' = ':key,l_partkey:val,l_suppkey:val,l_linenumber:val,l_quantity:val,l_extendedprice:val,l_discount:val,l_tax:val,l_returnflag:val,l_linestatus:val,l_shipdate:val,l_commitdate:val,l_receiptdate:val,l_shipinstruct:val,l_shipmode:val,l_comment:val')
TBLPROPERTIES ('hbase.table.name' = 'lineitem');


insert overwrite table hbase_lineitem select * from lineitem;
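One possible way around the key-uniqueness constraint (a sketch under assumptions, 
not something reported in this thread) is to build a composite HBase row key from 
L_ORDERKEY and L_LINENUMBER; this assumes the DDL above is adjusted so the column 
mapped to :key is a string:

-- sketch only: composite row key = orderkey + '|' + linenumber
insert overwrite table hbase_lineitem
select concat(cast(l_orderkey as string), '|', cast(l_linenumber as string)),
       l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice,
       l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate,
       l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment
from lineitem;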

Demai


Re: error: Failed to create spark client. for hive on spark

2015-03-02 Thread scwf

yes, have placed spark-assembly jar in hive lib folder.

hive.log---
bmit.2317151720491931059.properties --class 
org.apache.hive.spark.client.RemoteDriver 
/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-exec-1.2.0-SNAPSHOT.jar 
--remote-host M151 --remote-port 56996 --conf 
hive.spark.client.connect.timeout=1 --conf 
hive.spark.client.server.connect.timeout=9 --conf 
hive.spark.client.channel.log.level=null --conf 
hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 
--conf hive.spark.client.secret.bits=256
2015-03-02 20:33:39,893 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: 
hive.spark.client.connect.timeout=1
2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: 
hive.spark.client.rpc.threads=8
2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: 
hive.spark.client.rpc.max.size=52428800
2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: 
hive.spark.client.secret.bits=256
2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config property: 
hive.spark.client.server.connect.timeout=9
2015-03-02 20:33:40,002 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - 15/03/02 20:33:40 INFO client.RemoteDriver: 
Connecting to: M151:56996
2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) - Exception in thread main 
java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT
2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.hive.spark.client.rpc.RpcConfiguration.clinit(RpcConfiguration.java:46)
2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.hive.spark.client.RemoteDriver.init(RemoteDriver.java:139)
2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:544)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
java.lang.reflect.Method.invoke(Method.java:601)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(553)) -at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2015-03-02 20:33:40,410 WARN  [Driver]: client.SparkClientImpl 
(SparkClientImpl.java:run(411)) - Child process exited with code 1.
2015-03-02 20:35:08,950 WARN  [main]: client.SparkClientImpl 
(SparkClientImpl.java:init(98)) - Error while waiting for client to connect.
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: 
Timed out waiting for client connection.
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
at 
org.apache.hive.spark.client.SparkClientImpl.init(SparkClientImpl.java:96)
at 
org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.init(RemoteHiveSparkClient.java:88)
at 

RE: PL/HQL - Procedural SQL-on-Hadoop

2015-03-02 Thread Venkat, Ankam
Is there a simple way to migrate from PL/SQL to PL/HQL?

Regards,
Venkat

From: Dmitry Tolpeko [mailto:dmtolp...@gmail.com]
Sent: Friday, February 27, 2015 1:36 PM
To: user@hive.apache.org
Subject: PL/HQL - Procedural SQL-on-Hadoop

Let me introduce PL/HQL, an open source tool that implements procedural SQL on 
Hadoop. It is going to support all major procedural syntaxes. The tool can be 
used with any SQL-on-Hadoop solution.

Motivation:

  *   Writing the driver code in well-known procedural SQL (not bash), which 
opens Hadoop to an even wider audience
  *   Allowing dynamic SQL, iterations, flow-of-control and SQL exception 
handling (see the short sketch below)
  *   Facilitating migration of RDBMS workloads to Hadoop
Plans (besides extending syntax):


  *   Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
  *   Allowing connections to multiple databases (i.e. lookup tables in 
relational databases)
  *   On-the-fly SQL conversion (i.e. SELECT), compatibility layers
More details can be found at http://www.plhql.org/
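As a rough illustration of the kind of driver code meant above, here is a short 
PL/SQL-style sketch (the syntax and the sales table are illustrative assumptions, 
not taken from the PL/HQL documentation):

DECLARE
  cnt INT;
BEGIN
  SELECT COUNT(*) INTO cnt FROM sales WHERE dt = '2015-03-02';
  IF cnt = 0 THEN
    PRINT 'no rows loaded for this date';
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    PRINT 'count query failed';
END;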

This is just the first release, PL/HQL 0.01, to show that such a project exists and 
to get initial feedback.

Thank you,

Dmitry Tolpeko


Any tutorial document about how to use the example data

2015-03-02 Thread Jeff Zhang
Hi,

I noticed there's an example folder which contains sample data and sample
queries, but I didn't find any document about how to use these data and
queries. Could anyone point me to it? Thanks


Re: error: Failed to create spark client. for hive on spark

2015-03-02 Thread Xuefu Zhang
It seems that the remote Spark context failed to come up. I see you're
using a Spark standalone cluster. Please make sure the Spark cluster is up. You
may try spark.master=local first.
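A minimal sanity check along those lines (assuming the same src table and settings 
used elsewhere in this thread) would be:

set hive.execution.engine=spark;
set spark.master=local;
select count(1) from src;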

On Mon, Mar 2, 2015 at 5:15 PM, scwf wangf...@huawei.com wrote:

 yes, have placed spark-assembly jar in hive lib folder.

 hive.log---
 bmit.2317151720491931059.properties --class 
 org.apache.hive.spark.client.RemoteDriver
 /opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-exec-1.2.0-SNAPSHOT.jar
 --remote-host M151 --remote-port 56996 --conf 
 hive.spark.client.connect.timeout=1
 --conf hive.spark.client.server.connect.timeout=9 --conf
 hive.spark.client.channel.log.level=null --conf 
 hive.spark.client.rpc.max.size=52428800
 --conf hive.spark.client.rpc.threads=8 --conf
 hive.spark.client.secret.bits=256
 2015-03-02 20:33:39,893 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config
 property: hive.spark.client.connect.timeout=1
 2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config
 property: hive.spark.client.rpc.threads=8
 2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config
 property: hive.spark.client.rpc.max.size=52428800
 2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config
 property: hive.spark.client.secret.bits=256
 2015-03-02 20:33:39,894 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Warning: Ignoring non-spark config
 property: hive.spark.client.server.connect.timeout=9
 2015-03-02 20:33:40,002 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - 15/03/02 20:33:40 INFO
 client.RemoteDriver: Connecting to: M151:56996
 2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) - Exception in thread main
 java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT
 2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.hive.spark.client.
 rpc.RpcConfiguration.clinit(RpcConfiguration.java:46)
 2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.hive.spark.client.
 RemoteDriver.init(RemoteDriver.java:139)
 2015-03-02 20:33:40,005 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.hive.spark.client.
 RemoteDriver.main(RemoteDriver.java:544)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at sun.reflect.
 NativeMethodAccessorImpl.invoke0(Native Method)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at sun.reflect.
 NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at sun.reflect.
 DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at java.lang.reflect.Method.
 invoke(Method.java:601)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.spark.deploy.
 SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(
 SparkSubmit.scala:569)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.spark.deploy.
 SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.spark.deploy.
 SparkSubmit$.submit(SparkSubmit.scala:189)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.spark.deploy.
 SparkSubmit$.main(SparkSubmit.scala:110)
 2015-03-02 20:33:40,006 INFO  [stderr-redir-1]: client.SparkClientImpl
 (SparkClientImpl.java:run(553)) -at org.apache.spark.deploy.
 SparkSubmit.main(SparkSubmit.scala)
 2015-03-02 20:33:40,410 WARN  [Driver]: client.SparkClientImpl
 (SparkClientImpl.java:run(411)) - Child process exited with code 1.
 2015-03-02 20:35:08,950 WARN  [main]: client.SparkClientImpl
 (SparkClientImpl.java:init(98)) - Error while waiting for client to
 connect.
 java.util.concurrent.ExecutionException: 
 java.util.concurrent.TimeoutException:
 Timed out waiting for client connection.
 at io.netty.util.concurrent.AbstractFuture.get(
 AbstractFuture.java:37)
 at org.apache.hive.spark.client.SparkClientImpl.init(
 

Re: PL/HQL - Procedural SQL-on-Hadoop

2015-03-02 Thread Dmitry Tolpeko
Venkat,

The goal of this project is to execute existing PL/SQL in Hive as much as
possible, not to migrate it. Where design restrictions are hit,
the code has to be redesigned, but hopefully most of the remaining code
can stay untouched, with no need to convert everything to bash/Python etc.

Dmitry

On Tue, Mar 3, 2015 at 4:39 AM, Venkat, Ankam ankam.ven...@centurylink.com
wrote:

  Is there a simple way to migrate from PL/SQL to PL/HQL?



 Regards,

 Venkat



 *From:* Dmitry Tolpeko [mailto:dmtolp...@gmail.com]
 *Sent:* Friday, February 27, 2015 1:36 PM
 *To:* user@hive.apache.org
 *Subject:* PL/HQL - Procedural SQL-on-Hadoop



 Let me introduce PL/HQL, an open source tool that implements procedural
 SQL on Hadoop. It is going to support all major procedural syntaxes. The
 tool can be used with any SQL-on-Hadoop solution.



 Motivation:

- Writing the driver code using well-known procedural SQL (not bash)
that enables Hadoop to even more wider audience
- Allowing dynamic SQL, iterations, flow-of-control and SQL exception
handling
- Facilitating migration of RDBMS workload to Hadoop

  Plans (besides extending syntax):



- Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
- Allowing connections to multiple databases (i.e. lookup tables in
relational databases)
- On-the-fly SQL conversion (SELECT i.e.), compatibility layers

  More details can be found at http://www.plhql.org/



 It is just the first release PL/HQL 0.01 to show that such project exists
 and get any initial feedback.



 Thank you,



 Dmitry Tolpeko



Re: error: Failed to create spark client. for hive on spark

2015-03-02 Thread Xuefu Zhang
Could you check your hive.log and spark.log for a more detailed error
message? Quick check though: do you have spark-assembly.jar in your Hive
lib folder?

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 5:14 AM, scwf wangf...@huawei.com wrote:

 Hi all,
   anyone met this error: HiveException(Failed to create spark client.)

 M151:/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin # bin/hive

 Logging initialized using configuration in jar:file:/opt/cluster/apache-
 hive-1.2.0-SNAPSHOT-bin/lib/hive-common-1.2.0-SNAPSHOT.
 jar!/hive-log4j.properties
 [INFO] Unable to bind key for unsupported operation: backward-delete-word
 [INFO] Unable to bind key for unsupported operation: backward-delete-word
 [INFO] Unable to bind key for unsupported operation: down-history
 [INFO] Unable to bind key for unsupported operation: up-history
 [INFO] Unable to bind key for unsupported operation: up-history
 [INFO] Unable to bind key for unsupported operation: down-history
 [INFO] Unable to bind key for unsupported operation: up-history
 [INFO] Unable to bind key for unsupported operation: down-history
 [INFO] Unable to bind key for unsupported operation: up-history
 [INFO] Unable to bind key for unsupported operation: down-history
 [INFO] Unable to bind key for unsupported operation: up-history
 [INFO] Unable to bind key for unsupported operation: down-history
 hive set spark.home=/opt/cluster/spark-1.3.0-bin-hadoop2-without-hive;
 hive set hive.execution.engine=spark;
 hive set spark.master=spark://9.91.8.151:7070;
 hive select count(1) from src;
 Query ID = root_2015030220_4bed4c2a-b9a5-4d99-a485-67570e2712b7
 Total jobs = 1
 Launching Job 1 out of 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=number
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=number
 In order to set a constant number of reducers:
   set mapreduce.job.reduces=number
 Failed to execute spark task, with exception 
 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed
 to create spark client.)'
 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.
 exec.spark.SparkTask

 thanks




Re: Where does hive do sampling in order by ?

2015-03-02 Thread Xuefu Zhang
There is no sampling for ORDER BY in Hive. Hive uses a single reducer for
ORDER BY (if you're talking about the MR execution engine).

Hive on Spark is different in this respect, though.
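To make the distinction concrete, a short sketch (assuming a src table with a key 
column, as in the standard Hive examples): ORDER BY gives a total order through a 
single reducer, while SORT BY only orders rows within each reducer and can be 
combined with DISTRIBUTE BY to spread the work:

-- total order, single reducer on the MR engine
select * from src order by key;

-- per-reducer order only; work is spread across reducers
select * from src distribute by key sort by key;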

Thanks,
Xuefu

On Mon, Mar 2, 2015 at 2:17 AM, Jeff Zhang zjf...@gmail.com wrote:

 Order by usually invoke 2 steps (sampling job and repartition job) but
 hive only run one mr job for order by, so wondering when and where does
 hive do sampling ? client side ?


 --
 Best Regards

 Jeff Zhang



Where does hive do sampling in order by ?

2015-03-02 Thread Jeff Zhang
ORDER BY usually involves 2 steps (a sampling job and a repartition job), but Hive
only runs one MR job for ORDER BY, so I'm wondering when and where Hive does the
sampling. On the client side?


-- 
Best Regards

Jeff Zhang


error: Failed to create spark client. for hive on spark

2015-03-02 Thread scwf

Hi all,
  has anyone met this error: HiveException(Failed to create spark client.)?

M151:/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin # bin/hive

Logging initialized using configuration in 
jar:file:/opt/cluster/apache-hive-1.2.0-SNAPSHOT-bin/lib/hive-common-1.2.0-SNAPSHOT.jar!/hive-log4j.properties
[INFO] Unable to bind key for unsupported operation: backward-delete-word
[INFO] Unable to bind key for unsupported operation: backward-delete-word
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
[INFO] Unable to bind key for unsupported operation: up-history
[INFO] Unable to bind key for unsupported operation: down-history
hive> set spark.home=/opt/cluster/spark-1.3.0-bin-hadoop2-without-hive;
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://9.91.8.151:7070;
hive> select count(1) from src;
Query ID = root_2015030220_4bed4c2a-b9a5-4d99-a485-67570e2712b7
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 
'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark 
client.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

thanks



map-side join fails when a serialized table contains arrays

2015-03-02 Thread Makoto Yui
Hi,

I got the attached error on a map-side join where a serialized table
contains an array column.

When I disable the optimized map-join hashtable by setting
hive.mapjoin.optimized.hashtable=false, the exceptions do not occur.

It seems that a wrong ObjectInspector was set at
CommonJoinOperator#initializeOp.

I am using Hive 1.0.0 (Tez 0.6) on Hadoop 2.6.0.

I found a similar report at
http://stackoverflow.com/questions/28606244/issues-upgrading-to-hdinsight-3-2-hive-0-14-0-tez-0-5-2


Is this a known issue/bug?

Thanks,
Makoto


task:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {gid:1,userid:4422,movieid:1213,rating:5}
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:138)
at
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
while processing row {gid:1,userid:4422,movieid:1213,rating:5}
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:294)
at
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:163)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive
Runtime Error while processing row
{gid:1,userid:4422,movieid:1213,rating:5}
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
... 17 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
exception: Unexpected exception:
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray cannot be cast
to [Ljava.lang.Object;
at
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected
exception: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryArray
cannot be cast to [Ljava.lang.Object;
at
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:311)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at
org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
at
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genAllOneUniqueJoinObject(CommonJoinOperator.java:670)
at
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:748)
at
org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:299)
... 26 more
Caused 

Tez query

2015-03-02 Thread P lva
Hello Everyone,

I was able to look up a Hive query using hive.query.name in the job
history server. I wasn't able to find a similar parameter for Tez.

Is there a way to find out all the queries that ran in a Tez
session?

Thanks


Re: HS2 standalone JDBC jar not standalone

2015-03-02 Thread Alexander Pivovarov
yes, we even have a ticket for that
https://issues.apache.org/jira/browse/HIVE-9600

btw can anyone test jdbc driver with kerberos enabled?
https://issues.apache.org/jira/browse/HIVE-9599


On Mon, Mar 2, 2015 at 10:01 AM, Nick Dimiduk ndimi...@gmail.com wrote:

 Heya,

 I've like to use jmeter against HS2/JDBC and I'm finding the standalone
 jar isn't actually standalone. It appears to include a number of
 dependencies but not Hadoop Common stuff. Is there a packaging of this jar
 that is actually standalone? Are there instructing for using this
 standalone jar as it is?

 Thanks,
 Nick

 jmeter.JMeter: Uncaught exception:  java.lang.NoClassDefFoundError:
 org/apache/hadoop/conf/Configuration
 at
 org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
 at
 org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
 at
 org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:164)
 at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 at java.sql.DriverManager.getConnection(DriverManager.java:571)
 at java.sql.DriverManager.getConnection(DriverManager.java:233)
 at
 org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.init(JdbcConnectionFactory.java:138)
 at
 org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.conf.Configuration
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 13 more



HS2 standalone JDBC jar not standalone

2015-03-02 Thread Nick Dimiduk
Heya,

I'd like to use jmeter against HS2/JDBC, and I'm finding the standalone
jar isn't actually standalone. It appears to include a number of
dependencies, but not the Hadoop Common classes. Is there a packaging of this jar
that is actually standalone? Are there instructions for using this
standalone jar as it is?

Thanks,
Nick

jmeter.JMeter: Uncaught exception:  java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
at
org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
at
org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
at
org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at
org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.init(JdbcConnectionFactory.java:138)
at
org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
at
org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
at
org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
at
org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 13 more


Re: how to access array type?

2015-03-02 Thread Alexander Pivovarov
hive> create table test1 (c1 array<int>) row format delimited collection
items terminated by ',';
OK

hive> insert into test1 select array(1,2,3) from dual;
OK

hive> select * from test1;
OK
[1,2,3]

hive> select c1[0] from test1;
OK
1

$ hadoop fs -cat /apps/hive/warehouse/test1/00_0
1,2,3
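As an aside (not part of the original reply), another way to look at array contents 
is LATERAL VIEW explode(), which flattens the array into one row per element:

hive> select item from test1 lateral view explode(c1) e as item;
OK
1
2
3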



On Sun, Mar 1, 2015 at 11:53 PM, Jie Zhang jiezh2...@gmail.com wrote:

 Hi,

 I am trying to use hive complex data type on hive0.14.0. However, could
 not access the array type as manual indicated. I have an array column, but
 hit SemanticException when access the individual item in the array. Any
 clue? Did I use the wrong syntax or miss some property setting? Thanks!

 hive create table test1 (c1 array<int>) row format delimited collection
 items terminated by ',';

 OK

 Time taken: 0.092 seconds

 hive select * from test1;

 OK

 [1,2,3]

 Time taken: 0.065 seconds, Fetched: 1 row(s)

 hive select c1[0] from test1;

 FAILED: SemanticException [Error 10004]: Line 1:7 Invalid table alias or
 column reference 'c1[0]': (possible column names are: c1)

 Jessica



Conversion of one file format to another

2015-03-02 Thread Varsha Raveendran
Hello All,

I have a couple of SequenceFiles on HDFS. I now need to load these files
into an ORC table. One option is to create an external table in
SequenceFile format and then load it into the ORC table using the INSERT
OVERWRITE command.

I am looking for an alternative that avoids the intermediate table. Is
there a way of achieving this by writing a custom output format?
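For reference, the two-table route described above would look roughly like this 
(table and column names, the HDFS path, and the single-string-column schema are 
hypothetical assumptions):

create external table seq_staging (line string)
stored as sequencefile
location '/data/seqfiles';   -- hypothetical path to the existing SequenceFiles

create table orc_target (line string)
stored as orc;

insert overwrite table orc_target select line from seq_staging;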


Thanks & Regards,
Varsha


Re: HS2 standalone JDBC jar not standalone

2015-03-02 Thread Nick Dimiduk
Thanks Alexander!

On Mon, Mar 2, 2015 at 10:31 AM, Alexander Pivovarov apivova...@gmail.com
wrote:

 yes, we even have a ticket for that
 https://issues.apache.org/jira/browse/HIVE-9600

 btw can anyone test jdbc driver with kerberos enabled?
 https://issues.apache.org/jira/browse/HIVE-9599


 On Mon, Mar 2, 2015 at 10:01 AM, Nick Dimiduk ndimi...@gmail.com wrote:

 Heya,

 I've like to use jmeter against HS2/JDBC and I'm finding the standalone
 jar isn't actually standalone. It appears to include a number of
 dependencies but not Hadoop Common stuff. Is there a packaging of this jar
 that is actually standalone? Are there instructing for using this
 standalone jar as it is?

 Thanks,
 Nick

 jmeter.JMeter: Uncaught exception:  java.lang.NoClassDefFoundError:
 org/apache/hadoop/conf/Configuration
 at
 org.apache.hive.jdbc.HiveConnection.createBinaryTransport(HiveConnection.java:394)
 at
 org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:188)
 at
 org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:164)
 at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 at java.sql.DriverManager.getConnection(DriverManager.java:571)
 at java.sql.DriverManager.getConnection(DriverManager.java:233)
 at
 org.apache.avalon.excalibur.datasource.JdbcConnectionFactory.init(JdbcConnectionFactory.java:138)
 at
 org.apache.avalon.excalibur.datasource.ResourceLimitingJdbcDataSource.configure(ResourceLimitingJdbcDataSource.java:311)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.initPool(DataSourceElement.java:235)
 at
 org.apache.jmeter.protocol.jdbc.config.DataSourceElement.testStarted(DataSourceElement.java:108)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.notifyTestListenersOfStart(StandardJMeterEngine.java:214)
 at
 org.apache.jmeter.engine.StandardJMeterEngine.run(StandardJMeterEngine.java:336)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.conf.Configuration
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 ... 13 more