[jira] [Created] (HIVE-17565) NullPointerException occurs when hive.optimize.skewjoin and hive.auto.convert.join are switched on at the same time

2017-09-20 Thread Xin Hao (JIRA)
Xin Hao created HIVE-17565:
--

 Summary: NullPointerException occurs when hive.optimize.skewjoin 
and hive.auto.convert.join are switched on at the same time
 Key: HIVE-17565
 URL: https://issues.apache.org/jira/browse/HIVE-17565
 Project: Hive
  Issue Type: Bug
Reporter: Xin Hao


A NullPointerException occurs when hive.optimize.skewjoin and 
hive.auto.convert.join are switched on at the same time.
The query passes when hive.optimize.skewjoin=true and hive.auto.convert.join=false.

Workload:
(1) TPCx-BB Q19
(2) A small case, simplified from Q19, as below:

SELECT *
FROM store_returns sr,
(
  SELECT d1.d_date_sk
  FROM date_dim d1, date_dim d2
  WHERE d1.d_week_seq = d2.d_week_seq
) sr_dateFilter
WHERE sr.sr_returned_date_sk = d_date_sk;
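
For reference, the failing and passing combinations can be restated as a minimal repro session (nothing here beyond the settings already named in this report):

-- Failing combination (leads to the NPE below):
set hive.optimize.skewjoin=true;
set hive.auto.convert.join=true;

-- Passing combination:
set hive.optimize.skewjoin=true;
set hive.auto.convert.join=false;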


Exception Error Message:
Error: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more








--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-13764) NullPointerException issue with MR engine when map-join is enabled and hive.exec.submit.local.task.via.child=true

2016-05-15 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13764:
--

 Summary: NullPointerException issue with MR engine when map-join 
is enabled and hive.exec.submit.local.task.via.child=true
 Key: HIVE-13764
 URL: https://issues.apache.org/jira/browse/HIVE-13764
 Project: Hive
  Issue Type: Bug
Reporter: Xin Hao


When executing TPCx-BB query 4 (with MR engine, map-join enabled, and 
hive.exec.submit.local.task.via.child=true), the following exception occurred:
2016-05-16T08:50:47,909 ERROR [bb6ffca1-f821-4ac2-b140-710d87c5d9d7 main]: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(346)) - Exception:
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInChildVM(MapredLocalTask.java:321) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.execute(MapredLocalTask.java:148) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:172) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1852) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1592) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1346) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1117) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1105) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:236) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:339) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:436) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:452) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:721) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:648) [hive-cli-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_67]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_67]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_67]
at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_67]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [hadoop-common-2.6.0-cdh5.7.0.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [hadoop-common-2.6.0-cdh5.7.0.jar:?]
2016-05-16T08:50:47,915 ERROR [bb6ffca1-f821-4ac2-b140-710d87c5d9d7 main]: ql.Driver (SessionState.java:printError(1066)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

If we disable map-join explicitly (by setting hive.auto.convert.join=false), 
the query passes.
If we keep map-join enabled and set 
hive.exec.submit.local.task.via.child=false, the query also passes.
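
Restating the two workarounds as session settings (either one alone was sufficient):

-- Workaround 1: disable map-join entirely
set hive.auto.convert.join=false;

-- Workaround 2: keep map-join, but run the local task in-process
set hive.exec.submit.local.task.via.child=false;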



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13634) Hive-on-Spark performed worse than Hive-on-MR, for queries with external scripts

2016-04-27 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13634:
--

 Summary: Hive-on-Spark performed worse than Hive-on-MR, for 
queries with external scripts
 Key: HIVE-13634
 URL: https://issues.apache.org/jira/browse/HIVE-13634
 Project: Hive
  Issue Type: Bug
Reporter: Xin Hao


Hive-on-Spark performed worse than Hive-on-MR for queries with external scripts.

TPCx-BB Q2/Q3/Q4 are Python streaming cases that call external scripts to handle 
reduce tasks. For these three queries, Hive-on-Spark shows lower performance than 
Hive-on-MR when processing reduce tasks with external (Python) scripts, so 
improving HoS performance for queries with external scripts looks like a 
worthwhile optimization opportunity. The external-script pattern is sketched below.
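
For context, the external-script pattern these queries rely on looks roughly like the sketch below (hedged: 'q2-reducer.py' and the table/column names are placeholders, not the actual TPCx-BB artifacts). The DISTRIBUTE BY / SORT BY subquery forces a reduce stage, and the external Python process is spawned there, with rows piped through stdin/stdout:

ADD FILE q2-reducer.py;  -- placeholder script name
SELECT TRANSFORM (sessionid, itemid)
USING 'python q2-reducer.py'
AS (item_a, item_b)
FROM (
  SELECT sessionid, itemid
  FROM web_clickstreams  -- placeholder table
  DISTRIBUTE BY sessionid
  SORT BY sessionid
) t;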



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine

2016-03-15 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13292:
--

 Summary: Different DOUBLE type precision issue between Spark and 
MR engine
 Key: HIVE-13292
 URL: https://issues.apache.org/jira/browse/HIVE-13292
 Project: Hive
  Issue Type: Bug
 Environment: Apache Hive 2.0.0
Apache Spark 1.6.0
Reporter: Xin Hao


The Spark and MR engines produce slightly different DOUBLE precision results.
Found when executing TPC-H query 5 with scale factor 2 (2GB data size). Details 
are below.


(1) The MR engine output:
MOZAMBIQUE,1.0646195910990009E8
ETHIOPIA,1.0108856206629996E8
ALGERIA,9.987582690420012E7
MOROCCO,9.785484184850013E7
KENYA,9.412388077690017E7

(2) The Spark engine output:
MOZAMBIQUE,1.064619591099E8
ETHIOPIA,1.0108856206630005E8
ALGERIA,9.987582690419997E7
MOROCCO,9.785484184850003E7
KENYA,9.412388077690002E7
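
One plausible explanation (not confirmed in this report) is that the two engines accumulate the partial sums in different orders, and IEEE-754 DOUBLE addition is not associative, so only the last digits differ. A minimal illustration in HiveQL:

-- DOUBLE addition is order-sensitive:
select (cast(0.1 as double) + cast(0.2 as double)) + cast(0.3 as double);  -- 0.6000000000000001
select cast(0.1 as double) + (cast(0.2 as double) + cast(0.3 as double));  -- 0.6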


(3) Detailed SQL used:
drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
  pid1 STRING,
  pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
'${env:RESULT_DIR}';

insert into table ${env:RESULT_TABLE}

select
n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'AFRICA'
and o_orderdate >= '1993-01-01'
and o_orderdate < '1994-01-01'
group by
n_name
order by
revenue desc;

(4) A similar issue also exists after simplifying the original query, as below:

drop table if exists ${env:RESULT_TABLE};
create table ${env:RESULT_TABLE} (
  pid2 DOUBLE
)
row format delimited fields terminated by ',' lines terminated by '\n'
stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
'${env:RESULT_DIR}';

insert into table ${env:RESULT_TABLE}

select
sum(l_extendedprice * (1 - l_discount)) as revenue
from
lineitem
group by
l_orderkey
order by
revenue;
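
If bit-for-bit agreement between the engines matters more than speed, a possible workaround (a sketch; the precision/scale below are illustrative) is to aggregate in DECIMAL, which is exact for fixed-scale arithmetic:

select
sum(cast(l_extendedprice as decimal(18,4)) * (1 - cast(l_discount as decimal(18,4)))) as revenue
from
lineitem
group by
l_orderkey
order by
revenue;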




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-03-14 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13278:
--

 Summary: Many redundant 'File not found' messages appeared in 
container log during query execution with Hive on Spark
 Key: HIVE-13278
 URL: https://issues.apache.org/jira/browse/HIVE-13278
 Project: Hive
  Issue Type: Bug
 Environment: Hive on Spark engine
Found based on:
Apache Hive 2.0.0
Apache Spark 1.6.0
Reporter: Xin Hao
Priority: Minor


Many redundant 'File not found' messages appear in the container log during 
query execution with Hive on Spark.

Error message example:
16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when vect

2016-03-13 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13277:
--

 Summary: Exception "Unable to create serializer 
'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
occurred during query execution on spark engine when vectorized execution is 
switched on
 Key: HIVE-13277
 URL: https://issues.apache.org/jira/browse/HIVE-13277
 Project: Hive
  Issue Type: Bug
 Environment: Hive on Spark engine
Hive Version: Apache Hive 2.0.0
Spark Version: Apache Spark 1.6.0
Reporter: Xin Hao


Found during TPCx-BB query2 execution on the Spark engine when vectorized 
execution is switched on:
(1) set hive.vectorized.execution.enabled=true;
(2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
Apache Hive 2.0.0)
The Spark engine works when hive.vectorized.execution.enabled is switched off:
(1) set hive.vectorized.execution.enabled=false;
(2) set hive.vectorized.execution.reduce.enabled=true;

For the MR engine, the query passes with no exception whether vectorized 
execution is switched on or off. The two setting combinations are restated below.
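
-- Fails on the Spark engine (Kryo serializer error on the vectorized reduce plan):
set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

-- Works on the Spark engine:
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=true;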

The detailed error message is below:
2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): java.lang.RuntimeException: Failed to load plan: hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Serialization trace:
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.sca

[jira] [Created] (HIVE-12091) HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch]

2015-10-12 Thread Xin Hao (JIRA)
Xin Hao created HIVE-12091:
--

 Summary: HiveException (Failed to close AbstractFileMergeOperator) 
occurs during loading data to ORC file, when hive.merge.sparkfiles is set to 
true. [Spark Branch]
 Key: HIVE-12091
 URL: https://issues.apache.org/jira/browse/HIVE-12091
 Project: Hive
  Issue Type: Sub-task
Reporter: Xin Hao


This issue occurs when hive.merge.sparkfiles is set to true, and can be worked 
around by setting hive.merge.sparkfiles to false.
By the way, in a local experiment we ran the case with the MR engine (set 
hive.merge.mapfiles=true; set hive.merge.mapredfiles=true;) and it passed.

(1) Component versions:
-- Hive Spark branch 70eeadd2f019dcb2e301690290c8807731eab7a1 + HIVE-11473 
patch (HIVE-11473.3-spark.patch), which adds Spark 1.5 support for Hive on Spark
-- Spark 1.5.1

(2) Case used:
-- Big-Bench data load (load data from HDFS to the Hive warehouse, stored as 
ORC format). The related HiveQL:

DROP TABLE IF EXISTS customer_temporary;
CREATE EXTERNAL TABLE customer_temporary
  ( c_customer_sk          bigint  --not null
  , c_customer_id          string  --not null
  , c_current_cdemo_sk     bigint
  , c_current_hdemo_sk     bigint
  , c_current_addr_sk      bigint
  , c_first_shipto_date_sk bigint
  , c_first_sales_date_sk  bigint
  , c_salutation           string
  , c_first_name           string
  , c_last_name            string
  , c_preferred_cust_flag  string
  , c_birth_day            int
  , c_birth_month          int
  , c_birth_year           int
  , c_birth_country        string
  , c_login                string
  , c_email_address        string
  , c_last_review_date     string
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  STORED AS TEXTFILE LOCATION '/user/root/benchmarks/bigbench_n1t/data/customer'
;

DROP TABLE IF EXISTS customer;
CREATE TABLE customer
STORED AS ORC
AS
SELECT * FROM customer_temporary
;
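
Per the workaround noted above, the load passes with the following session settings:

-- Spark engine workaround for this issue:
set hive.merge.sparkfiles=false;

-- MR-engine settings from our passing local experiment:
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;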

(3) Error/Exception Message:

15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0 [ offset : 3 length: 10525754 row: 247500 ]
15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge Operator OFM
15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:235)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:236)
at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandl

[jira] [Created] (HIVE-12089) HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch]

2015-10-12 Thread Xin Hao (JIRA)
Xin Hao created HIVE-12089:
--

 Summary: HiveException (Failed to close AbstractFileMergeOperator) 
occurs during loading data to ORC file, when hive.merge.sparkfiles is set to 
true. [Spark Branch]
 Key: HIVE-12089
 URL: https://issues.apache.org/jira/browse/HIVE-12089
 Project: Hive
  Issue Type: Sub-task
Reporter: Xin Hao


Component versions:
(1) Hive Spark branch 70eeadd2f019dcb2e301690290c8807731eab7a1 + HIVE-11473 
patch (HIVE-11473.3-spark.patch), which adds Spark 1.5 support for Hive on Spark
(2) Spark 1.5.1

Case used:
Big-Bench data load (load data from HDFS to the Hive warehouse, stored as ORC 
format). The related HiveQL:

DROP TABLE IF EXISTS customer_temporary;
CREATE EXTERNAL TABLE customer_temporary
  ( c_customer_sk          bigint  --not null
  , c_customer_id          string  --not null
  , c_current_cdemo_sk     bigint
  , c_current_hdemo_sk     bigint
  , c_current_addr_sk      bigint
  , c_first_shipto_date_sk bigint
  , c_first_sales_date_sk  bigint
  , c_salutation           string
  , c_first_name           string
  , c_last_name            string
  , c_preferred_cust_flag  string
  , c_birth_day            int
  , c_birth_month          int
  , c_birth_year           int
  , c_birth_country        string
  , c_login                string
  , c_email_address        string
  , c_last_review_date     string
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
  STORED AS TEXTFILE LOCATION '/user/root/benchmarks/bigbench_n1t/data/customer'
;

DROP TABLE IF EXISTS customer;
CREATE TABLE customer
STORED AS ORC
AS
SELECT * FROM customer_temporary
;

Error/Exception Message:
15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0 [ offset : 3 length: 10525754 row: 247500 ]
15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge Operator OFM
15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:235)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:236)
at org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:113)
... 15 more
Caused by: java.io.IOException: Unable to rename hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/_task_tmp.-ext-10001/_tmp.01_0 to hdfs://bhx2:8020/user/hive/warehouse/big

[jira] [Created] (HIVE-9794) java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence

2015-02-25 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9794:
-

 Summary: java.lang.NoSuchMethodError occurs during hive query 
execution which has 'ADD FILE XXXX.jar' sentence
 Key: HIVE-9794
 URL: https://issues.apache.org/jira/browse/HIVE-9794
 Project: Hive
  Issue Type: Sub-task
Reporter: Xin Hao


We updated our code to the latest revision on the Spark branch (i.e. 
fd0f638a8d481a9a98b34d3dd08236d6d591812f), rebuilt and deployed Hive on our 
cluster, and ran the BigBench cases again. Many cases (e.g. Q1, Q2, Q3, Q4, Q8) 
failed due to a common 'NoSuchMethodError'. The root cause in these queries 
appears to be the 'ADD FILE XXXX.jar' statement; an example of the failing 
statement is shown below.
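
(The path below is the one quoted in the related HIVE-9425 report further down; any resource added this way goes through AddResourceProcessor, the frame at the top of the stack trace.)

ADD FILE ${env:BIG_BENCH_QUERIES_DIR}/Resources/bigbenchqueriesmr.jar;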

Detail error message:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState.add_resources(Lorg/apache/hadoop/hive/ql/session/SessionState$ResourceType;Ljava/util/List;)Ljava/util/List;
at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:262)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-02-15 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-9697:
--
Description: 
We have a finding while running some Big-Bench cases:
with the same small-table size threshold, the Map Join operator is not 
generated in the stage plans for Hive on Spark, while it is generated for Hive 
on MR.

For example, when we run BigBench Q25, the meta info of one input ORC table is 
as below:
totalSize=1748955 (about 1.5M)
rawDataSize=123050375 (about 120M)
If we use the following parameter settings:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=2500;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
Map Join will be enabled in Hive on MR mode, but not in Hive on Spark.

We found that for Hive on MR, the HDFS file size of the table 
(ContentSummary.getLength(), which should approximate 'totalSize') is compared 
with the 100M threshold (and is smaller than 100M), while for Hive on Spark 
'rawDataSize' is compared with the 100M threshold (and is larger than 100M). 
That's why MapJoin is not enabled for Hive on Spark in this case, and as a 
result Hive on Spark gets much lower performance than Hive on MR here.

When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), 
MapJoin is enabled in Hive on Spark mode as well, and Hive on Spark then shows 
performance similar to Hive on MR. A sketch for inspecting the two table sizes 
follows.
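
To check which sizes a table reports, both values can be read from the table parameters (same command as used in HIVE-9560 further down; 'item' is an example table name):

describe extended item;
-- In the 'parameters' section, look for totalSize (the HDFS file size, about
-- 1.5M here) and rawDataSize (about 120M here). Per the analysis above, MR
-- compares the former against hive.auto.convert.join.noconditionaltask.size,
-- while Hive on Spark compares the latter.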


  was:
We have a finding while running some Big-Bench cases:
with the same small-table size threshold, the Map Join operator is not 
generated in the stage plans for Hive on Spark, while it is generated for Hive 
on MR.

For example, when we run BigBench Q25, the meta info of one input ORC table is 
as below:
totalSize=1748955 (about 1.5M)
rawDataSize=123050375 (about 120M)
If we use the following parameter settings:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=2500;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
Map Join will be enabled in Hive on MR mode, but not in Hive on Spark.

We found that for Hive on MR, 'totalSize' is compared with the 100M threshold 
('totalSize' is about 1.5M, smaller than 100M), while for Hive on Spark 
'rawDataSize' is compared with the threshold ('rawDataSize' is about 120M, 
larger than 100M). That's why MapJoin is not enabled for Hive on Spark in this 
case, and as a result Hive on Spark gets much lower performance than Hive on 
MR here.

When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), 
MapJoin is enabled in Hive on Spark mode as well, and Hive on Spark then shows 
performance similar to Hive on MR.



> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> ---
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We have a finding while running some Big-Bench cases:
> with the same small-table size threshold, the Map Join operator is not
> generated in the stage plans for Hive on Spark, while it is generated for
> Hive on MR.
> For example, when we run BigBench Q25, the meta info of one input ORC table
> is as below:
> totalSize=1748955 (about 1.5M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings:
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=2500;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join will be enabled in Hive on MR mode, but not in Hive on Spark.
> We found that for Hive on MR, the HDFS file size of the table
> (ContentSummary.getLength(), which should approximate 'totalSize') is
> compared with the 100M threshold (and is smaller than 100M), while for Hive
> on Spark 'rawDataSize' is compared with the 100M threshold (and is larger
> than 100M). That's why MapJoin is not enabled for Hive on Spark in this
> case, and as a result Hive on Spark gets much lower performance than Hive
> on MR here.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M),
> MapJoin is enabled in Hive on Spark mode as well, and Hive on Spark then
> shows performance similar to Hive on MR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-02-15 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-9697:
--
Description: 
We have a finding while running some Big-Bench cases:
with the same small-table size threshold, the Map Join operator is not 
generated in the stage plans for Hive on Spark, while it is generated for Hive 
on MR.

For example, when we run BigBench Q25, the meta info of one input ORC table is 
as below:
totalSize=1748955 (about 1.5M)
rawDataSize=123050375 (about 120M)
If we use the following parameter settings:
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=2500;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
Map Join will be enabled in Hive on MR mode, but not in Hive on Spark.

We found that for Hive on MR, 'totalSize' is compared with the 100M threshold 
('totalSize' is about 1.5M, smaller than 100M), while for Hive on Spark 
'rawDataSize' is compared with the threshold ('rawDataSize' is about 120M, 
larger than 100M). That's why MapJoin is not enabled for Hive on Spark in this 
case, and as a result Hive on Spark gets much lower performance than Hive on 
MR here.

When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M), 
MapJoin is enabled in Hive on Spark mode as well, and Hive on Spark then shows 
performance similar to Hive on MR.

Summary: Hive on Spark is not as aggressive as MR on map join [Spark 
Branch]  (was: Hive on Spark is not as aggressive as MR on map join)

> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> ---
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We have a finding while running some Big-Bench cases:
> with the same small-table size threshold, the Map Join operator is not
> generated in the stage plans for Hive on Spark, while it is generated for
> Hive on MR.
> For example, when we run BigBench Q25, the meta info of one input ORC table
> is as below:
> totalSize=1748955 (about 1.5M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings:
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=2500;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join will be enabled in Hive on MR mode, but not in Hive on Spark.
> We found that for Hive on MR, 'totalSize' is compared with the 100M
> threshold ('totalSize' is about 1.5M, smaller than 100M), while for Hive on
> Spark 'rawDataSize' is compared with the threshold ('rawDataSize' is about
> 120M, larger than 100M). That's why MapJoin is not enabled for Hive on
> Spark in this case, and as a result Hive on Spark gets much lower
> performance than Hive on MR here.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000; (150M),
> MapJoin is enabled in Hive on Spark mode as well, and Hive on Spark then
> shows performance similar to Hive on MR.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join

2015-02-15 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9697:
-

 Summary: Hive on Spark is not as aggressive as MR on map join
 Key: HIVE-9697
 URL: https://issues.apache.org/jira/browse/HIVE-9697
 Project: Hive
  Issue Type: Sub-task
Reporter: Xin Hao






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-11 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9659:
-

 Summary: 'Error while trying to create table container' occurs 
during hive query case execution when hive.optimize.skewjoin set to 'true' 
[Spark Branch]
 Key: HIVE-9659
 URL: https://issues.apache.org/jira/browse/HIVE-9659
 Project: Hive
  Issue Type: Sub-task
Reporter: Xin Hao


We found that 'Error while trying to create table container' occurs during 
Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
If hive.optimize.skewjoin is set to 'false', the case passes.

How to reproduce:
1. set hive.optimize.skewjoin=true;
2. Run BigBench case Q12 and it will fail.
Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will 
find the error 'Error while trying to create table container' in the log, and 
also a NullPointerException near the end of the log.

(a) Detailed error message for 'Error while trying to create table container':
15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
... 22 more
15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
15/02/12 01:29:49 INFO PerfLogger: 

(b) Detailed error message for the NullPointerException:
15/02/12 01:29:50 ERROR MapJoinOperator: Unexpected exception: null
java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.setMapJoinKey(MapJoinOperator.java:227)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:271)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScan

[jira] [Commented] (HIVE-9586) Too verbose log can hurt performance, we should always check log level first

2015-02-08 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311827#comment-14311827
 ] 

Xin Hao commented on HIVE-9586:
---

Thank you Xuefu.

> Too verbose log can hurt performance, we should always check log level first
> 
>
> Key: HIVE-9586
> URL: https://issues.apache.org/jira/browse/HIVE-9586
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: spark-branch, 1.2.0
>
> Attachments: HIVE-9586.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9586) Too verbose log can hurt performance, we should always check log level first

2015-02-08 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311709#comment-14311709
 ] 

Xin Hao commented on HIVE-9586:
---

Hi Xuefu, could you please consider also committing it to the Spark branch? Thanks.

> Too verbose log can hurt performance, we should always check log level first
> 
>
> Key: HIVE-9586
> URL: https://issues.apache.org/jira/browse/HIVE-9586
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rui Li
>Assignee: Rui Li
> Fix For: 1.2.0
>
> Attachments: HIVE-9586.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9560) When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will result in value '0' after running 'analyze table TABLE_NAME compute statistics;'

2015-02-03 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304473#comment-14304473
 ] 

Xin Hao commented on HIVE-9560:
---

Thanks Prasanth for the comment! We can resolve it in our local test 
environment now.

> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> --
>
> Key: HIVE-9560
> URL: https://issues.apache.org/jira/browse/HIVE-9560
> Project: Hive
>  Issue Type: Bug
>Reporter: Xin Hao
>
> When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
> result in value '0' after running 'analyze table TABLE_NAME compute 
> statistics;'
> Reproduce steps:
> (1) set hive.stats.collect.rawdatasize=true;
> (2) Generate an ORC table in hive; the value of its 'rawDataSize' is NOT zero.
> (3) Find the value of 'rawDataSize' (NOT zero) by executing 'describe 
> extended TABLE_NAME;'
> (4) Execute 'analyze table TABLE_NAME compute statistics;'
> (5) Execute 'describe extended TABLE_NAME;' again, and you will find that 
> the value of 'rawDataSize' has changed to '0'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9560) When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will result in value '0' after running 'analyze table TABLE_NAME compute statistics;'

2015-02-02 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302891#comment-14302891
 ] 

Xin Hao commented on HIVE-9560:
---

For example, we have an ORC table named 'item'.

(a) Before running 'analyze table item compute statistics;',
the 'rawDataSize' was '884720592'.

The result of 'describe extended item':
Detailed Table Information  Table(tableName:item, dbName:bigbenchorc, 
owner:root, createTime:1421984899, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:i_item_sk, type:bigint, 
comment:null), FieldSchema(name:i_item_id, type:string, comment:null), 
FieldSchema(name:i_rec_start_date, type:string, comment:null), 
FieldSchema(name:i_rec_end_date, type:string, comment:null), 
FieldSchema(name:i_item_desc, type:string, comment:null), 
FieldSchema(name:i_current_price, type:double, comment:null), 
FieldSchema(name:i_wholesale_cost, type:double, comment:null), 
FieldSchema(name:i_brand_id, type:int, comment:null), FieldSchema(name:i_brand, 
type:string, comment:null), FieldSchema(name:i_class_id, type:int, 
comment:null), FieldSchema(name:i_class, type:string, comment:null), 
FieldSchema(name:i_category_id, type:int, comment:null), 
FieldSchema(name:i_category, type:string, comment:null), 
FieldSchema(name:i_manufact_id, type:int, comment:null), 
FieldSchema(name:i_manufact, type:string, comment:null), 
FieldSchema(name:i_size, type:string, comment:null), 
FieldSchema(name:i_formulation, type:string, comment:null), 
FieldSchema(name:i_color, type:string, comment:null), FieldSchema(name:i_units, 
type:string, comment:null), FieldSchema(name:i_container, type:string, 
comment:null), FieldSchema(name:i_manager_id, type:int, comment:null), 
FieldSchema(name:i_product_name, type:string, comment:null)], 
location:hdfs://bhx1:8020/user/hive/warehouse/bigbenchorc.db/item, 
inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=4, transient_lastDdlTime=1421984899, 
COLUMN_STATS_ACCURATE=true, totalSize=83267548, numRows=563518, 
rawDataSize=884720592}, viewOriginalText:null, viewExpandedText:null, 
tableType:MANAGED_TABLE)
Time taken: 0.527 seconds, Fetched: 24 row(s)

(b) After running 'analyze table item compute statistics;', 
the 'rawDataSize' is changed to '0'.

The result of 'describe extended item':
Detailed Table Information  Table(tableName:item, dbName:bigbenchorc, 
owner:root, createTime:1421984899, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:i_item_sk, type:bigint, 
comment:null), FieldSchema(name:i_item_id, type:string, comment:null), 
FieldSchema(name:i_rec_start_date, type:string, comment:null), 
FieldSchema(name:i_rec_end_date, type:string, comment:null), 
FieldSchema(name:i_item_desc, type:string, comment:null), 
FieldSchema(name:i_current_price, type:double, comment:null), 
FieldSchema(name:i_wholesale_cost, type:double, comment:null), 
FieldSchema(name:i_brand_id, type:int, comment:null), FieldSchema(name:i_brand, 
type:string, comment:null), FieldSchema(name:i_class_id, type:int, 
comment:null), FieldSchema(name:i_class, type:string, comment:null), 
FieldSchema(name:i_category_id, type:int, comment:null), 
FieldSchema(name:i_category, type:string, comment:null), 
FieldSchema(name:i_manufact_id, type:int, comment:null), 
FieldSchema(name:i_manufact, type:string, comment:null), 
FieldSchema(name:i_size, type:string, comment:null), 
FieldSchema(name:i_formulation, type:string, comment:null), 
FieldSchema(name:i_color, type:string, comment:null), FieldSchema(name:i_units, 
type:string, comment:null), FieldSchema(name:i_container, type:string, 
comment:null), FieldSchema(name:i_manager_id, type:int, comment:null), 
FieldSchema(name:i_product_name, type:string, comment:null)], 
location:hdfs://bhx1:8020/user/hive/warehouse/bigbenchorc.db/item, 
inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, 
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
partitionKeys:[], parameters:{numFiles=4, transient_lastDdlTime=1421984899, 
COLUMN_STATS_ACCURATE=true, totalSize=83267548, numRows=563518, 
rawDataSize=884720592}, viewOriginalText:null,

[jira] [Created] (HIVE-9560) When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will result in value '0' after running 'analyze table TABLE_NAME compute statistics;'

2015-02-02 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9560:
-

 Summary: When hive.stats.collect.rawdatasize=true, 'rawDataSize' 
for an ORC table will result in value '0' after running 'analyze table 
TABLE_NAME compute statistics;'
 Key: HIVE-9560
 URL: https://issues.apache.org/jira/browse/HIVE-9560
 Project: Hive
  Issue Type: Bug
Reporter: Xin Hao


When hive.stats.collect.rawdatasize=true, 'rawDataSize' for an ORC table will 
result in value '0' after running 'analyze table TABLE_NAME compute statistics;'

Reproduce steps:
(1) set hive.stats.collect.rawdatasize=true;
(2) Generate an ORC table in hive; the value of its 'rawDataSize' is NOT zero.
(3) Find the value of 'rawDataSize' (NOT zero) by executing 'describe extended 
TABLE_NAME;'
(4) Execute 'analyze table TABLE_NAME compute statistics;'
(5) Execute 'describe extended TABLE_NAME;' again, and you will find that the 
value of 'rawDataSize' has changed to '0'.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9409) Avoid ser/de loggers as logging framework can be incompatible on driver and workers

2015-01-29 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298093#comment-14298093
 ] 

Xin Hao commented on HIVE-9409:
---

Hi, shall we also consider merging this patch to the Spark branch? Thanks.

> Avoid ser/de loggers as logging framework can be incompatible on driver and 
> workers
> ---
>
> Key: HIVE-9409
> URL: https://issues.apache.org/jira/browse/HIVE-9409
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
> Environment: CentOS6.5  
> Java version: 1.7.0_67
>Reporter: Xin Hao
>Assignee: Rui Li
> Fix For: 0.15.0
>
> Attachments: HIVE-9409.1.patch, HIVE-9409.1.patch, HIVE-9409.1.patch
>
>
> When we use the current [Spark Branch] to build the hive package, deploy it 
> on our cluster, and execute hive queries (e.g. BigBench cases Q10, Q18, Q19, 
> Q27) in the default mode (i.e. just Hive on MR, not Hive on Spark), the error 
> 'java.lang.ClassNotFoundException: 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog' occurs.
> For other released Apache or CDH hive versions (e.g. Apache Hive 0.14), this 
> issue does not occur.
> By the way, if we use 'add jar /location/to/jcl-over-slf4j-1.7.5.jar' before 
> hive query execution, the issue is worked around. 
> The detail diagnostic messages are as below:
> ==
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Failed to load plan: hdfs://bhx1:8020/tmp/hive/root/4a4cbeb2-cf42-4eb7-a78a-7ecea6af2aff/hive_2015-01-17_10-45-51_360_5581900288096206774-1/-mr-10004/1c6c4667-8b81-41ed-a42e-fe099ae3379f/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
> Serialization trace:
> LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:431)
> at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:287)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:657)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
> Serialization trace:
> LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 

[jira] [Commented] (HIVE-9425) External Function Jar files are not available for Driver when running with yarn-cluster mode [Spark Branch]

2015-01-22 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288666#comment-14288666
 ] 

Xin Hao commented on HIVE-9425:
---

Double-checked with the Big-Bench Q1 case (which includes the HQL 'ADD FILE 
${env:BIG_BENCH_QUERIES_DIR}/Resources/bigbenchqueriesmr.jar;'), and it still 
fails with the latest code on the Spark branch.
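
A minimal sketch of the failing pattern, for reference (the local path is a 
placeholder, not the actual Big-Bench layout):

-- in yarn-cluster mode the remote driver itself tries to open this
-- client-local path, which produces the FileNotFoundException below
ADD FILE /local/path/to/bigbenchqueriesmr.jar;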

Error message in hive log:

2015-01-23 10:19:21,205 INFO  [main]: exec.Task 
(SessionState.java:printInfo(852)) -   set hive.exec.reducers.max=
2015-01-23 10:19:21,205 INFO  [main]: exec.Task 
(SessionState.java:printInfo(852)) - In order to set a constant number of 
reducers:
2015-01-23 10:19:21,206 INFO  [main]: exec.Task 
(SessionState.java:printInfo(852)) -   set mapreduce.job.reduces=
2015-01-23 10:19:21,208 INFO  [main]: log.PerfLogger 
(PerfLogger.java:PerfLogBegin(121)) - 
2015-01-23 10:19:21,278 INFO  [main]: ql.Context 
(Context.java:getMRScratchDir(328)) - New scratch dir is 
hdfs://bhx1:8020/tmp/hive/root/0357a036-8988-489b-85cf-329023a567c7/hive_2015-01-23_10-18-27_797_5566502876180681874-1
2015-01-23 10:19:21,432 WARN  [RPC-Handler-3]: rpc.RpcDispatcher 
(RpcDispatcher.java:handleError(142)) - Received error 
message:java.io.FileNotFoundException: 
/HiveOnSpark/Big-Bench/engines/hive/queries/Resources/bigbenchqueriesmr.jar (No 
such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at 
org.spark-project.guava.common.io.Files$FileByteSource.openStream(Files.java:124)
at 
org.spark-project.guava.common.io.Files$FileByteSource.openStream(Files.java:114)
at 
org.spark-project.guava.common.io.ByteSource.copyTo(ByteSource.java:202)
at org.spark-project.guava.common.io.Files.copy(Files.java:436)
at org.apache.spark.HttpFileServer.addFileToDir(HttpFileServer.scala:72)
at org.apache.spark.HttpFileServer.addFile(HttpFileServer.scala:55)
at org.apache.spark.SparkContext.addFile(SparkContext.scala:961)
at 
org.apache.spark.api.java.JavaSparkContext.addFile(JavaSparkContext.scala:646)
at 
org.apache.hive.spark.client.SparkClientImpl$AddFileJob.call(SparkClientImpl.java:553)
at 
org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(RemoteDriver.java:305)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hive.spark.client.rpc.RpcDispatcher.handleCall(RpcDispatcher.java:120)
at 
org.apache.hive.spark.client.rpc.RpcDispatcher.channelRead0(RpcDispatcher.java:79)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at 
io.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:108)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
.
2015-01-23 10:19:21,606 INFO  [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(148)) - 
=

[jira] [Commented] (HIVE-9410) ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]

2015-01-22 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288554#comment-14288554
 ] 

Xin Hao commented on HIVE-9410:
---

Chengxiang, I used patch HIVE-9410.3-spark.patch to validate those four 
Big-Bench cases (Q10, Q18, Q19, Q27). They pass in both Spark Standalone and 
Yarn-Client modes. Thanks.

> ClassNotFoundException occurs during hive query case execution with UDF 
> defined [Spark Branch]
> --
>
> Key: HIVE-9410
> URL: https://issues.apache.org/jira/browse/HIVE-9410
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
> Environment: CentOS 6.5
> JDK1.7
>Reporter: Xin Hao
>Assignee: Chengxiang Li
> Attachments: HIVE-9410.1-spark.patch, HIVE-9410.2-spark.patch, 
> HIVE-9410.3-spark.patch
>
>
> We have a hive query case with a UDF defined (i.e. BigBench cases Q10, Q18, 
> etc.). It passes in default Hive (on MR) mode, but fails in Hive on Spark 
> mode (both Standalone and Yarn-Client). 
> Although we use 'add jar .jar;' to add the UDF jar explicitly, the issue 
> still exists. 
> BTW, if we put the UDF jar into the $HIVE_HOME/lib dir, the case passes.
> The detailed error message is below (NOTE: 
> de.bankmark.bigbench.queries.q10.SentimentUDF is the UDF contained in the 
> jar bigbenchqueriesmr.jar, and we added a command like 'add jar 
> /location/to/bigbenchqueriesmr.jar;' to the .sql file explicitly)
> INFO  [pool-1-thread-1]: client.RemoteDriver (RemoteDriver.java:call(316)) - 
> Failed to run job 8dd120cb-1a4d-4d1c-ba31-61eac648c27d
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: de.bankmark.bigbench.queries.q10.SentimentUDF
> Serialization trace:
> genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
> conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> right (org.apache.commons.lang3.tuple.ImmutablePair)
> edgeProperties (org.apache.hadoop.hive.ql.plan.SparkWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> ...
> Caused by: java.lang.ClassNotFoundException: 
> de.bankmark.bigbench.queries.q10.SentimentUDF
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolve

[jira] [Commented] (HIVE-9410) ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]

2015-01-22 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287130#comment-14287130
 ] 

Xin Hao commented on HIVE-9410:
---

After offline communication with Chengxiang, the current patch 
HIVE-9410.2-spark.patch still has some issues, and verification will be 
conducted after a new patch is uploaded. Thanks.

> ClassNotFoundException occurs during hive query case execution with UDF 
> defined [Spark Branch]
> --
>
> Key: HIVE-9410
> URL: https://issues.apache.org/jira/browse/HIVE-9410
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
> Environment: CentOS 6.5
> JDK1.7
>Reporter: Xin Hao
>Assignee: Chengxiang Li
> Attachments: HIVE-9410.1-spark.patch, HIVE-9410.2-spark.patch
>
>
> We have a hive query case with a UDF defined (i.e. BigBench cases Q10, Q18, 
> etc.). It passes in default Hive (on MR) mode, but fails in Hive on Spark 
> mode (both Standalone and Yarn-Client). 
> Although we use 'add jar .jar;' to add the UDF jar explicitly, the issue 
> still exists. 
> BTW, if we put the UDF jar into the $HIVE_HOME/lib dir, the case passes.
> The detailed error message is below (NOTE: 
> de.bankmark.bigbench.queries.q10.SentimentUDF is the UDF contained in the 
> jar bigbenchqueriesmr.jar, and we added a command like 'add jar 
> /location/to/bigbenchqueriesmr.jar;' to the .sql file explicitly)
> INFO  [pool-1-thread-1]: client.RemoteDriver (RemoteDriver.java:call(316)) - 
> Failed to run job 8dd120cb-1a4d-4d1c-ba31-61eac648c27d
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: de.bankmark.bigbench.queries.q10.SentimentUDF
> Serialization trace:
> genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
> conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> right (org.apache.commons.lang3.tuple.ImmutablePair)
> edgeProperties (org.apache.hadoop.hive.ql.plan.SparkWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> ...
> Caused by: java.lang.ClassNotFoundException: 
> de.bankmark.bigbench.queries.q10.SentimentUDF
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
> ... 55 more

[jira] [Commented] (HIVE-9410) Spark branch, ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]

2015-01-21 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285710#comment-14285710
 ] 

Xin Hao commented on HIVE-9410:
---

Chengxiang, I tried to verify this patch but found that the case still fails 
(when using 'add jar /location/to/MyUDFJar' explicitly).

Detailed error message in the spark log:

2015-01-21 22:30:33,193 INFO  [pool-1-thread-1]: client.RemoteDriver 
(RemoteDriver.java:call(361)) - Failed to run job 
80a2b07e-efb7-4043-ab5a-0ab86282653c
org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.IllegalArgumentException: Unable to create serializer 
"org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
class: de.bankmark.bigbench.queries.q10.SentimentUDF
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:131)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
at 
org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:49)
at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:219)
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:343)
at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:312)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWor

[jira] [Commented] (HIVE-9409) Avoid ser/de loggers as logging framework can be incompatible on driver and workers

2015-01-21 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285627#comment-14285627
 ] 

Xin Hao commented on HIVE-9409:
---

Rui, the problem here can be solved with the patch, and those four Big-Bench 
cases (Q10, Q18, Q19, Q27) now pass in Hive (on MR) mode. Thanks.

> Avoid ser/de loggers as logging framework can be incompatible on driver and 
> workers
> ---
>
> Key: HIVE-9409
> URL: https://issues.apache.org/jira/browse/HIVE-9409
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
> Environment: CentOS6.5  
> Java version: 1.7.0_67
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-9409.1.patch
>
>
> When we use the current [Spark Branch] to build the hive package, deploy it 
> on our cluster, and execute hive queries (e.g. BigBench cases Q10, Q18, Q19, 
> Q27) in default mode (i.e. just Hive on MR, not Hive on Spark), the error 
> 'java.lang.ClassNotFoundException: 
> org.apache.commons.logging.impl.SLF4JLocationAwareLog' occurs.
> For other released Apache or CDH hive versions (e.g. Apache Hive 0.14), this 
> issue does not exist.
> By the way, if we run 'add jar /location/to/jcl-over-slf4j-1.7.5.jar' before 
> the hive query execution, the issue can be worked around. 
> The detailed diagnostic messages are as below:
> ==
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx1:8020/tmp/hive/root/4a4cbeb2-cf42-4eb7-a78a-7ecea6af2aff/hive_2015-01-17_10-45-51_360_5581900288096206774-1/-mr-10004/1c6c4667-8b81-41ed-a42e-fe099ae3379f/map.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find 
> class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
> Serialization trace:
> LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:431)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:287)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:657)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to 
> find class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
> Serialization trace:
> LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> at 
> org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:6

[jira] [Created] (HIVE-9410) Spark branch, ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]

2015-01-18 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9410:
-

 Summary: Spark branch, ClassNotFoundException occurs during hive 
query case execution with UDF defined [Spark Branch]
 Key: HIVE-9410
 URL: https://issues.apache.org/jira/browse/HIVE-9410
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
 Environment: CentOS 6.5
JDK1.7
Reporter: Xin Hao


We have a hive query case with a UDF defined (i.e. BigBench cases Q10, Q18, 
etc.). It passes in default Hive (on MR) mode, but fails in Hive on Spark mode 
(both Standalone and Yarn-Client). 

Although we use 'add jar .jar;' to add the UDF jar explicitly, the issue 
still exists. 

BTW, if we put the UDF jar into the $HIVE_HOME/lib dir, the case passes.
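
A minimal sketch of the failing pattern, with illustrative names (the table 
and the UDTF's argument list are assumptions, not the actual Big-Bench Q10 
script; only the class name comes from the log below):

add jar /location/to/bigbenchqueriesmr.jar;
create temporary function extract_sentiment
as 'de.bankmark.bigbench.queries.q10.SentimentUDF';
-- the class must be resolved again when the remote driver deserializes the
-- plan with Kryo, and the session-added jar is not on its classpath
select extract_sentiment(pr_item_sk, pr_review_content) from product_reviews;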

The detailed error message is below (NOTE: 
de.bankmark.bigbench.queries.q10.SentimentUDF is the UDF contained in the jar 
bigbenchqueriesmr.jar, and we added a command like 'add jar 
/location/to/bigbenchqueriesmr.jar;' to the .sql file explicitly)

INFO  [pool-1-thread-1]: client.RemoteDriver (RemoteDriver.java:call(316)) - 
Failed to run job 8dd120cb-1a4d-4d1c-ba31-61eac648c27d
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: 
de.bankmark.bigbench.queries.q10.SentimentUDF
Serialization trace:
genericUDTF (org.apache.hadoop.hive.ql.plan.UDTFDesc)
conf (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
right (org.apache.commons.lang3.tuple.ImmutablePair)
edgeProperties (org.apache.hadoop.hive.ql.plan.SparkWork)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
...
Caused by: java.lang.ClassNotFoundException: 
de.bankmark.bigbench.queries.q10.SentimentUDF
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
... 55 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9409) Spark branch, ClassNotFoundException: org.apache.commons.logging.impl.SLF4JLocationAwareLog occurs during some hive query case execution [Spark Branch]

2015-01-18 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-9409:
--
Description: 
When we use the current [Spark Branch] to build the hive package, deploy it on 
our cluster, and execute hive queries (e.g. BigBench cases Q10, Q18, Q19, Q27) 
in default mode (i.e. just Hive on MR, not Hive on Spark), the error 
'java.lang.ClassNotFoundException: 
org.apache.commons.logging.impl.SLF4JLocationAwareLog' occurs.

For other released Apache or CDH hive versions (e.g. Apache Hive 0.14), this 
issue does not exist.

By the way, if we run 'add jar /location/to/jcl-over-slf4j-1.7.5.jar' before 
the hive query execution, the issue can be worked around. 
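
A session-level sketch of the workaround (the jar path is a placeholder for 
wherever jcl-over-slf4j is installed; the count(*) query is illustrative):

add jar /location/to/jcl-over-slf4j-1.7.5.jar;
-- the added jar ships with the MR job, so the map tasks can resolve
-- org.apache.commons.logging.impl.SLF4JLocationAwareLog when Kryo
-- deserializes the plan
select count(*) from product_reviews;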

The detailed diagnostic messages are as below:
==
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: 
hdfs://bhx1:8020/tmp/hive/root/4a4cbeb2-cf42-4eb7-a78a-7ecea6af2aff/hive_2015-01-17_10-45-51_360_5581900288096206774-1/-mr-10004/1c6c4667-8b81-41ed-a42e-fe099ae3379f/map.xml:
 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: 
org.apache.commons.logging.impl.SLF4JLocationAwareLog
Serialization trace:
LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:431)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:287)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:657)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to 
find class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
Serialization trace:
LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)

[jira] [Created] (HIVE-9409) Spark branch, ClassNotFoundException: org.apache.commons.logging.impl.SLF4JLocationAwareLog occurs during some hive query case execution [Spark Branch]

2015-01-18 Thread Xin Hao (JIRA)
Xin Hao created HIVE-9409:
-

 Summary: Spark branch, ClassNotFoundException: 
org.apache.commons.logging.impl.SLF4JLocationAwareLog occurs during some hive 
query case execution [Spark Branch]
 Key: HIVE-9409
 URL: https://issues.apache.org/jira/browse/HIVE-9409
 Project: Hive
  Issue Type: Sub-task
  Components: spark-branch
Affects Versions: spark-branch
 Environment: CentOS6.5  
Java version: 1.7.0_67

Reporter: Xin Hao
 Fix For: spark-branch


When we use the current [Spark Branch] to build the hive package, deploy it on 
our cluster, and execute hive queries (e.g. BigBench cases Q10, Q18, Q19, 
Q27), the error 'java.lang.ClassNotFoundException: 
org.apache.commons.logging.impl.SLF4JLocationAwareLog' occurs.

For other released Apache or CDH hive versions (e.g. Apache Hive 0.14), this 
issue does not exist.

By the way, if we run 'add jar /location/to/jcl-over-slf4j-1.7.5.jar' before 
the hive query execution, the issue can be worked around. 

The detailed diagnostic messages are as below:
==
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: 
hdfs://bhx1:8020/tmp/hive/root/4a4cbeb2-cf42-4eb7-a78a-7ecea6af2aff/hive_2015-01-17_10-45-51_360_5581900288096206774-1/-mr-10004/1c6c4667-8b81-41ed-a42e-fe099ae3379f/map.xml:
 org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: 
org.apache.commons.logging.impl.SLF4JLocationAwareLog
Serialization trace:
LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:431)
at 
org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:287)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:484)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:477)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:657)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to 
find class: org.apache.commons.logging.impl.SLF4JLocationAwareLog
Serialization trace:
LOG (org.apache.hadoop.hive.ql.exec.UDTFOperator)
childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read