[jira] [Created] (HIVE-13275) Add a toString method to BytesRefArrayWritable

2016-03-13 Thread Harsh J (JIRA)
Harsh J created HIVE-13275:
--

 Summary: Add a toString method to BytesRefArrayWritable
 Key: HIVE-13275
 URL: https://issues.apache.org/jira/browse/HIVE-13275
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 1.1.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Attachments: HIVE-13275.000.patch

RCFileInputFormat cannot currently be used externally with Hadoop Streaming, because 
Streaming generally relies on the K/V pairs being able to emit text 
representations (via toString()).

Since BytesRefArrayWritable has no toString() method, using RCFileInputFormat 
prints the default object representation, which is not useful.

Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an 
array), so it's important to output them in a valid/parseable manner, rather 
than simply joining the string representations of the inner elements with a 
delimiter.

I propose adding a standardised CSV formatting of the array data, so that 
users of Streaming can then parse the results in their own scripts. Since we 
already have OpenCSV as a dependency, we can make use of it for this purpose.
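
For illustration, a minimal sketch of the idea (this is not the attached patch; the standalone helper class, the UTF-8 decoding, and the au.com.bytecode.opencsv package coordinates are assumptions — the actual change would live in BytesRefArrayWritable.toString() itself):

{code}
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import au.com.bytecode.opencsv.CSVWriter;

import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;

// Hypothetical helper showing the proposed CSV rendering of one row.
public final class BytesRefArrayCsv {
  public static String toCsvLine(BytesRefArrayWritable row) throws IOException {
    StringWriter out = new StringWriter();
    try (CSVWriter csv = new CSVWriter(out)) {
      String[] fields = new String[row.size()];
      for (int i = 0; i < row.size(); i++) {
        BytesRefWritable ref = row.get(i);
        // Decode only the referenced byte range; UTF-8 is an assumption here.
        fields[i] = new String(ref.getData(), ref.getStart(), ref.getLength(),
            StandardCharsets.UTF_8);
      }
      csv.writeNext(fields); // quotes and escapes each field per CSV rules
    }
    return out.toString();
  }
}
{code}

Streaming consumers could then parse each emitted line with any standard CSV reader, regardless of what the column values contain.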





[jira] [Created] (HIVE-13276) Hive on Spark doesn't work when spark.master=local

2016-03-13 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-13276:
--

 Summary: Hive on Spark doesn't work when spark.master=local
 Key: HIVE-13276
 URL: https://issues.apache.org/jira/browse/HIVE-13276
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: 2.1.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang


The following problem occurs with the latest Hive master and Spark 1.6.1. I'm using 
the Hive CLI on a Mac.

{code}
  set mapreduce.job.reduces=
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
    at org.apache.spark.SparkContext.hadoopRDD(SparkContext.scala:991)
    at org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:419)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateMapInput(SparkPlanGenerator.java:205)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:145)
    at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:117)
    at org.apache.hadoop.hive.ql.exec.spark.LocalHiveSparkClient.execute(LocalHiveSparkClient.java:130)
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:71)
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:94)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:156)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1837)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1578)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1351)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1110)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:778)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Could not initialize class org.apache.spark.rdd.RDDOperationScope$
{code}
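
For reference, a minimal session sketch that should hit this code path (the table name is a placeholder; any query that launches a Spark job would do):

{code}
set hive.execution.engine=spark;
set spark.master=local;
-- "src" is a placeholder table; any job-launching query should trigger the failure
select count(*) from src;
{code}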






[jira] [Created] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when vectorized execution is switched on

2016-03-13 Thread Xin Hao (JIRA)
Xin Hao created HIVE-13277:
--

 Summary: Exception "Unable to create serializer 
'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
occurred during query execution on spark engine when vectorized execution is 
switched on
 Key: HIVE-13277
 URL: https://issues.apache.org/jira/browse/HIVE-13277
 Project: Hive
  Issue Type: Bug
 Environment: Hive on Spark engine
Hive Version: Apache Hive 2.0.0
Spark Version: Apache Spark 1.6.0
Reporter: Xin Hao


Found during TPCx-BB query2 execution on the Spark engine when vectorized execution 
is switched on:
(1) set hive.vectorized.execution.enabled=true;
(2) set hive.vectorized.execution.reduce.enabled=true; (the default value for 
Apache Hive 2.0.0)

The query is OK on the Spark engine when hive.vectorized.execution.enabled is switched off:
(1) set hive.vectorized.execution.enabled=false;
(2) set hive.vectorized.execution.reduce.enabled=true;

On the MR engine, the query passes and no exception occurs whether vectorized 
execution is switched on or off.

The detailed error message is below:
{code}
2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 bytes
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): java.lang.RuntimeException: Failed to load plan: hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - Serialization trace:
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) - reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(593)) -    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.sca
{code}