Chao created HIVE-8216:
--------------------------
Summary: auto_smb_mapjoin_14.q failed test with exception. [Spark Branch]
Key: HIVE-8216
URL: https://issues.apache.org/jira/browse/HIVE-8216
Project: Hive
Issue Type: Bug
Components: Spark
Reporter: Chao
While trying to enable auto_smb_mapjoin_14.q, the following query:
{code}
select count(*) from (
  select a.key as key, a.value as val1, b.value as val2
  from tbl1 a join tbl2 b on a.key = b.key
) subq1;
{code}
failed with exception:
{noformat}
2014-09-22 11:42:56,157 ERROR [Executor task launch worker-2]: spark.SparkMapRecordHandler (SparkMapRecordHandler.java:processRow(150)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"key":0,"value":"val_0"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:140)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:28)
    at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:108)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator.processOp(SMBMapJoinOperator.java:258)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:137)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
    ... 15 more
{noformat}
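The query plan doesn't look correct either: the Sorted Merge Bucket Map Join Operator appears in both Map 1 and Map 3, but only Map 1 has an edge to Reducer 2: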
{noformat}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP)
      DagName: chao_20140922113636_e90b1567-df72-43f4-b9ea-15f986de35c2:11
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: a
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                    Sorted Merge Bucket Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0
                        1
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                      Select Operator
                        Group By Operator
                          aggregations: count()
                          mode: hash
                          outputColumnNames: _col0
                          Reduce Output Operator
                            sort order:
                            value expressions: _col0 (type: bigint)
        Map 3
            Map Operator Tree:
                TableScan
                  alias: b
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 5 Data size: 35 Basic stats: COMPLETE Column stats: NONE
                    Sorted Merge Bucket Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      condition expressions:
                        0
                        1
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                      Select Operator
                        Group By Operator
                          aggregations: count()
                          mode: hash
                          outputColumnNames: _col0
                          Reduce Output Operator
                            sort order:
                            value expressions: _col0 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                mode: mergepartial
                outputColumnNames: _col0
                Select Operator
                  expressions: _col0 (type: bigint)
                  outputColumnNames: _col0
                  File Output Operator
                    compressed: false
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{noformat}
I think this is related to SMB Join, so this JIRA should be resolved once SMB Join is supported on the Spark branch.
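For anyone trying to reproduce this, here is a minimal sketch of the kind of setup the query appears to rely on. It is an assumption inferred from the plan above (int join keys, 10 rows per table, SMB join selected automatically), not a copy of the .q file. The bucket count, the source table src, and the exact set of properties are illustrative and may differ from the actual test:
{code}
-- Assumed setup: both tables bucketed and sorted on the join key,
-- which is what makes them eligible for a sort-merge bucket join.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

CREATE TABLE tbl1(key int, value string)
  CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
CREATE TABLE tbl2(key int, value string)
  CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;

-- Assumed data source: any small table with (key, value) rows would do.
insert overwrite table tbl1 select * from src where key < 10;
insert overwrite table tbl2 select * from src where key < 10;

-- Assumed settings that let the optimizer convert the join to an SMB join.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join = true;
{code}
With a setup along these lines, running explain on the failing query should show whether the Sorted Merge Bucket Map Join Operator is still planned in more than one Map vertex.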