Re: Hive on Spark throw java.lang.NullPointerException

2015-12-18 Thread Xuefu Zhang
Could you create a JIRA with repro case?

Thanks,
Xuefu

On Thu, Dec 17, 2015 at 9:21 PM, Jone Zhang  wrote:

> *My query is *
> set hive.execution.engine=spark;
> select
>
> t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
> (case when t4.cnt is null then 0 else 1 end) as is_evil
> from
> (select /*+mapjoin(t2)*/
> pcid,channel,version,ip,hour,
> (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
> t2.name as app_name,
> app_apk,
>
> app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
> from
> t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on
> (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds =
> 20151217)
> where
> t1.ds = 20151217) t3
> left outer join
> (
> select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217
>  group by pcid
> ) t4
> on t3.pcid=t4.pcid;
>
>
> *And the error log is *
> 2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer
> (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to
> map join
> 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver
> (SessionState.java:printError(966)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
> at
> org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
> at
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
> at
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
> at
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
> at
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
> at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1060)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:208)
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:160)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:447)
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:795)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:767)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:704)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>
>
> *Some properties on hive-site.xml is *
> 
>hive.ignore.mapjoin.hint
>false
> 
> 
> hive.auto.convert.join
> true
> 
> 
>hive.auto.convert.join.noconditionaltask
>true
> 
>
>
> *The error relevant code is *
> long mjSize = ctx.getMjOpSizes().get(op);
> *I think it should be checked whether or not * ctx.getMjOpSizes().get(op) *is
> null.*
>
> *Of course, more strict logic need to you to decide.*
>
>
> *Thanks.*
> *Best Wishes.*
>


Hive on Spark throw java.lang.NullPointerException

2015-12-17 Thread Jone Zhang
*My query is *
set hive.execution.engine=spark;
select
t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
(case when t4.cnt is null then 0 else 1 end) as is_evil
from
(select /*+mapjoin(t2)*/
pcid,channel,version,ip,hour,
(case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
t2.name as app_name,
app_apk,
app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
from
t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on
(lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds =
20151217)
where
t1.ds = 20151217) t3
left outer join
(
select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217
 group by pcid
) t4
on t3.pcid=t4.pcid;


*And the error log is *
2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer
(SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to
map join
2015-12-18 08:10:18,686 ERROR [main]: ql.Driver
(SessionState.java:printError(966)) - FAILED: NullPointerException null
java.lang.NullPointerException
at
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
at
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
at
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
at
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
at
org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
at
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
at
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1171)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1060)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:208)
at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:160)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:447)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:795)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:767)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:704)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


*Some properties on hive-site.xml is *

   hive.ignore.mapjoin.hint
   false


hive.auto.convert.join
true


   hive.auto.convert.join.noconditionaltask
   true



*The error relevant code is *
long mjSize = ctx.getMjOpSizes().get(op);
*I think it should be checked whether or not * ctx.getMjOpSizes().get(op) *is
null.*

*Of course, more strict logic need to you to decide.*


*Thanks.*
*Best Wishes.*