[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066312#comment-15066312 ] JoneZhang commented on HIVE-12716: -- Thanks for you reply. We have a lot of querys to use mapjoin. We'are going to continue to use mapjoin hint before migrates from hint mapjoin to auto-convert mapjoin. > Hive on Spark map-join throw NullPointerException > - > > Key: HIVE-12716 > URL: https://issues.apache.org/jira/browse/HIVE-12716 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.1, 1.2.1 >Reporter: JoneZhang >Assignee: Xuefu Zhang > > The query is > set hive.execution.engine=spark; > select > t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num, > (case when t4.cnt is null then 0 else 1 end) as is_evil > from > (select /*+mapjoin(t2)*/ > pcid,channel,version,ip,hour, > (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id, > t2.name as app_name, > app_apk, > app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num > from > t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on > (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = > 20151217) > where > t1.ds = 20151217) t3 > left outer join > ( > select pcid,count(1) cnt from t_ed_soft_evillog_molo where ds=20151217 > group by pcid > ) t4 > on t3.pcid=t4.pcid; > Create table statements are as follows > CREATE TABLE `t_ed_soft_downloadlog_molo`( > `pcid` string, > `channel` string, > `version` string, > `ip` string, > `hour` string, > `app_id` bigint, > `app_name` string, > `app_apk` string, > `app_version` string, > `app_type` string, > `dwl_tool` string, > `dwl_status` string, > `err_type` string, > `dwl_store` string, > `dwl_maxspeed` string, > `dwl_minspeed` string, > `dwl_avgspeed` string, > `last_time` date, > `dwl_num` int) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_rd_soft_app_pkg_name`( > `app_id` bigint, > `cp_id` bigint, > `cat_id` bigint, > `package_id` string, > `name` string) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_ed_soft_evillog_molo`( > `imp_date` string, > `uin` string, > `pcid` string, > `appid` string, > `domain` string, > `action_type` string, > `via` string) > PARTITIONED BY ( > `ds` bigint); > And the error log is > 2015-12-18 08:10:18,685 INFO [main]: spark.SparkMapJoinOptimizer > (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to > map join > 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver > (SessionState.java:printError(966)) - FAILED: NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58) > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112) > at > org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128) > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123) > at org.
[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066211#comment-15066211 ] Nemon Lou commented on HIVE-12716: -- This issue only related to mapjoin hint. You can solve this by disable mapjoin hint (set hive.ignore.mapjoin.hint = true). Since hive.auto.convert.join is on by default ,I think it's unnecessary to use this hint. Mapjoin hint is a legacy feature and has been disabled by HIVE-4042 . And hive on Tez doesn't support it , too. See HIVE-5778 > Hive on Spark map-join throw NullPointerException > - > > Key: HIVE-12716 > URL: https://issues.apache.org/jira/browse/HIVE-12716 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.1, 1.2.1 >Reporter: JoneZhang >Assignee: Xuefu Zhang > > The query is > set hive.execution.engine=spark; > select > t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num, > (case when t4.cnt is null then 0 else 1 end) as is_evil > from > (select /*+mapjoin(t2)*/ > pcid,channel,version,ip,hour, > (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id, > t2.name as app_name, > app_apk, > app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num > from > t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on > (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = > 20151217) > where > t1.ds = 20151217) t3 > left outer join > ( > select pcid,count(1) cnt from t_ed_soft_evillog_molo where ds=20151217 > group by pcid > ) t4 > on t3.pcid=t4.pcid; > Create table statements are as follows > CREATE TABLE `t_ed_soft_downloadlog_molo`( > `pcid` string, > `channel` string, > `version` string, > `ip` string, > `hour` string, > `app_id` bigint, > `app_name` string, > `app_apk` string, > `app_version` string, > `app_type` string, > `dwl_tool` string, > `dwl_status` string, > `err_type` string, > `dwl_store` string, > `dwl_maxspeed` string, > `dwl_minspeed` string, > `dwl_avgspeed` string, > `last_time` date, > `dwl_num` int) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_rd_soft_app_pkg_name`( > `app_id` bigint, > `cp_id` bigint, > `cat_id` bigint, > `package_id` string, > `name` string) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_ed_soft_evillog_molo`( > `imp_date` string, > `uin` string, > `pcid` string, > `appid` string, > `domain` string, > `action_type` string, > `via` string) > PARTITIONED BY ( > `ds` bigint); > And the error log is > 2015-12-18 08:10:18,685 INFO [main]: spark.SparkMapJoinOptimizer > (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to > map join > 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver > (SessionState.java:printError(966)) - FAILED: NullPointerException null > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80) > at > org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58) > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112) > at > org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128) > at > org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java
[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066028#comment-15066028 ] JoneZhang commented on HIVE-12716: -- The following SQL will break out this error also. Query1: select case when b.platform is not null then b.platform else 'other' end version_type ,t.bookid ,count(*) as download_user from ( select /*+ MAPJOIN(a) */ version_type ,b.qq_no ,b.bookid from ( select * from u_wsd.t_rd_qqbook_newbookid ) a join ( select t1.version_type ,t1.qq_no ,t1.bookid from ( select software_version version_type ,qq_num as qq_no ,book_id as bookid from u_wsd.t_od_qqreader_beat2_download where ds=20151220 and download_code='downloadBook' and software_version <> 'UNKNOWN' and qq_num>=1 group by software_version,qq_num,book_id )t1 left outer join ( select qq_no --统计日黑名单用户 from u_wsd.t_od_qqbook_blacklist where ds=20151220 )t2 on t1.qq_no =t2.qq_no where t2.qq_no is null )b on b.bookid=a.bid )t left outer join (select distinct platform,version,subversion from u_wsd.t_rd_qqreader_beat2_version ) b on t.version_type=b.subversion group by t.bookid,b.platform; Query2: select /*+ MAPJOIN(a) */ b.qq_no ,b.bookid ,b.income from ( select * from u_wsd.t_rd_qqbook_newbookid ) a join ( select t1.bookid ,t1.qq_no ,t1.income from ( select bookid ,qq_no ,sum(payprice) income from ( select book_id bookid ,qq_no ,sum(pay_price) payprice --元宝支付金额 from u_wsd.t_od_qqbook_yb_account where ds=20151220 and src_type='WAP' group by book_id,qq_no union all select bookid ,qq_num as qq_no ,sum(price) payprice --Q币支付金额 from u_wsd.t_od_qqbook_bookpaylog where ds=20151220 and regexp_replace(substr(paytime,0,10),'-','')='20151220' and paytype='qb' group by bookid,qq_num )t group by bookid,qq_no )t1 left outer join ( select qq_no --统计日黑名单用户 from u_wsd.t_od_qqbook_blacklist where ds=20151220 )t2 on t1.qq_no =t2.qq_no where t2.qq_no is null )b on a.bid=b.bookid; I hope you can find out some of the reason. > Hive on Spark map-join throw NullPointerException > - > > Key: HIVE-12716 > URL: https://issues.apache.org/jira/browse/HIVE-12716 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.1, 1.2.1 >Reporter: JoneZhang >Assignee: Xuefu Zhang > > The query is > set hive.execution.engine=spark; > select > t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num, > (case when t4.cnt is null then 0 else 1 end) as is_evil > from > (select /*+mapjoin(t2)*/ > pcid,channel,version,ip,hour, > (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id, > t2.name as app_name, > app_apk, > app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num > from > t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on > (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = > 20151217) > where > t1.ds = 20151217) t3 > left outer join > ( > select pcid,count(1) cnt from t_ed_soft_evillog_molo where ds=20151217 > group by pcid > ) t4 > on t3.pcid=t4.pcid; > Create table statements are as follows > CREATE TABLE `t_ed_soft_downloadlog_molo`( > `pcid` string, > `channel` string, > `version` string, > `ip` string, > `hour` string, > `app_id` bigint, > `app_name` string, > `app_apk` string, > `app_version` string, > `app_type` string, > `dwl_tool` string, > `dwl_status` string, > `err_type` string, > `dwl_store` string, > `dwl_maxspeed` string, > `dwl_minspeed` string, > `dwl_avgspeed` string, > `last_time` date, > `dwl_num` int) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_rd_soft_app_pkg_name`( > `app_id` bigint, > `cp_id` bigint, > `cat_id` bigint, > `package_id` string, > `name` string) > PARTITIONED BY ( > `ds` bigint); > CREATE TABLE `t_ed_soft_evillog_molo`( > `imp_date` string, > `uin` string, > `pcid` string, > `appid` string, > `domain` string, > `action_type` string, > `via` string) > PARTITIONED BY ( > `ds` bigint); > And the error log is > 2015-12-18 08:10:18,685 INFO [main]: spark.SparkMapJoinOptimizer > (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to > map join > 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver > (SessionState.java:printError(966)) - FAILED: NullPointerException null > java.lang.NullPointerException >