[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException

2015-12-21 Thread JoneZhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066312#comment-15066312
 ] 

JoneZhang commented on HIVE-12716:
--

Thanks for you reply.
We have a lot of querys to use mapjoin.
We'are going to continue to use mapjoin hint before migrates from hint mapjoin 
to auto-convert mapjoin.

> Hive on Spark map-join throw NullPointerException
> -
>
> Key: HIVE-12716
> URL: https://issues.apache.org/jira/browse/HIVE-12716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> The query is 
> set hive.execution.engine=spark;
> select
> t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
> (case when t4.cnt is null then 0 else 1 end) as is_evil
> from
> (select /*+mapjoin(t2)*/
> pcid,channel,version,ip,hour,
> (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
> t2.name as app_name,
> app_apk,
> app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
> from
> t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on 
> (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 
> 20151217)
> where
> t1.ds = 20151217) t3
> left outer join
> (
> select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217  
> group by pcid
> ) t4
> on t3.pcid=t4.pcid;
> Create table statements are as follows 
> CREATE TABLE `t_ed_soft_downloadlog_molo`(
>   `pcid` string, 
>   `channel` string, 
>   `version` string, 
>   `ip` string, 
>   `hour` string, 
>   `app_id` bigint, 
>   `app_name` string, 
>   `app_apk` string, 
>   `app_version` string, 
>   `app_type` string, 
>   `dwl_tool` string, 
>   `dwl_status` string, 
>   `err_type` string, 
>   `dwl_store` string, 
>   `dwl_maxspeed` string, 
>   `dwl_minspeed` string, 
>   `dwl_avgspeed` string, 
>   `last_time` date, 
>   `dwl_num` int)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_rd_soft_app_pkg_name`(
>   `app_id` bigint, 
>   `cp_id` bigint, 
>   `cat_id` bigint, 
>   `package_id` string, 
>   `name` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_ed_soft_evillog_molo`(
>   `imp_date` string, 
>   `uin` string, 
>   `pcid` string, 
>   `appid` string, 
>   `domain` string, 
>   `action_type` string, 
>   `via` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> And the error log is 
> 2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer 
> (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to 
> map join
> 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver 
> (SessionState.java:printError(966)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
> at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123)
> at org.

[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException

2015-12-21 Thread Nemon Lou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066211#comment-15066211
 ] 

Nemon Lou commented on HIVE-12716:
--

This issue only related to mapjoin hint. You can solve this by disable mapjoin 
hint (set hive.ignore.mapjoin.hint = true).
Since hive.auto.convert.join is on by default ,I think it's unnecessary to use 
this hint.
Mapjoin hint is a legacy feature and has been disabled by HIVE-4042 .
And hive on Tez doesn't support it , too. See HIVE-5778

> Hive on Spark map-join throw NullPointerException
> -
>
> Key: HIVE-12716
> URL: https://issues.apache.org/jira/browse/HIVE-12716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> The query is 
> set hive.execution.engine=spark;
> select
> t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
> (case when t4.cnt is null then 0 else 1 end) as is_evil
> from
> (select /*+mapjoin(t2)*/
> pcid,channel,version,ip,hour,
> (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
> t2.name as app_name,
> app_apk,
> app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
> from
> t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on 
> (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 
> 20151217)
> where
> t1.ds = 20151217) t3
> left outer join
> (
> select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217  
> group by pcid
> ) t4
> on t3.pcid=t4.pcid;
> Create table statements are as follows 
> CREATE TABLE `t_ed_soft_downloadlog_molo`(
>   `pcid` string, 
>   `channel` string, 
>   `version` string, 
>   `ip` string, 
>   `hour` string, 
>   `app_id` bigint, 
>   `app_name` string, 
>   `app_apk` string, 
>   `app_version` string, 
>   `app_type` string, 
>   `dwl_tool` string, 
>   `dwl_status` string, 
>   `err_type` string, 
>   `dwl_store` string, 
>   `dwl_maxspeed` string, 
>   `dwl_minspeed` string, 
>   `dwl_avgspeed` string, 
>   `last_time` date, 
>   `dwl_num` int)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_rd_soft_app_pkg_name`(
>   `app_id` bigint, 
>   `cp_id` bigint, 
>   `cat_id` bigint, 
>   `package_id` string, 
>   `name` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_ed_soft_evillog_molo`(
>   `imp_date` string, 
>   `uin` string, 
>   `pcid` string, 
>   `appid` string, 
>   `domain` string, 
>   `action_type` string, 
>   `via` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> And the error log is 
> 2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer 
> (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to 
> map join
> 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver 
> (SessionState.java:printError(966)) - FAILED: NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
> at 
> org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
> at 
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
> at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java

[jira] [Commented] (HIVE-12716) Hive on Spark map-join throw NullPointerException

2015-12-20 Thread JoneZhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066028#comment-15066028
 ] 

JoneZhang commented on HIVE-12716:
--

The following SQL will break out this error also.
Query1:
select 
 case when  b.platform  is not null then   b.platform  else 'other' end 
version_type
 ,t.bookid 
 ,count(*) as download_user
 from 
(  
select
   /*+ MAPJOIN(a) */
version_type
,b.qq_no
,b.bookid
 from
 (
select * from u_wsd.t_rd_qqbook_newbookid  
 ) a join   
 (
 select 
  t1.version_type
  ,t1.qq_no
  ,t1.bookid
 from 
   (
   select 
software_version version_type
,qq_num as qq_no
,book_id as bookid
   from u_wsd.t_od_qqreader_beat2_download
   where ds=20151220 and download_code='downloadBook'
   and software_version <> 'UNKNOWN' and qq_num>=1
   group by software_version,qq_num,book_id 
   )t1  left outer join
   (
  select 
  qq_no  --统计日黑名单用户
  from u_wsd.t_od_qqbook_blacklist
  where ds=20151220
  )t2 on t1.qq_no =t2.qq_no
  where t2.qq_no is null  
 )b  on b.bookid=a.bid
)t left outer join
(select 
 distinct platform,version,subversion from u_wsd.t_rd_qqreader_beat2_version
) b
on t.version_type=b.subversion
group by t.bookid,b.platform;


Query2:
select
   /*+ MAPJOIN(a) */
b.qq_no
,b.bookid
,b.income
from
(
select * from u_wsd.t_rd_qqbook_newbookid  
) a  join 
(
 select 
  t1.bookid
  ,t1.qq_no
  ,t1.income
 from 
 ( select
bookid
,qq_no
,sum(payprice) income 
  from 
   (
   select 
book_id bookid
,qq_no
,sum(pay_price) payprice   --元宝支付金额
   from u_wsd.t_od_qqbook_yb_account 
   where ds=20151220
   and src_type='WAP'
   group by book_id,qq_no
   union all
   select 
bookid 
,qq_num as qq_no
,sum(price) payprice  --Q币支付金额
   from u_wsd.t_od_qqbook_bookpaylog 
   where ds=20151220
   and regexp_replace(substr(paytime,0,10),'-','')='20151220' 
   and paytype='qb'
   group by bookid,qq_num
   )t
  group by bookid,qq_no
 )t1  left outer join
 (
  select 
   qq_no  --统计日黑名单用户
  from u_wsd.t_od_qqbook_blacklist
  where ds=20151220
 )t2 on t1.qq_no =t2.qq_no
 where t2.qq_no is null 
)b on a.bid=b.bookid;


I hope you can find out some of the reason.



> Hive on Spark map-join throw NullPointerException
> -
>
> Key: HIVE-12716
> URL: https://issues.apache.org/jira/browse/HIVE-12716
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> The query is 
> set hive.execution.engine=spark;
> select
> t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
> (case when t4.cnt is null then 0 else 1 end) as is_evil
> from
> (select /*+mapjoin(t2)*/
> pcid,channel,version,ip,hour,
> (case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
> t2.name as app_name,
> app_apk,
> app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
> from
> t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on 
> (lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 
> 20151217)
> where
> t1.ds = 20151217) t3
> left outer join
> (
> select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217  
> group by pcid
> ) t4
> on t3.pcid=t4.pcid;
> Create table statements are as follows 
> CREATE TABLE `t_ed_soft_downloadlog_molo`(
>   `pcid` string, 
>   `channel` string, 
>   `version` string, 
>   `ip` string, 
>   `hour` string, 
>   `app_id` bigint, 
>   `app_name` string, 
>   `app_apk` string, 
>   `app_version` string, 
>   `app_type` string, 
>   `dwl_tool` string, 
>   `dwl_status` string, 
>   `err_type` string, 
>   `dwl_store` string, 
>   `dwl_maxspeed` string, 
>   `dwl_minspeed` string, 
>   `dwl_avgspeed` string, 
>   `last_time` date, 
>   `dwl_num` int)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_rd_soft_app_pkg_name`(
>   `app_id` bigint, 
>   `cp_id` bigint, 
>   `cat_id` bigint, 
>   `package_id` string, 
>   `name` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> CREATE TABLE `t_ed_soft_evillog_molo`(
>   `imp_date` string, 
>   `uin` string, 
>   `pcid` string, 
>   `appid` string, 
>   `domain` string, 
>   `action_type` string, 
>   `via` string)
> PARTITIONED BY ( 
>   `ds` bigint);
> And the error log is 
> 2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer 
> (SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to 
> map join
> 2015-12-18 08:10:18,686 ERROR [main]: ql.Driver 
> (SessionState.java:printError(966)) - FAILED: NullPointerException null
> java.lang.NullPointerException
>