[ https://issues.apache.org/jira/browse/SPARK-30349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-30349. ---------------------------------- Resolution: Cannot Reproduce Resolving this due to no feedback from the author. > The result is wrong when joining tables with selecting the same columns > ----------------------------------------------------------------------- > > Key: SPARK-30349 > URL: https://issues.apache.org/jira/browse/SPARK-30349 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.4.4 > Reporter: cen yuhai > Priority: Blocker > Labels: correctness > Attachments: screenshot-1.png, screenshot-2.png > > > {code:sql} > // code placeholder > with tmp as( > select > log_date, > buvid, > manga_id, > sum(readtime) readtime > from > manga.dwd_app_readtime_xt_dt > where > log_date >= 20191220 > group by > log_date, > buvid, > manga_id > ) > select > t.log_date, > GET_JSON_OBJECT(t.extended_fields, '$.type'), > count(distinct t.buvid), > count(distinct t0.buvid), > count(distinct t1.buvid), > count(distinct t2.buvid), > count( > distinct case > when t1.buvid = t0.buvid then t1.buvid > end > ), > count( > distinct case > when t1.buvid = t0.buvid > and t1.buvid = t2.buvid then t1.buvid > end > ), > count( > distinct case > when t0.buvid = t2.buvid then t0.buvid > end > ), > sum(readtime), > avg(readtime), > sum( > case > when t0.buvid = t3.buvid then readtime > end > ), > avg( > case > when t0.buvid = t3.buvid then readtime > end > ) > from > manga.manga_tfc_app_ubt_d t > join manga.manga_tfc_app_ubt_d t1 on t.buvid = t1.buvid > and t1.log_date >= 20191220 > and t1.event_id = 'manga.manga-detail.0.0.pv' > and to_date(t.stime) = TO_DATE(t1.stime) > and GET_JSON_OBJECT(t1.extended_fields, '$.manga_id') = > GET_JSON_OBJECT(t.extended_fields, '$.manga_id') > left join manga.manga_buvid_minlog t0 on t.buvid = t0.buvid > and t0.log_date = 20191223 > and t0.minlog >= '2019-12-20' > and to_date(t.stime) = TO_DATE(t0.minlog) > left join manga.dwb_tfc_app_launch_df t2 on t.buvid = t2.buvid > and t2.log_date >= 20191220 > and DATE_ADD(to_date(t.stime), 1) = to_date(t2.stime) > left join tmp t3 on t1.buvid = t3.buvid > and t3.log_date >= 20191220 > and t3.manga_id = GET_JSON_OBJECT(t.extended_fields, '$.manga_id') > where > t.log_date >= 20191220 > and t.event_id = 'manga.homepage-recommend.detail.0.click' > group by > t.log_date, > GET_JSON_OBJECT(t.extended_fields, '$.type') > {code} > !screenshot-1.png! > The result of hive 2.3 is ok > !screenshot-2.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org