[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199136#comment-15199136 ]
Dilip Biswal commented on SPARK-13865:
--------------------------------------

[~smilegator] Quick update on this. It also appears related to the null-safe equality issue; I just left a comment on [SPARK-13859|https://issues.apache.org/jira/browse/SPARK-13859]. Here is the output of the query, now returning the expected count, after making a similar modification (replacing {{=}} with the null-safe {{<=>}} in the join and outer-join conditions):

{code}
spark-sql> select count(*)
         > from
         > (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1
         > from store_sales
         > JOIN date_dim ON store_sales.ss_sold_date_sk <=> date_dim.d_date_sk
         > JOIN customer ON store_sales.ss_customer_sk <=> customer.c_customer_sk
         > where
         > d_month_seq between 1200 and 1200+11
         > ) tmp1
         > left outer join
         > (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2
         > from catalog_sales
         > JOIN date_dim ON catalog_sales.cs_sold_date_sk <=> date_dim.d_date_sk
         > JOIN customer ON catalog_sales.cs_bill_customer_sk <=> customer.c_customer_sk
         > where
         > d_month_seq between 1200 and 1200+11
         > ) tmp2
         > on (tmp1.cln1 <=> tmp2.cln2)
         > and (tmp1.cfn1 <=> tmp2.cfn2)
         > and (tmp1.ddate1 <=> tmp2.ddate2)
         > left outer join
         > (select distinct c_last_name as cln3, c_first_name as cfn3, d_date as ddate3, 1 as notnull3
         > from web_sales
         > JOIN date_dim ON web_sales.ws_sold_date_sk <=> date_dim.d_date_sk
         > JOIN customer ON web_sales.ws_bill_customer_sk <=> customer.c_customer_sk
         > where
         > d_month_seq between 1200 and 1200+11
         > ) tmp3
         > on (tmp1.cln1 <=> tmp3.cln3)
         > and (tmp1.cfn1 <=> tmp3.cfn3)
         > and (tmp1.ddate1 <=> tmp3.ddate3)
         > where
         > notnull2 is null and notnull3 is null;
47298
Time taken: 13.561 seconds, Fetched 1 row(s)
{code}
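The difference is easy to see in isolation: with {{=}}, any comparison against NULL yields NULL, so a row whose join key is NULL can never match, its {{notnull2}}/{{notnull3}} flag stays NULL, and it always survives the anti-join filter; {{<=>}} treats two NULLs as equal. A minimal sketch in the same shell (illustrative literals, not TPC-DS data; exact output formatting may vary by version):

{code}
spark-sql> select null = null, null <=> null, 'a' = null, 'a' <=> null;
NULL	true	NULL	false
{code}

So under {{=}} a tmp1 row with, say, a NULL {{c_last_name}} is always counted, which is consistent with the inflated 47555; the null-safe form lets such rows match and brings the count down to the expected 47298.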
> TPCDS query 87 returns wrong results compared to TPC official result set
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13865
>                 URL: https://issues.apache.org/jira/browse/SPARK-13865
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: JESSE CHEN
>              Labels: tpcds-result-mismatch
>
> Testing Spark SQL using TPC queries. Query 87 returns wrong results compared
> to official result set. This is at 1GB SF (validation run).
> SparkSQL returns count of 47555, answer set expects 47298.
> Actual results:
> {noformat}
> [47555]
> {noformat}
> {noformat}
> Expected:
> +-------+
> |   1   |
> +-------+
> | 47298 |
> +-------+
> {noformat}
> Query used:
> {noformat}
> -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION
> select count(*)
> from
>  (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1
>  from store_sales
>  JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
>  JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
>  where
>  d_month_seq between 1200 and 1200+11
>  ) tmp1
> left outer join
>  (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2
>  from catalog_sales
>  JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>  JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>  where
>  d_month_seq between 1200 and 1200+11
>  ) tmp2
> on (tmp1.cln1 = tmp2.cln2)
>  and (tmp1.cfn1 = tmp2.cfn2)
>  and (tmp1.ddate1 = tmp2.ddate2)
> left outer join
>  (select distinct c_last_name as cln3, c_first_name as cfn3, d_date as ddate3, 1 as notnull3
>  from web_sales
>  JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
>  JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
>  where
>  d_month_seq between 1200 and 1200+11
>  ) tmp3
> on (tmp1.cln1 = tmp3.cln3)
>  and (tmp1.cfn1 = tmp3.cfn3)
>  and (tmp1.ddate1 = tmp3.ddate3)
> where
> notnull2 is null and notnull3 is null
> ;
> -- end query 87 in stream 0 using template query87.tpl
> {noformat}
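For comparison, the same "in store_sales but in neither catalog_sales nor web_sales" logic can be expressed with {{EXCEPT}}, which compares rows using set (distinct) semantics and therefore already treats NULL keys as equal, with no need for the notnull flags or null-safe outer-join keys. This is only a sketch of an equivalent formulation under that assumption, not the template text itself:

{code}
-- Sketch: the two anti-joins rewritten as set differences; EXCEPT
-- compares rows with distinct semantics, so NULL keys match NULL keys.
select count(*)
from
 ((select distinct c_last_name, c_first_name, d_date
   from store_sales
   JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
   JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11)
  except
  (select distinct c_last_name, c_first_name, d_date
   from catalog_sales
   JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
   JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11)
  except
  (select distinct c_last_name, c_first_name, d_date
   from web_sales
   JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
   JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11)
 ) tmp;
{code}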