[ 
https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199136#comment-15199136
 ] 

Dilip Biswal commented on SPARK-13865:
--------------------------------------

[~smilegator] Quick update on this ..

This also seems to be related to the null-safe equality issue. I just added a comment on 
[SPARK-13859|https://issues.apache.org/jira/browse/SPARK-13859].

Here is the output of the query, which returns the expected count after a 
similar modification (replacing = with <=> in the join conditions):

{code}
spark-sql> select count(*)
         > from 
         >      (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1
         >        from store_sales
         >         JOIN date_dim ON store_sales.ss_sold_date_sk <=> date_dim.d_date_sk
         >         JOIN customer ON store_sales.ss_customer_sk <=> customer.c_customer_sk
         >        where
         >          d_month_seq between 1200 and 1200+11
         >        ) tmp1
         >        left outer join
         >       (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2
         >        from catalog_sales
         >         JOIN date_dim ON catalog_sales.cs_sold_date_sk <=> date_dim.d_date_sk
         >         JOIN customer ON catalog_sales.cs_bill_customer_sk <=> customer.c_customer_sk
         >        where 
         >          d_month_seq between 1200 and 1200+11
         >        ) tmp2 
         >       on (tmp1.cln1 <=> tmp2.cln2)
         >       and (tmp1.cfn1 <=> tmp2.cfn2)
         >       and (tmp1.ddate1 <=> tmp2.ddate2)
         >        left outer join
         >       (select distinct c_last_name as cln3, c_first_name as cfn3, d_date as ddate3, 1 as notnull3
         >        from web_sales
         >         JOIN date_dim ON web_sales.ws_sold_date_sk <=> date_dim.d_date_sk
         >         JOIN customer ON web_sales.ws_bill_customer_sk <=> customer.c_customer_sk
         >        where 
         >          d_month_seq between 1200 and 1200+11
         >        ) tmp3 
         >       on (tmp1.cln1 <=> tmp3.cln3)
         >       and (tmp1.cfn1 <=> tmp3.cfn3)
         >       and (tmp1.ddate1 <=> tmp3.ddate3)
         > where  
         > notnull2 is null and notnull3 is null;
47298
Time taken: 13.561 seconds, Fetched 1 row(s)
{code}
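
To illustrate why the operator matters, here is a minimal sketch; it is not the TPC-DS data. It assumes Python's built-in sqlite3 module, where the IS operator stands in for Spark's <=> (both treat two NULLs as equal, while = yields NULL and therefore no match). The tables t1 and t2, their rows, and the helper count_unmatched are hypothetical, chosen only to mirror the shape of the left outer join plus "notnull2 is null" filter above.

```python
import sqlite3


def count_unmatched(op: str) -> int:
    """Count rows of t1 that have no partner in t2 when joining with `op`."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical data: one customer has a NULL last name in both tables.
    conn.executescript("""
        CREATE TABLE t1 (last_name TEXT, first_name TEXT);
        CREATE TABLE t2 (last_name TEXT, first_name TEXT, notnull2 INTEGER);
        INSERT INTO t1 VALUES ('Smith', 'Ann'), (NULL, 'Bob'), ('Lee', 'Cy');
        INSERT INTO t2 VALUES ('Smith', 'Ann', 1), (NULL, 'Bob', 1);
    """)
    # Same pattern as the query above: left outer join, then keep only
    # the left rows for which the right side came back empty.
    query = f"""
        SELECT count(*)
        FROM t1 LEFT OUTER JOIN t2
          ON (t1.last_name {op} t2.last_name)
         AND (t1.first_name {op} t2.first_name)
        WHERE notnull2 IS NULL
    """
    (n,) = conn.execute(query).fetchone()
    conn.close()
    return n


plain = count_unmatched("=")       # NULL = NULL is not true, so (NULL, 'Bob') never matches
null_safe = count_unmatched("IS")  # SQLite's IS is null-safe, like Spark's <=>
print(plain, null_safe)            # prints "2 1"
```

With plain equality the row with the NULL last name is counted as unmatched even though an identical row exists on the right side, so the count is inflated. If the benchmark data contains NULL names or dates, the same effect would be consistent with the gap between 47555 and 47298.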

> TPCDS query 87 returns wrong results compared to TPC official result set 
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13865
>                 URL: https://issues.apache.org/jira/browse/SPARK-13865
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: JESSE CHEN
>              Labels: tpcds-result-mismatch
>
> Testing Spark SQL using TPC queries. Query 87 returns wrong results compared 
> to the official result set. This is at 1GB SF (validation run).
> Spark SQL returns a count of 47555; the answer set expects 47298.
> Actual results:
> {noformat}
> [47555]
> {noformat}
> Expected:
> {noformat}
> +-------+
> |     1 |
> +-------+
> | 47298 |
> +-------+
> {noformat}
> Query used:
> {noformat}
> -- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION
> select count(*) 
> from 
>      (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1
>        from store_sales
>         JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
>         JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
>        where
>          d_month_seq between 1200 and 1200+11
>        ) tmp1
>        left outer join
>       (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2
>        from catalog_sales
>         JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>         JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>        where 
>          d_month_seq between 1200 and 1200+11
>        ) tmp2 
>       on (tmp1.cln1 = tmp2.cln2)
>       and (tmp1.cfn1 = tmp2.cfn2)
>       and (tmp1.ddate1 = tmp2.ddate2)
>        left outer join
>       (select distinct c_last_name as cln3, c_first_name as cfn3, d_date as ddate3, 1 as notnull3
>        from web_sales
>         JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
>         JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
>        where 
>          d_month_seq between 1200 and 1200+11
>        ) tmp3 
>       on (tmp1.cln1 = tmp3.cln3)
>       and (tmp1.cfn1 = tmp3.cfn3)
>       and (tmp1.ddate1 = tmp3.ddate3)
> where  
> notnull2 is null and notnull3 is null  
> ;
> -- end query 87 in stream 0 using template query87.tpl
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)