[ https://issues.apache.org/jira/browse/SPARK-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-13865:
--------------------------------
    Description: 
Testing Spark SQL using TPC-DS queries: query 87 returns wrong results compared to the official result set. This is at the 1 GB scale factor (validation run).

Spark SQL returns a count of 47555; the answer set expects 47298.

Actual results:
{noformat}
[47555]
{noformat}

Expected results:
{noformat}
+-------+
|     1 |
+-------+
| 47298 |
+-------+
{noformat}

Query used:
{noformat}
-- start query 87 in stream 0 using template query87.tpl and seed QUALIFICATION
select count(*)
from
  (select distinct c_last_name as cln1, c_first_name as cfn1, d_date as ddate1, 1 as notnull1
   from store_sales
     JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
     JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11
  ) tmp1
  left outer join
  (select distinct c_last_name as cln2, c_first_name as cfn2, d_date as ddate2, 1 as notnull2
   from catalog_sales
     JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
     JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11
  ) tmp2
  on  (tmp1.cln1 = tmp2.cln2)
  and (tmp1.cfn1 = tmp2.cfn2)
  and (tmp1.ddate1 = tmp2.ddate2)
  left outer join
  (select distinct c_last_name as cln3, c_first_name as cfn3, d_date as ddate3, 1 as notnull3
   from web_sales
     JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
     JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
   where d_month_seq between 1200 and 1200+11
  ) tmp3
  on  (tmp1.cln1 = tmp3.cln3)
  and (tmp1.cfn1 = tmp3.cfn3)
  and (tmp1.ddate1 = tmp3.ddate3)
where notnull2 is null and notnull3 is null
;
-- end query 87 in stream 0 using template query87.tpl
{noformat}
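For illustration only, here is a minimal sketch of the pattern the query above uses. It assumes hypothetical toy tables `ss`, `cs` and `ws`, standing in for store, catalog and web sales already joined to `customer` and `date_dim`. The two left outer joins followed by `notnull2 is null and notnull3 is null` act as an anti-join: they count distinct (last name, first name, date) tuples from store sales that have no match in either other channel. The sketch runs that shape in Python's built-in `sqlite3` module and, for comparison, an `EXCEPT` formulation. The two forms are not equivalent when a key column is NULL: an `=` join condition never matches NULL, while `EXCEPT` treats NULLs as equal. This is a general point about SQL semantics, not a diagnosis of this issue.

```python
import sqlite3

# Hypothetical toy schema: each table holds (lname, fname, sdate) tuples,
# standing in for the sales/customer/date_dim joins of the real query.
LEFT_JOIN_SQL = """
select count(*)
from (select distinct lname as ln1, fname as fn1, sdate as d1 from ss) t1
left outer join
     (select distinct lname as ln2, fname as fn2, sdate as d2, 1 as notnull2 from cs) t2
  on t1.ln1 = t2.ln2 and t1.fn1 = t2.fn2 and t1.d1 = t2.d2
left outer join
     (select distinct lname as ln3, fname as fn3, sdate as d3, 1 as notnull3 from ws) t3
  on t1.ln1 = t3.ln3 and t1.fn1 = t3.fn3 and t1.d1 = t3.d3
where notnull2 is null and notnull3 is null
"""

# Set-difference formulation: EXCEPT returns distinct rows and treats NULLs as equal.
EXCEPT_SQL = """
select count(*)
from (select lname, fname, sdate from ss
      except select lname, fname, sdate from cs
      except select lname, fname, sdate from ws) as diff
"""


def anti_join_counts(ss_rows, cs_rows, ws_rows):
    """Return (left_join_count, except_count) for the given toy rows."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, rows in (("ss", ss_rows), ("cs", cs_rows), ("ws", ws_rows)):
            conn.execute(f"create table {table} (lname text, fname text, sdate text)")
            conn.executemany(f"insert into {table} values (?, ?, ?)", rows)
        left_count = conn.execute(LEFT_JOIN_SQL).fetchone()[0]
        except_count = conn.execute(EXCEPT_SQL).fetchone()[0]
        return left_count, except_count
    finally:
        conn.close()


if __name__ == "__main__":
    ss = [("Smith", "Ann", "2000-01-03"), ("Lee", "Bo", "2000-01-04"), (None, "Cy", "2000-01-05")]
    cs = [("Smith", "Ann", "2000-01-03"), (None, "Cy", "2000-01-05")]
    ws = [("Lee", "Bo", "2000-01-04")]
    # The NULL-named row is removed by EXCEPT but never matched by the "=" join conditions.
    print(anti_join_counts(ss, cs, ws))  # (1, 0)
```

On these toy rows the anti-join keeps the row with a NULL last name while `EXCEPT` removes it. When the key columns contain no NULLs, both counts agree.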



> TPCDS query 87 returns wrong results compared to TPC official result set 
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13865
>                 URL: https://issues.apache.org/jira/browse/SPARK-13865
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: JESSE CHEN
>              Labels: tpcds-result-mismatch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
