[ 
https://issues.apache.org/jira/browse/SPARK-13859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-13859:
--------------------------------
    Description: 
Testing Spark SQL with the TPC-DS queries: query 38 returns wrong results
compared to the official result set. This is at the 1 GB scale factor
(validation run).

Spark SQL returns a count of 0; the official answer set reports 107.

Actual results:
{noformat}
[0]
{noformat}

Expected:
{noformat}
+-----+
|   1 |
+-----+
| 107 |
+-----+
{noformat}

query used:
{noformat}
-- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION
 select  count(*) from (
    select distinct c_last_name, c_first_name, d_date
    from store_sales
         JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp1
  JOIN
    (select distinct c_last_name, c_first_name, d_date
    from catalog_sales
         JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp2
    ON (tmp1.c_last_name = tmp2.c_last_name)
       and (tmp1.c_first_name = tmp2.c_first_name)
       and (tmp1.d_date = tmp2.d_date)
  JOIN
    (
    select distinct c_last_name, c_first_name, d_date
    from web_sales
         JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp3
    ON (tmp1.c_last_name = tmp3.c_last_name)
       and (tmp1.c_first_name = tmp3.c_first_name)
       and (tmp1.d_date = tmp3.d_date)
  limit 100
 ;
-- end query 38 in stream 0 using template query38.tpl

{noformat}

  was:
Testing Spark SQL using TPC queries. Query 38 returns wrong results compared to 
official result set. This is at 1GB SF (validation run).

SparkSQL returns count of 0, answer set reports 107.

Actual results:
[0]

Expected:
+-----+
|   1 |
+-----+
| 107 |
+-----+

query used:
-- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION
 select  count(*) from (
    select distinct c_last_name, c_first_name, d_date
    from store_sales
         JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp1
  JOIN
    (select distinct c_last_name, c_first_name, d_date
    from catalog_sales
         JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp2
    ON (tmp1.c_last_name = tmp2.c_last_name)
       and (tmp1.c_first_name = tmp2.c_first_name)
       and (tmp1.d_date = tmp2.d_date)
  JOIN
    (
    select distinct c_last_name, c_first_name, d_date
    from web_sales
         JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
         JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
    where d_month_seq between 1200 and 1200 + 11) tmp3
    ON (tmp1.c_last_name = tmp3.c_last_name)
       and (tmp1.c_first_name = tmp3.c_first_name)
       and (tmp1.d_date = tmp3.d_date)
  limit 100
 ;
-- end query 38 in stream 0 using template query38.tpl



> TPCDS query 38 returns wrong results compared to TPC official result set 
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13859
>                 URL: https://issues.apache.org/jira/browse/SPARK-13859
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: JESSE CHEN
>              Labels: tpcds-result-mismatch
>
> Testing Spark SQL with the TPC-DS queries: query 38 returns wrong results
> compared to the official result set. This is at the 1 GB scale factor
> (validation run).
> Spark SQL returns a count of 0; the official answer set reports 107.
> Actual results:
> {noformat}
> [0]
> {noformat}
> Expected:
> {noformat}
> +-----+
> |   1 |
> +-----+
> | 107 |
> +-----+
> {noformat}
> query used:
> {noformat}
> -- start query 38 in stream 0 using template query38.tpl and seed QUALIFICATION
>  select  count(*) from (
>     select distinct c_last_name, c_first_name, d_date
>     from store_sales
>          JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
>          JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp1
>   JOIN
>     (select distinct c_last_name, c_first_name, d_date
>     from catalog_sales
>          JOIN date_dim ON catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
>          JOIN customer ON catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp2
>     ON (tmp1.c_last_name = tmp2.c_last_name)
>        and (tmp1.c_first_name = tmp2.c_first_name)
>        and (tmp1.d_date = tmp2.d_date)
>   JOIN
>     (
>     select distinct c_last_name, c_first_name, d_date
>     from web_sales
>          JOIN date_dim ON web_sales.ws_sold_date_sk = date_dim.d_date_sk
>          JOIN customer ON web_sales.ws_bill_customer_sk = customer.c_customer_sk
>     where d_month_seq between 1200 and 1200 + 11) tmp3
>     ON (tmp1.c_last_name = tmp3.c_last_name)
>        and (tmp1.c_first_name = tmp3.c_first_name)
>        and (tmp1.d_date = tmp3.d_date)
>   limit 100
>  ;
> -- end query 38 in stream 0 using template query38.tpl
> {noformat}
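
A possible cause worth ruling out (an assumption, not something confirmed in this report): the query38.tpl template in the TPC-DS kit expresses this query with INTERSECT, whereas the query above uses inner joins on c_last_name, c_first_name and d_date. INTERSECT treats NULLs as equal when comparing rows, while `=` in a join condition never matches NULL to NULL, so the two forms can return different counts if any of those columns contain NULLs. The sketch below illustrates the difference on made-up data using Python's sqlite3 module rather than Spark; the table names store_buyers and web_buyers and all rows are hypothetical. SQLite's IS operator plays the role of null-safe equality here; in Spark SQL the corresponding operator is <=>.

```python
import sqlite3

# In-memory database with two hypothetical tables standing in for the
# per-channel "distinct c_last_name, c_first_name, d_date" subqueries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_buyers (c_last_name TEXT, c_first_name TEXT, d_date TEXT);
    CREATE TABLE web_buyers   (c_last_name TEXT, c_first_name TEXT, d_date TEXT);
    -- One fully populated row and one row with a NULL last name, in both tables.
    INSERT INTO store_buyers VALUES ('Smith', 'Ann', '2000-01-01'),
                                    (NULL,    'Bob', '2000-01-02');
    INSERT INTO web_buyers   VALUES ('Smith', 'Ann', '2000-01-01'),
                                    (NULL,    'Bob', '2000-01-02');
""")


def count(sql):
    """Run a single-value count query and return the integer result."""
    return conn.execute(sql).fetchone()[0]


# INTERSECT form: rows are compared with NULLs treated as equal.
intersect_count = count("""
    SELECT count(*) FROM (
        SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_buyers
        INTERSECT
        SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_buyers)
""")

# JOIN form with '=': the row with a NULL last name never matches.
join_count = count("""
    SELECT count(*)
    FROM (SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_buyers) tmp1
    JOIN (SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_buyers) tmp2
      ON tmp1.c_last_name = tmp2.c_last_name
     AND tmp1.c_first_name = tmp2.c_first_name
     AND tmp1.d_date = tmp2.d_date
""")

# JOIN form with null-safe equality (IS in SQLite; <=> in Spark SQL).
null_safe_count = count("""
    SELECT count(*)
    FROM (SELECT DISTINCT c_last_name, c_first_name, d_date FROM store_buyers) tmp1
    JOIN (SELECT DISTINCT c_last_name, c_first_name, d_date FROM web_buyers) tmp2
      ON tmp1.c_last_name IS tmp2.c_last_name
     AND tmp1.c_first_name IS tmp2.c_first_name
     AND tmp1.d_date IS tmp2.d_date
""")

print(intersect_count, join_count, null_safe_count)
```

On this toy data the INTERSECT and null-safe join forms agree with each other, and the plain `=` join returns a smaller count. Whether the 1 GB data set actually has NULLs in these columns is not stated in this report, and a NULL mismatch alone would not necessarily explain a count of 0, so this is only one check to try when comparing against the answer set.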



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
