[ 
https://issues.apache.org/jira/browse/TAJO-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaehwa Jung reassigned TAJO-750:
--------------------------------

    Assignee: Jaehwa Jung

> Join orders affects abnormal to the result data.
> ------------------------------------------------
>
>                 Key: TAJO-750
>                 URL: https://issues.apache.org/jira/browse/TAJO-750
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>
> I found that join orders  affects abnormal to the result data  as follows:
> *Environment*
> * DataSet: TPC-DS
> * tajo.dist-query.join.broadcast.auto: false
> *Case: 1*
> {code:xml}
> SELECT COUNT(*)
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                              cs.cs_ext_discount_amt as cs_ext_discount_amt
>              FROM catalog_sales cs
>              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
>                           1.3 * avg(cs_ext_discount_amt) as 
> avg_cs_ext_discount_amt
>            FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                                         cs.cs_ext_discount_amt as 
> cs_ext_discount_amt
>                         FROM catalog_sales cs
>                         JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>                         WHERE d.d_date between '2000-01-27' and '2000-04-27') 
> cs2
>                         GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk)
> {code}
> - actual result: 71147
> - expected result: 4163848
> *Case 2*
> {code:xml}
> SELECT COUNT(*)
> FROM item i
> JOIN (SELECT cs.cs_item_sk as cs_item_sk,
>                              cs.cs_ext_discount_amt as cs_ext_discount_amt
>              FROM catalog_sales cs
>              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1 ON 
> (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
>                           1.3 * avg(cs_ext_discount_amt) as 
> avg_cs_ext_discount_amt
>            FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                                         cs.cs_ext_discount_amt as 
> cs_ext_discount_amt
>                         FROM catalog_sales cs
>                         JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>                         WHERE d.d_date between '2000-01-27' and '2000-04-27') 
> cs2
>                         GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk);
> {code}
> - actual result: 23890
> - expected result: 4163848
> As you knew, two queries doesn't show the expected result. Furthermore, each 
> result have differences. For reference, I made activated result using hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to