[
https://issues.apache.org/jira/browse/TAJO-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaehwa Jung updated TAJO-748:
-----------------------------
Attachment: TAJO-748.patch
I found the cause for this bug. Only if _tajo.dist-query.join.broadcast.auto_
is true, GlobalPlanner set broad cast tables. But although the property is
false, we need to set broad cast tables. Because if we don't set it, some
SubQueries will be omitted.
> Shuffle partition numbers disagreement with join clause.
> --------------------------------------------------------
>
> Key: TAJO-748
> URL: https://issues.apache.org/jira/browse/TAJO-748
> Project: Tajo
> Issue Type: Bug
> Reporter: Jaehwa Jung
> Attachments: TAJO-748.patch
>
>
> I found that inline view doesn't run expected at multiple join as follows:
> *Environment*
> * DataSet: TPC-DS
> * tajo.dist-query.join.broadcast.auto : false
> *Case: 1*
> {code:xml}
> SELECT COUNT(*)
> FROM (
> SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27'
> ) cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk);
> {code}
> - actual result: 4163848
> - expected result: 4163848
> *Case: 2*
> {code:xml}
> select count(*)
> from item i
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
> 1.3 * avg(cs_ext_discount_amt) as
> avg_cs_ext_discount_amt
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as
> cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27')
> cs2
> GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk);
> {code}
> - actual result: 102000
> - expected result: 102000
> *Case: 3*
> {code:xml}
> SELECT COUNT(*)
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
> 1.3 * avg(cs_ext_discount_amt) as
> avg_cs_ext_discount_amt
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as
> cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27')
> cs2
> GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk)
> WHERE i.i_manufact_id = 436;
> {code}
> - actual result: 80
> - expected result: 4586
> *Case: 4*
> {code:xml}
> SELECT COUNT(*)
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
> 1.3 * avg(cs_ext_discount_amt) as
> avg_cs_ext_discount_amt
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
> cs.cs_ext_discount_amt as
> cs_ext_discount_amt
> FROM catalog_sales cs
> JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
> WHERE d.d_date between '2000-01-27' and '2000-04-27')
> cs2
> GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk)
> {code}
> - actual result: 71147
> - expected result: 4163848
> For reference, I made activated result using hive.
--
This message was sent by Atlassian JIRA
(v6.2#6252)