[ 
https://issues.apache.org/jira/browse/TAJO-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973726#comment-13973726
 ] 

Jaehwa Jung commented on TAJO-748:
----------------------------------

I found that a shuffle partition number doesn't correspond with its pair 
partition number in special case as follows:
* Test query: above Case #3

* Left block
{code:xml}
=======================================================
Block Id: eb_1397788326208_0003_000006 [INTERMEDIATE]
=======================================================

[Incoming]
[q_1397788326208_0003] 1 => 6 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk 
(INT4), num=1)
[q_1397788326208_0003] 5 => 6 (type=HASH_SHUFFLE, key=tpcds100.tmp1.cs_item_sk 
(INT4), num=1)

[Outgoing]
[q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk 
(INT4), num=232)
{code}

* Right block
{code:xml}
=======================================================
Block Id: eb_1397788326208_0003_000009 [INTERMEDIATE]
=======================================================

[Incoming]
[q_1397788326208_0003] 7 => 9 (type=HASH_SHUFFLE, 
key=tpcds100.cs.cs_sold_date_sk (INT4), num=232)
[q_1397788326208_0003] 8 => 9 (type=HASH_SHUFFLE, key=tpcds100.d.d_date_sk 
(INT4), num=232)

[Outgoing]
[q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk 
(INT4), num=20)
{code}

* Join Block
{code:xml}
=======================================================
Block Id: eb_1397788326208_0003_000010 [INTERMEDIATE]
=======================================================

[Incoming]
[q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk 
(INT4), num=232)
[q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk 
(INT4), num=20)

[Outgoing]
[q_1397788326208_0003] 10 => 11 (type=HASH_SHUFFLE, key=, num=1)
{code}

As above, left block and right block calculated different partition numbers.
Thus, we can't find expected result.

> Inline view doesn't run as expected at multiple join.
> -----------------------------------------------------
>
>                 Key: TAJO-748
>                 URL: https://issues.apache.org/jira/browse/TAJO-748
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Jaehwa Jung
>
> I found that inline view doesn't run expected at multiple join as follows:
> *Environment*
> * DataSet: TPC-DS 
> * tajo.dist-query.join.broadcast.auto : false
> *Case: 1*
> {code:xml}
> SELECT COUNT(*)
> FROM (
>   SELECT cs.cs_item_sk as cs_item_sk,
>   cs.cs_ext_discount_amt as cs_ext_discount_amt
>   FROM catalog_sales cs
>   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>   WHERE d.d_date between '2000-01-27' and '2000-04-27'
> ) cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk);
> {code}
> - actual result: 4163848
> - expected result: 4163848
> *Case: 2*
> {code:xml}
> select count(*)
> from item i
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
>                           1.3 * avg(cs_ext_discount_amt) as 
> avg_cs_ext_discount_amt
>            FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                                         cs.cs_ext_discount_amt as 
> cs_ext_discount_amt
>                         FROM catalog_sales cs
>                         JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>                         WHERE d.d_date between '2000-01-27' and '2000-04-27') 
> cs2
>                         GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk);
> {code}
> - actual result: 102000
> - expected result: 102000
> *Case: 3*
> {code:xml}
> SELECT COUNT(*)
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                              cs.cs_ext_discount_amt as cs_ext_discount_amt
>              FROM catalog_sales cs
>              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
>                           1.3 * avg(cs_ext_discount_amt) as 
> avg_cs_ext_discount_amt
>            FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                                         cs.cs_ext_discount_amt as 
> cs_ext_discount_amt
>                         FROM catalog_sales cs
>                         JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>                         WHERE d.d_date between '2000-01-27' and '2000-04-27') 
> cs2
>                         GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk)
> WHERE i.i_manufact_id = 436;
> {code}
> - actual result: 80
> - expected result: 4586
> *Case: 4*
> {code:xml}
> SELECT COUNT(*)
> FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                              cs.cs_ext_discount_amt as cs_ext_discount_amt
>              FROM catalog_sales cs
>              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
> JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
> JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
>                           1.3 * avg(cs_ext_discount_amt) as 
> avg_cs_ext_discount_amt
>            FROM (SELECT cs.cs_item_sk as cs_item_sk,
>                                         cs.cs_ext_discount_amt as 
> cs_ext_discount_amt
>                         FROM catalog_sales cs
>                         JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
>                         WHERE d.d_date between '2000-01-27' and '2000-04-27') 
> cs2
>                         GROUP BY cs2.cs_item_sk) tmp1
> ON (i.i_item_sk = tmp1.cs_item_sk)
> {code}
> - actual result: 71147
> - expected result: 4163848
> For reference, I made activated result using hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to