[ 
https://issues.apache.org/jira/browse/HIVE-26653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647014#comment-17647014
 ] 

Hankó Gergely commented on HIVE-26653:
--------------------------------------

I found an issue during investigation. It's probably closely related but I 
wasn't sure so I created a new ticket for it (HIVE-26846) and described the 
details there.

> Wrong results when (map) joining multiple tables on partition column
> --------------------------------------------------------------------
>
>                 Key: HIVE-26653
>                 URL: https://issues.apache.org/jira/browse/HIVE-26653
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>         Attachments: hive_26653.q, hive_26653_explain.txt, 
> hive_26653_explain_cbo.txt, table_a.csv, table_b.csv
>
>
> The result of the query must have exactly one row matching the date specified 
> in the WHERE clause but the query returns nothing.
> {code:sql}
> CREATE TABLE table_a (`aid` string ) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tbla.csv' into TABLE table_a;
> CREATE TABLE table_b (`bid` string) PARTITIONED BY (`p_dt` string)
> row format delimited fields terminated by ',' stored as textfile;
> LOAD DATA LOCAL INPATH '../../data/files/_tblb.csv' into TABLE table_b;
> set hive.auto.convert.join=true;
> set hive.optimize.semijoin.conversion=false;
> SELECT a.p_dt
> FROM ((SELECT p_dt
>        FROM table_b
>        GROUP BY p_dt) a
>          JOIN
>      (SELECT p_dt
>       FROM table_a
>       GROUP BY p_dt) b ON a.p_dt = b.p_dt
>          JOIN
>      (SELECT p_dt
>       FROM table_a
>       GROUP BY p_dt) c ON a.p_dt = c.p_dt)
> WHERE a.p_dt =  translate(cast(to_date(date_sub('2022-08-01', 1)) AS string), 
> '-', '');
> {code}
> +Expected result+
> 20220731
> +Actual result+
> Empty
> To reproduce the problem the tables need to have some data. Values in aid and 
> bid columns are not important. For p_dt column use one of the following 
> values 20220731, 20220630.
> I will attach some sample data with which the problem can be reproduced. The 
> tables look like below.
> ||aid|pdt||
> |611|20220731|
> |239|20220630|
> |...|...|
> The problem can be reproduced via qtest in current master 
> (commit 
> [6b05d64ce8c7161415d97a7896ea50025322e30a|https://github.com/apache/hive/commit/6b05d64ce8c7161415d97a7896ea50025322e30a])
>  by running the TestMiniLlapLocalCliDriver.
> There is specific query plan (will attach shortly) for which the problem 
> shows up so if the plan changes slightly the problem may not appear anymore; 
> this is why we need to set explicitly hive.optimize.semijoin.conversion and 
> hive.auto.convert.join to trigger the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to