[ https://issues.apache.org/jira/browse/IMPALA-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872477#comment-17872477 ]
Fang-Yu Rao edited comment on IMPALA-13262 at 8/9/24 10:15 PM:
---------------------------------------------------------------

I started git bisecting from [IMPALA-9132: Explain statements should not cause nullptr in LogLineageRecord()|https://github.com/apache/impala/commit/f49f8d8a32] (which is not affected by the bug), and the bisect identified IMPALA-9979: part 2 as the culprit. In addition, setting '{*}ANALYTIC_RANK_PUSHDOWN_THRESHOLD{*}' to *0* does not work around this issue.

{code:java}
fangyurao@fangyu:~/Impala_for_FE$ git bisect bad
b42c64993d46893488a667fb9c425548fdf964ab is the first bad commit
commit b42c64993d46893488a667fb9c425548fdf964ab
Author: Tim Armstrong <tarmstr...@cloudera.com>
Date:   Tue Feb 2 14:02:12 2021 -0800

    IMPALA-9979: part 2: partitioned top-n
{code}

> Predicate pushdown causes incorrect results in join condition
> -------------------------------------------------------------
>
>                 Key: IMPALA-13262
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13262
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>              Labels: correctness
>
> We found that in some scenarios Apache Impala
> (https://github.com/apache/impala/commit/c539874) could incorrectly push
> predicates down to scan nodes, which in turn produces wrong results. The
> following is a concrete example to reproduce the issue.
> {code:sql}
> create database impala_13262;
> use impala_13262;
> create table department (dept_no integer, dept_rank integer,
>   start_date timestamp, end_date timestamp);
> insert into department values (1,1,'2024-01-01','2024-01-02');
> insert into department values (1,2,'2024-01-02','2024-01-03');
> insert into department values (1,3,'2024-01-03','2024-01-03');
> create table employee (employee_no integer, depart_no integer);
> insert into employee values (1,1);
>
> -- The following query should return 0 rows. However, Apache Impala
> -- produces one row.
> select * from employee t1
> inner join (
>   select * from
>   (
>     select dept_no,dept_rank,start_date,end_date
>     ,row_number() over(partition by dept_no order by dept_rank) rn
>     from department
>   ) t2
>   where rn=1
> ) t2
> on t1.depart_no=t2.dept_no
> where t2.start_date=t2.end_date;
>
> set explain_level=2;
> -- In the output of the EXPLAIN statement, we found that the predicate
> -- "start_date = end_date" was pushed down to the scan node, which is wrong.
> | 01:SCAN HDFS [impala_13262.department, RANDOM]
> | HDFS partitions=1/1 files=3 size=132B
> | predicates: start_date = end_date
> | stored statistics:
> | table: rows=unavailable size=unavailable
> | columns: unavailable
> | extrapolated-rows=disabled max-scan-range-rows=unavailable
> | mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
> | tuple-ids=1 row-size=40B cardinality=1
> | in pipelines: 01(GETNEXT)
> +-------------------------------------------------------------------------------------------------------+
> {code}
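
To cross-check the expected result outside Impala, the reproduction can be run against another SQL engine. The following is only a minimal sketch using Python's built-in sqlite3 module as a stand-in (it assumes an SQLite build with window-function support, 3.25 or later). It is not Impala and says nothing about Impala's planner. The TIMESTAMP columns are declared as text here, and the second query (PUSHED_DOWN) is a hand-written approximation of what the plan above effectively computes when the predicate is evaluated at the scan, not the planner's actual rewrite. The names SETUP, CORRECT, PUSHED_DOWN and run_queries are made up for this illustration.

```python
import sqlite3

# Same data as in the report; timestamps are stored as text (assumption for SQLite).
SETUP = """
create table department (dept_no integer, dept_rank integer,
                         start_date text, end_date text);
insert into department values (1, 1, '2024-01-01', '2024-01-02');
insert into department values (1, 2, '2024-01-02', '2024-01-03');
insert into department values (1, 3, '2024-01-03', '2024-01-03');
create table employee (employee_no integer, depart_no integer);
insert into employee values (1, 1);
"""

# Query from the report: the filter is applied after row_number() is computed.
CORRECT = """
select * from employee t1
inner join (
  select * from (
    select dept_no, dept_rank, start_date, end_date,
           row_number() over (partition by dept_no order by dept_rank) rn
    from department
  ) t2
  where rn = 1
) t2 on t1.depart_no = t2.dept_no
where t2.start_date = t2.end_date
"""

# Hand-written approximation of the bad plan: the filter runs at the scan,
# before row_number() is computed, so a different row gets rn = 1.
PUSHED_DOWN = """
select * from employee t1
inner join (
  select * from (
    select dept_no, dept_rank, start_date, end_date,
           row_number() over (partition by dept_no order by dept_rank) rn
    from department
    where start_date = end_date
  ) t2
  where rn = 1
) t2 on t1.depart_no = t2.dept_no
"""


def run_queries():
    """Run both variants on an in-memory database and return their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    correct = conn.execute(CORRECT).fetchall()
    pushed = conn.execute(PUSHED_DOWN).fetchall()
    conn.close()
    return correct, pushed


if __name__ == "__main__":
    correct, pushed = run_queries()
    print("filter after row_number(): ", correct)
    print("filter before row_number():", pushed)
```

With the filter applied after row_number(), the rn=1 row is the one with dept_rank=1, whose start_date and end_date differ, so no row qualifies. Filtering first leaves only the dept_rank=3 row, which then receives rn=1 and joins with the employee row, matching the single row reported above.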