[ https://issues.apache.org/jira/browse/IMPALA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368909#comment-17368909 ]
ASF subversion and git services commented on IMPALA-10755:
----------------------------------------------------------

Commit 89131e70ff649545c7192f760ac7f8d2abcb6e37 in impala's branch refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=89131e7 ]

IMPALA-10755: Fix migration of analytic predicate to inline view.

As part of IMPALA-9979 we made changes to push down predicates
that reference analytic tuple into the inline view. In cases where
both sides of a predicate are slot references (for example,
a = MAX(b) where MAX(b) is an analytic function), it may not be
safe to push it into the inline view since the two sides may be
referencing separate tuples. This patch fixes the behavior by
skipping such predicates such that they will be left unassigned
and will subsequently get assigned to a SELECT node above the
analytic operator.

Testing:
 - Added planner tests for analytic predicates ensuring that
   analytic predicates are present in the SELECT node.
 - Added run time tests for the same using TPC-H and verified
   correctness.
Change-Id: Ib5cad3d408ee3695cafb35f66a4f19b4e8d0529e
Reviewed-on: http://gerrit.cloudera.org:8080/17615
Reviewed-by: Aman Sinha <amsi...@cloudera.com>
Tested-by: Aman Sinha <amsi...@cloudera.com>

> Wrong results for a query with predicate on an analytic function
> ----------------------------------------------------------------
>
>                 Key: IMPALA-10755
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10755
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: correctness, wrongresults
>             Fix For: Impala 4.0
>
>
> In the master branch, the following query with equality predicate on analytic
> function produces 13 rows whereas the correct result is 3 rows (as verified
> with Hive):
> {noformat}
> use tpch;
> select *
> FROM (
>   SELECT
>     l_quantity, l_linenumber,
>     nr_pvp,
>     max(nr_pvp) over (partition BY l_quantity) max_nr_pvp
>   FROM (
>     SELECT
>       l_quantity, l_linenumber,
>       count(1) AS nr_pvp
>     FROM
>       lineitem, partsupp
>     where l_partkey = ps_partkey
>       and ps_suppkey = 10 and l_quantity in (1, 2)
>     GROUP BY l_quantity, l_linenumber
>   ) a
> ) b
> WHERE nr_pvp = max_nr_pvp
> order by 1, 2, 3, 4;
> {noformat}
> Wrong results:
> {noformat}
> +------------+--------------+--------+------------+
> | l_quantity | l_linenumber | nr_pvp | max_nr_pvp |
> +------------+--------------+--------+------------+
> | 1.00       | 1            | 11     | 12         |
> | 1.00       | 2            | 12     | 12         |
> | 1.00       | 3            | 10     | 12         |
> | 1.00       | 4            | 5      | 12         |
> | 1.00       | 5            | 4      | 12         |
> | 1.00       | 6            | 5      | 12         |
> | 1.00       | 7            | 2      | 12         |
> | 2.00       | 1            | 12     | 12         |
> | 2.00       | 2            | 12     | 12         |
> | 2.00       | 3            | 11     | 12         |
> | 2.00       | 4            | 8      | 12         |
> | 2.00       | 5            | 3      | 12         |
> | 2.00       | 6            | 2      | 12         |
> +------------+--------------+--------+------------+
> Fetched 13 row(s) in 1.36s
> {noformat}
> Expected results (verified with Hive):
> {noformat}
> +---------------+-----------------+-----------+---------------+
> | b.l_quantity  | b.l_linenumber  | b.nr_pvp  | b.max_nr_pvp  |
> +---------------+-----------------+-----------+---------------+
> | 1.00          | 2               | 12        | 12            |
> | 2.00          | 1               | 12        | 12            |
> | 2.00          | 2               | 12        | 12            |
> +---------------+-----------------+-----------+---------------+
> {noformat}
> Explain plan for Impala shows that the equality predicate 'nr_pvp =
> max_nr_pvp' which is supposed to be applied on top of the analytic operator
> is missing:
> {noformat}
> +---------------------------------------------------------------------------+
> | Explain String                                                            |
> +---------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=42.88MB Threads=7               |
> | Per-Host Resource Estimates: Memory=483MB                                 |
> |                                                                           |
> | PLAN-ROOT SINK                                                            |
> | |                                                                         |
> | 11:MERGING-EXCHANGE [UNPARTITIONED]                                       |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |                                                                         |
> | 06:SORT                                                                   |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 05:ANALYTIC                                                               |
> | |  functions: max(count(*))                                               |
> | |  partition by: l_quantity                                               |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 04:SORT                                                                   |
> | |  order by: l_quantity ASC NULLS LAST                                    |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 10:EXCHANGE [HASH(l_quantity)]                                            |
> | |                                                                         |
> | 09:AGGREGATE [FINALIZE]                                                   |
> | |  output: count:merge(*)                                                 |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 08:EXCHANGE [HASH(l_quantity,l_linenumber)]                               |
> | |                                                                         |
> | 03:AGGREGATE [STREAMING]                                                  |
> | |  output: count(*)                                                       |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                                      |
> | |  hash predicates: l_partkey = ps_partkey                                |
> | |  runtime filters: RF000 <- ps_partkey                                   |
> | |  row-size=36B cardinality=2.45K                                         |
> | |                                                                         |
> | |--07:EXCHANGE [BROADCAST]                                                |
> | |  |                                                                      |
> | |  01:SCAN HDFS [tpch.partsupp]                                           |
> | |     HDFS partitions=1/1 files=1 size=112.71MB                           |
> | |     predicates: ps_suppkey = 10                                         |
> | |     row-size=16B cardinality=82                                         |
> | |                                                                         |
> | 00:SCAN HDFS [tpch.lineitem]                                              |
> |    HDFS partitions=1/1 files=1 size=718.94MB                              |
> |    predicates: l_quantity IN (1, 2)                                       |
> |    runtime filters: RF000 -> l_partkey                                    |
> |    row-size=20B cardinality=235.34K                                       |
> +---------------------------------------------------------------------------+
> {noformat}
> This is likely a regression since based on some internal testing this was
> working correctly on an earlier version.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
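
The expected row count can be cross-checked outside Impala. The following is a minimal Python sketch, not Impala code: it emulates `max(nr_pvp) over (partition by l_quantity)` followed by `WHERE nr_pvp = max_nr_pvp`. The function name `filter_on_partition_max` is hypothetical, and the input is the 13 (l_quantity, l_linenumber, nr_pvp) triples copied from the wrong-results output in the report, on the assumption that those columns were computed correctly and only the final filter was dropped from the plan.

```python
# Rows copied from the 13-row "Wrong results" output in the report:
# (l_quantity, l_linenumber, nr_pvp)
ROWS = [
    ("1.00", 1, 11), ("1.00", 2, 12), ("1.00", 3, 10), ("1.00", 4, 5),
    ("1.00", 5, 4), ("1.00", 6, 5), ("1.00", 7, 2),
    ("2.00", 1, 12), ("2.00", 2, 12), ("2.00", 3, 11), ("2.00", 4, 8),
    ("2.00", 5, 3), ("2.00", 6, 2),
]


def filter_on_partition_max(rows):
    """Emulate the analytic max per l_quantity, then keep rows equal to it."""
    # Step 1: analytic function, max(nr_pvp) partitioned by l_quantity.
    max_by_quantity = {}
    for quantity, _, nr_pvp in rows:
        current = max_by_quantity.get(quantity, nr_pvp)
        max_by_quantity[quantity] = max(current, nr_pvp)

    # Step 2: the outer predicate nr_pvp = max_nr_pvp, applied AFTER step 1.
    return [
        (quantity, linenumber, nr_pvp, max_by_quantity[quantity])
        for quantity, linenumber, nr_pvp in rows
        if nr_pvp == max_by_quantity[quantity]
    ]


result = filter_on_partition_max(ROWS)
for row in result:
    print(row)
```

Under that assumption the sketch keeps three rows, which agrees with the Hive result quoted in the report. The predicate compares two values that are only both available once the analytic function has been evaluated, which is why it has to sit above the analytic operator.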