[ 
https://issues.apache.org/jira/browse/IMPALA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368909#comment-17368909
 ] 

ASF subversion and git services commented on IMPALA-10755:
----------------------------------------------------------

Commit 89131e70ff649545c7192f760ac7f8d2abcb6e37 in impala's branch 
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=89131e7 ]

IMPALA-10755: Fix migration of analytic predicate to inline view.

As part of IMPALA-9979 we made changes to push down predicates
that reference analytic tuple into the inline view. In cases where
both sides of a predicate are slot references (for example,
a = MAX(b) where MAX(b) is an analytic function), it may not be
safe to push it into the inline view since the two sides may be
referencing separate tuples.

This patch fixes the behavior by skipping such predicates such
that they will be left unassigned and will subsequently get
assigned to a SELECT node above the analytic operator.

Testing:
 - Added planner tests for analytic predicates ensuring that
   analytic predicates are present in the SELECT node.
 - Added run time tests for the same using TPC-H and
   verified correctness.

Change-Id: Ib5cad3d408ee3695cafb35f66a4f19b4e8d0529e
Reviewed-on: http://gerrit.cloudera.org:8080/17615
Reviewed-by: Aman Sinha <amsi...@cloudera.com>
Tested-by: Aman Sinha <amsi...@cloudera.com>


> Wrong results for a query with predicate on an analytic function
> ----------------------------------------------------------------
>
>                 Key: IMPALA-10755
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10755
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: correctness, wrongresults
>             Fix For: Impala 4.0
>
>
> In the master branch, the following query with equality predicate on analytic 
> function produces 13 rows whereas the correct result is 3 rows (as verified 
> with Hive):
> {noformat}
> use tpch;
> select *
>   FROM (
>     SELECT
>       l_quantity, l_linenumber,
>       nr_pvp,
>       max(nr_pvp) over (partition BY l_quantity) max_nr_pvp
>     FROM (
>       SELECT
>         l_quantity, l_linenumber,
>         count(1) AS nr_pvp
>       FROM
>         lineitem, partsupp
>         where l_partkey       = ps_partkey
>         and ps_suppkey = 10 and  l_quantity in (1, 2)
>         GROUP BY l_quantity, l_linenumber
>     ) a
>   ) b
>   WHERE nr_pvp = max_nr_pvp
>   order by 1, 2, 3, 4;
> {noformat}
> Wrong results:
> {noformat}
> +------------+--------------+--------+------------+                           
>                            
> | l_quantity | l_linenumber | nr_pvp | max_nr_pvp |
> +------------+--------------+--------+------------+
> | 1.00       | 1            | 11     | 12         |
> | 1.00       | 2            | 12     | 12         |
> | 1.00       | 3            | 10     | 12         |
> | 1.00       | 4            | 5      | 12         |
> | 1.00       | 5            | 4      | 12         |
> | 1.00       | 6            | 5      | 12         |
> | 1.00       | 7            | 2      | 12         |
> | 2.00       | 1            | 12     | 12         |
> | 2.00       | 2            | 12     | 12         |
> | 2.00       | 3            | 11     | 12         |
> | 2.00       | 4            | 8      | 12         |
> | 2.00       | 5            | 3      | 12         |
> | 2.00       | 6            | 2      | 12         |
> +------------+--------------+--------+------------+
> Fetched 13 row(s) in 1.36s
> {noformat}
> Expected results (verified with Hive):
> {noformat}
> +---------------+-----------------+-----------+---------------+
> | b.l_quantity  | b.l_linenumber  | b.nr_pvp  | b.max_nr_pvp  |
> +---------------+-----------------+-----------+---------------+
> | 1.00          | 2               | 12        | 12            |
> | 2.00          | 1               | 12        | 12            |
> | 2.00          | 2               | 12        | 12            |
> +---------------+-----------------+-----------+---------------+
> {noformat}
> Explain plan for Impala shows that the equality predicate 'nr_pvp = 
> max_nr_pvp' which is supposed to be applied on top of the analytic operator 
> is missing:
> {noformat}
> +---------------------------------------------------------------------------+
> | Explain String                                                            |
> +---------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=42.88MB Threads=7               |
> | Per-Host Resource Estimates: Memory=483MB                                 |
> |                                                                           |
> | PLAN-ROOT SINK                                                            |
> | |                                                                         |
> | 11:MERGING-EXCHANGE [UNPARTITIONED]                                       |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |                                                                         |
> | 06:SORT                                                                   |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 05:ANALYTIC                                                               |
> | |  functions: max(count(*))                                               |
> | |  partition by: l_quantity                                               |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 04:SORT                                                                   |
> | |  order by: l_quantity ASC NULLS LAST                                    |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 10:EXCHANGE [HASH(l_quantity)]                                            |
> | |                                                                         |
> | 09:AGGREGATE [FINALIZE]                                                   |
> | |  output: count:merge(*)                                                 |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 08:EXCHANGE [HASH(l_quantity,l_linenumber)]                               |
> | |                                                                         |
> | 03:AGGREGATE [STREAMING]                                                  |
> | |  output: count(*)                                                       |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                                      |
> | |  hash predicates: l_partkey = ps_partkey                                |
> | |  runtime filters: RF000 <- ps_partkey                                   |
> | |  row-size=36B cardinality=2.45K                                         |
> | |                                                                         |
> | |--07:EXCHANGE [BROADCAST]                                                |
> | |  |                                                                      |
> | |  01:SCAN HDFS [tpch.partsupp]                                           |
> | |     HDFS partitions=1/1 files=1 size=112.71MB                           |
> | |     predicates: ps_suppkey = 10                                         |
> | |     row-size=16B cardinality=82                                         |
> | |                                                                         |
> | 00:SCAN HDFS [tpch.lineitem]                                              |
> |    HDFS partitions=1/1 files=1 size=718.94MB                              |
> |    predicates: l_quantity IN (1, 2)                                       |
> |    runtime filters: RF000 -> l_partkey                                    |
> |    row-size=20B cardinality=235.34K                                       |
> +---------------------------------------------------------------------------+
> {noformat}
> This is likely a regression since based on some internal testing this was 
> working correctly on an earlier version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to