[ 
https://issues.apache.org/jira/browse/HIVE-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25856.
----------------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Fixed in 
[https://github.com/apache/hive/commit/587c698fa25ca6da46d9c02e4199689426fec40f.]
 Thanks for the review [~kkasa] !

> Intermittent null ordering in plans of queries with GROUP BY and LIMIT
> ----------------------------------------------------------------------
>
>                 Key: HIVE-25856
>                 URL: https://issues.apache.org/jira/browse/HIVE-25856
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:sql}
> CREATE TABLE person (id INTEGER, country STRING);
> EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
> {code}
> The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) 
> from one execution to another.
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
>     HiveAggregate(group=[{1}], agg#0=[count()])
>       HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
>     HiveAggregate(group=[{1}], agg#0=[count()])
>       HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> This is unlikely to cause wrong results cause most aggregate functions (not 
> all) do not return nulls thus null ordering doesn't matter much but it can 
> lead to other problems such as:
> * intermittent CI failures
> * query/plan caching
> I bumped into this problem after investigating test failures in CI. The 
> following query in 
> [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q]
>  returns different plan when it runs individually and when it runs along with 
> some other qtest files.
> {code:sql}
> explain
> select * from
> (select key, count(1) from src group by key order by key limit 10,20) subq
> join
> (select key, count(1) from src group by key limit 20,20) subq2
> on subq.key=subq2.key limit 3,5;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to