[ 
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597866#comment-17597866
 ] 

Krisztian Kasa commented on HIVE-17342:
---------------------------------------

This seems to be fixed on the master branch:

The plan of the query mentioned in the description:
{code:java}
POSTHOOK: query: explain
SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
WHERE 1=0
POSTHOOK: type: QUERY
POSTHOOK: Input: default@date_dim
#### A masked pattern was here ####
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:0
{code}
It seems that Calcite has the functionality to transform Filter operators with 
always false predicate to a relational expression which returns empty result 
set. (CALCITE-1058) To exploit it Hive introduced it's implementation of this 
expression (HIVE-13316):
[https://github.com/apache/hive/blob/64b731820cf462d3c03632fb4e99277ee539dd08/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java#L133]

CBO plan is
{code:java}
POSTHOOK: query: explain cbo
SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
WHERE 1=0
POSTHOOK: type: QUERY
POSTHOOK: Input: default@date_dim
#### A masked pattern was here ####
CBO PLAN:
HiveSortLimit(fetch=[0])
  HiveProject($f0=[/(CAST($0):DOUBLE, $1)])
    HiveAggregate(group=[{}], agg#0=[sum($0)], agg#1=[count($0)])
      HiveFilter(condition=[>($0, 1999)])
        HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{code}
Since the {{HiveSortLimit(fetch=[0])}} is the root operator no execution is 
needed (HIVE-7203):
[https://github.com/apache/hive/blob/64b731820cf462d3c03632fb4e99277ee539dd08/ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java#L264]

> Where condition with 1=0 should be treated similar to limit 0
> -------------------------------------------------------------
>
>                 Key: HIVE-17342
>                 URL: https://issues.apache.org/jira/browse/HIVE-17342
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> In some cases, queries may get executed with where condition mentioning to 
> "1=0" to get schema. E.g 
> {noformat}
> SELECT * FROM (select avg(d_year) as  y from date_dim where d_year>1999) q 
> WHERE 1=0
> {noformat}
> Currently hive executes the query; it would be good to consider this similar 
> to "limit 0" which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as  y from date_dim where 
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:-1
>     Stage-1
>       Reducer 2 vectorized, llap
>       File Output Operator [FS_13]
>         Group By Operator [GBY_12] (rows=1 width=76)
>           Output:["_col0"],aggregations:["avg(VALUE._col0)"]
>         <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
>           PARTITION_ONLY_SHUFFLE [RS_11]
>             Group By Operator [GBY_10] (rows=1 width=76)
>               Output:["_col0"],aggregations:["avg(d_year)"]
>               Filter Operator [FIL_9] (rows=1 width=0)
>                 predicate:false
>                 TableScan [TS_0] (rows=1 width=0)
>                   
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but does send a DAG plan to the AM and receive 0 
> rows as output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to