Zoltan Haindrich created HIVE-22811:
---------------------------------------

             Summary: Statistics are not exploited in nested cases
                 Key: HIVE-22811
                 URL: https://issues.apache.org/jira/browse/HIVE-22811
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
            Reporter: Zoltan Haindrich


The StatsOptimizer can use column statistics (min/max/etc.) to answer simple queries such as:
{code}
(select max(id) from t t0)
{code}

However, the same does not happen when such a query is nested as a subquery:

{code}
explain select * from u where u.id>(select max(id) from t t0);
{code}

The resulting plan still scans t and computes max(id) at runtime:
{code}
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Reducer 3 <- Map 1 (BROADCAST_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE) |
|                                                    |
| Stage-0                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-1                                        |
|       Reducer 3 vectorized                         |
|       File Output Operator [FS_31]                 |
|         Select Operator [SEL_30] (rows=1 width=8)  |
|           Output:["_col0","_col1"]                 |
|           Filter Operator [FIL_29] (rows=1 width=12) |
|             predicate:(_col0 > _col2)              |
|             Map Join Operator [MAPJOIN_28] (rows=3 width=12) |
|               Conds:(Inner),Output:["_col0","_col1","_col2"] |
|             <-Map 1 [BROADCAST_EDGE] vectorized    |
|               BROADCAST [RS_25]                    |
|                 Select Operator [SEL_24] (rows=3 width=8) |
|                   Output:["_col0","_col1"]         |
|                   Filter Operator [FIL_23] (rows=3 width=8) |
|                     predicate:id is not null       |
|                     TableScan [TS_0] (rows=3 width=8) |
|                       default@u,u,Tbl:COMPLETE,Col:COMPLETE,Output:["id","cnt"] |
|             <-Filter Operator [FIL_27] (rows=1 width=4) |
|                 predicate:_col0 is not null        |
|                 Group By Operator [GBY_26] (rows=1 width=4) |
|                   Output:["_col0"],aggregations:["max(VALUE._col0)"] |
|                 <-Map 2 [CUSTOM_SIMPLE_EDGE] vectorized |
|                   PARTITION_ONLY_SHUFFLE [RS_22]   |
|                     Group By Operator [GBY_21] (rows=1 width=4) |
|                       Output:["_col0"],aggregations:["max(id)"] |
|                       Select Operator [SEL_20] (rows=4 width=4) |
|                         Output:["id"]              |
|                         TableScan [TS_3] (rows=4 width=4) |
|                           default@t,t0,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |

{code}
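
A possible reproduction, sketched under assumptions: the table and column names (t.id, u.id, u.cnt) and the row counts (4 rows in t, 3 in u) are read off the plan in this report, while the column types and the inserted values are guesses. hive.compute.query.using.stats is the setting that allows simple aggregates to be answered from statistics.

{code}
-- Assumed setup; column types and values are illustrative, not taken from the report.
set hive.compute.query.using.stats=true;

create table t (id int);
create table u (id int, cnt int);

insert into t values (1), (2), (3), (4);
insert into u values (1, 10), (3, 30), (5, 50);

-- Make sure both basic and column statistics are present.
analyze table t compute statistics;
analyze table t compute statistics for columns;
analyze table u compute statistics;
analyze table u compute statistics for columns;

-- Simple case from the description.
explain select max(id) from t t0;

-- Nested case from the description.
explain select * from u where u.id>(select max(id) from t t0);
{code}

Comparing the two explain outputs should show whether the subquery on t is still planned as a table scan with a Group By.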



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
