okumin created HIVE-28080:
-----------------------------

             Summary: Propagate statistics from a source table to the 
materialized CTE
                 Key: HIVE-28080
                 URL: https://issues.apache.org/jira/browse/HIVE-28080
             Project: Hive
          Issue Type: Improvement
          Components: Query Planning
    Affects Versions: 4.0.0-beta-1, 3.1.3, 2.3.8
            Reporter: okumin
            Assignee: okumin


Hive doesn't fill in the statistics of materialized CTEs, and the size of those 
TableScans is underestimated. That causes Tez to run with fewer tasks or to 
fail with OOM because MapJoin could be wrongly applied.

 

The following example shows Map 1 reading `src` generates 493 rows, but Map 3 
reading `cte` is expected to scan only 1 row.
{code:java}
0: jdbc:hive2://hive-hiveserver2:10000/defaul> EXPLAIN WITH cte AS (
. . . . . . . . . . . . . . . . . . . . . . .>   SELECT * FROM src
. . . . . . . . . . . . . . . . . . . . . . .> )
. . . . . . . . . . . . . . . . . . . . . . .> SELECT *
. . . . . . . . . . . . . . . . . . . . . . .> FROM cte a
. . . . . . . . . . . . . . . . . . . . . . .> JOIN cte b ON (a.key = b.key)
. . . . . . . . . . . . . . . . . . . . . . .> JOIN cte c ON (a.key = c.key);
...
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in Stage-4                       |
| Map 2 <- Map 3 (BROADCAST_EDGE), Reducer 4 (BROADCAST_EDGE) |
| Reducer 4 <- Map 3 (SIMPLE_EDGE)                   |
|                                                    |
| Stage-3                                            |
|   Fetch Operator                                   |
|     limit:-1                                       |
|     Stage-4                                        |
|       Map 2 vectorized                             |
|       File Output Operator [FS_69]                 |
|         Map Join Operator [MAPJOIN_68] (rows=1 width=444) |
|           
Conds:MAPJOIN_67._col0=RS_61._col0(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5"]
 |
|         <-Map 3 [BROADCAST_EDGE] vectorized        |
|           SHUFFLE [RS_61]                          |
|             PartitionCols:_col0                    |
|             Select Operator [SEL_60] (rows=1 width=368) |
|               Output:["_col0","_col1"]             |
|               Filter Operator [FIL_59] (rows=1 width=368) |
|                 predicate:key is not null          |
|                 TableScan [TS_11] (rows=1 width=368) |
|                   default@cte,c,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
|         <-Map Join Operator [MAPJOIN_67] (rows=1 width=404) |
|             
Conds:SEL_66._col0=RS_64._col0(Inner),Output:["_col0","_col1","_col2","_col3"] |
|           <-Reducer 4 [BROADCAST_EDGE] vectorized  |
|             BROADCAST [RS_64]                      |
|               PartitionCols:_col0                  |
|               Select Operator [SEL_63]             |
|                 Output:["_col0","_col1"]           |
|           <-Select Operator [SEL_66] (rows=1 width=368) |
|               Output:["_col0","_col1"]             |
|               Filter Operator [FIL_65] (rows=1 width=368) |
|                 predicate:key is not null          |
|                 TableScan [TS_5] (rows=1 width=368) |
|                   default@cte,a,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
|         Stage-2                                    |
|           Dependency Collection{}                  |
|             Stage-1                                |
|               Map 1 vectorized                     |
|               File Output Operator [FS_4]          |
|                 table:{"name:":"default.cte"}      |
|                 Select Operator [SEL_3] (rows=493 width=350) |
|                   Output:["_col0","_col1"]         |
|                   TableScan [TS_0] (rows=493 width=350) |
|                     
default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"] |
|         Stage-0                                    |
|           Move Operator                            |
|              Please refer to the previous Stage-1  |
|                                                    |
+----------------------------------------------------+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to