Gopal V created HIVE-16976:
------------------------------

             Summary: DPP: SyntheticJoinPredicate transitivity for < > and BETWEEN
                 Key: HIVE-16976
                 URL: https://issues.apache.org/jira/browse/HIVE-16976
             Project: Hive
          Issue Type: Improvement
          Components: Tez
    Affects Versions: 2.1.1, 3.0.0
            Reporter: Gopal V


Tez dynamic partition pruning (DPP) does not kick in when a user filters the partition column with a comparison clause (<, >) against a scalar subquery instead of a JOIN/IN clause.

{code}
explain select count(1) from store_sales where ss_sold_date_sk > (select max(d_Date_sk) from date_dim where d_year = 2017);

Warning: Map Join MAPJOIN[21][bigTable=?] in task 'Map 1' is a cross product
OK
Plan optimized by CBO.

Vertex dependency in root stage
Map 1 <- Reducer 4 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 4 <- Map 3 (CUSTOM_SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2 vectorized, llap
      File Output Operator [FS_36]
        Group By Operator [GBY_35] (rows=1 width=8)
          Output:["_col0"],aggregations:["count(VALUE._col0)"]
        <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
          PARTITION_ONLY_SHUFFLE [RS_34]
            Group By Operator [GBY_33] (rows=1 width=8)
              Output:["_col0"],aggregations:["count(1)"]
              Select Operator [SEL_32] (rows=9600142089 width=16)
                Filter Operator [FIL_31] (rows=9600142089 width=16)
                  predicate:(_col0 > _col1)
                  Map Join Operator [MAPJOIN_30] (rows=28800426268 width=16)
                    Conds:(Inner),Output:["_col0","_col1"]
                  <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
                    BROADCAST [RS_28]
                      Group By Operator [GBY_27] (rows=1 width=8)
                        Output:["_col0"],aggregations:["max(VALUE._col0)"]
                      <-Map 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
                        PARTITION_ONLY_SHUFFLE [RS_26]
                          Group By Operator [GBY_25] (rows=1 width=8)
                            Output:["_col0"],aggregations:["max(d_date_sk)"]
                            Select Operator [SEL_24] (rows=652 width=12)
                              Output:["d_date_sk"]
                              Filter Operator [FIL_23] (rows=652 width=12)
                                predicate:(d_year = 2017)
                                TableScan [TS_2] (rows=73049 width=12)
                                  tpcds_bin_partitioned_newschema_orc_10000@date_dim,date_dim,Tbl:COMPLETE,Col:COMPLETE,Output:["d_date_sk","d_year"]
                  <-Select Operator [SEL_29] (rows=28800426268 width=8)
                      Output:["_col0"]
                      TableScan [TS_0] (rows=28800426268 width=172)
                        tpcds_bin_partitioned_newschema_orc_10000@store_sales,store_sales,Tbl:COMPLETE,Col:COMPLETE
{code}

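The SyntheticJoinPredicate is only injected for equi-joins, not for < or > comparisons against scalar subqueries. In the plan above, the comparison is instead evaluated as a Filter Operator (predicate:(_col0 > _col1)) after a cross-product Map Join, so all 28800426268 rows of store_sales are scanned with no partitions pruned.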
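To make the requested transitivity concrete, here is a minimal sketch (in Java, Hive's implementation language) of what range-based pruning could do once the scalar subquery result is known at runtime: keep only partitions whose key satisfies `key OP scalar` (or `key BETWEEN lo AND hi`) and skip the rest. The class `RangePruningSketch`, its methods, and the sample partition keys are hypothetical illustrations, not Hive's SyntheticJoinPredicate or DPP code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hive code: range-based dynamic partition pruning
 * for a predicate "partKey OP scalar", where the scalar arrives at runtime
 * (e.g. the broadcast result of a max() scalar subquery).
 */
public class RangePruningSketch {

    /** Comparison operators the proposed transitivity would cover. */
    enum Op { LT, LE, GT, GE }

    /** True if a partition with key `key` satisfies "key OP scalar". */
    static boolean matches(long key, Op op, long scalar) {
        switch (op) {
            case LT: return key < scalar;
            case LE: return key <= scalar;
            case GT: return key > scalar;
            case GE: return key >= scalar;
            default: throw new IllegalArgumentException("unsupported op " + op);
        }
    }

    /** SQL BETWEEN is inclusive on both ends: key >= lo AND key <= hi. */
    static boolean between(long key, long lo, long hi) {
        return key >= lo && key <= hi;
    }

    /** Keeps only the partition keys that survive the runtime predicate. */
    static List<Long> prune(List<Long> partitionKeys, Op op, long scalar) {
        List<Long> kept = new ArrayList<>();
        for (long key : partitionKeys) {
            if (matches(key, op, scalar)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical partition keys and a stand-in for the broadcast scalar.
        List<Long> partitions = List.of(100L, 200L, 300L, 400L);
        long runtimeScalar = 250L;
        System.out.println(prune(partitions, Op.GT, runtimeScalar)); // prints [300, 400]
    }
}
```

In the query above, the scalar would be the max(d_date_sk) broadcast from Reducer 4, and pruning with `Op.GT` would let Map 1 skip store_sales partitions up front rather than scanning all of them and filtering after the cross-product join.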




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)