AssHero opened a new pull request, #3578:
URL: https://github.com/apache/arrow-datafusion/pull/3578

   # Which issue does this PR close?
   
   Closes #3577 
   
    # Rationale for this change
   optimize join's performance when join has OR clauses in filter or on clause.
   
   examine any OR clauses of join to see if any useful clauses can be extracted 
and push down to join's rel to filter more rows before join.
   
   for TPCH q19 
   
   the logical plan before optimized
   
   | logical_plan  | Projection: #SUM(lineitem.l_extendedprice * Int64(1) - 
lineitem.l_discount) AS revenue                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
          |
   |               |   Aggregate: groupBy=[[]], 
aggr=[[SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 
2)) AS Decimal128(38, 4)))]]                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      |
   |               |     Projection: #lineitem.l_shipinstruct = Utf8("DELIVER 
IN PERSON") AS #lineitem.l_shipinstruct = Utf8("DELIVER IN 
PERSON")Utf8("DELIVER IN PERSON")#lineitem.l_shipinstruct, #lineitem.l_shipmode 
IN ([Utf8("AIR"), Utf8("AIR REG")]) AS #lineitem.l_shipmode IN ([Utf8("AIR"), 
Utf8("AIR REG")])Utf8("AIR REG")Utf8("AIR")#lineitem.l_shipmode, #part.p_size 
>= Int32(1) AS #part.p_size >= Int32(1)Int32(1)#part.p_size, 
#lineitem.l_quantity, #lineitem.l_extendedprice, #lineitem.l_discount, 
#part.p_brand, #part.p_size, #part.p_container                                  
                                                                                
                                                                                
                                                                                
                                                             |
   |               |       Filter: #part.p_brand = Utf8("Brand#12") AND 
#part.p_container IN ([Utf8("SM CASE"), Utf8("SM BOX"), Utf8("SM PACK"), 
Utf8("SM PKG")]) AND #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND 
#lineitem.l_quantity <= Decimal128(Some(1100),15,2) AND #part.p_size <= 
Int32(5) OR #part.p_brand = Utf8("Brand#23") AND #part.p_container IN 
([Utf8("MED BAG"), Utf8("MED BOX"), Utf8("MED PKG"), Utf8("MED PACK")]) AND 
#lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) AND #part.p_size <= Int32(10) OR #part.p_brand = 
Utf8("Brand#34") AND #part.p_container IN ([Utf8("LG CASE"), Utf8("LG BOX"), 
Utf8("LG PACK"), Utf8("LG PKG")]) AND #lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(3000),15,2) AND #part.p_size <= Int32(15)                       
               |
   |               |         Inner Join: #lineitem.l_partkey = #part.p_partkey  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
      |
   |               |           Filter: #lineitem.l_shipmode IN ([Utf8("AIR"), 
Utf8("AIR REG")]) AND #lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
        |
   |               |             TableScan: lineitem projection=[l_partkey, 
l_quantity, l_extendedprice, l_discount, l_shipinstruct, l_shipmode], 
partial_filters=[#lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]), 
#lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON")]                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   |               |           Filter: #part.p_size >= Int32(1)                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
      |
   |               |             TableScan: part projection=[p_partkey, 
p_brand, p_size, p_container], partial_filters=[#part.p_size >= Int32(1)]
   
   the logical plan after optimized
   
   | logical_plan  | Projection: #SUM(lineitem.l_extendedprice * Int64(1) - 
lineitem.l_discount) AS revenue                                                 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
          |
   |               |   Aggregate: groupBy=[[]], 
aggr=[[SUM(CAST(#lineitem.l_extendedprice AS Decimal128(38, 4)) * 
CAST(Decimal128(Some(100),23,2) - CAST(#lineitem.l_discount AS Decimal128(23, 
2)) AS Decimal128(38, 4)))]]                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      |
   |               |     Projection: #lineitem.l_shipinstruct = Utf8("DELIVER 
IN PERSON") AS #lineitem.l_shipinstruct = Utf8("DELIVER IN 
PERSON")Utf8("DELIVER IN PERSON")#lineitem.l_shipinstruct, #lineitem.l_shipmode 
IN ([Utf8("AIR"), Utf8("AIR REG")]) AS #lineitem.l_shipmode IN ([Utf8("AIR"), 
Utf8("AIR REG")])Utf8("AIR REG")Utf8("AIR")#lineitem.l_shipmode, #part.p_size 
>= Int32(1) AS #part.p_size >= Int32(1)Int32(1)#part.p_size, 
#lineitem.l_quantity, #lineitem.l_extendedprice, #lineitem.l_discount, 
#part.p_brand, #part.p_size, #part.p_container                                  
                                                                                
                                                                                
                                                                                
                                                             |
   |               |       Filter: #part.p_brand = Utf8("Brand#12") AND 
#part.p_container IN ([Utf8("SM CASE"), Utf8("SM BOX"), Utf8("SM PACK"), 
Utf8("SM PKG")]) AND #lineitem.l_quantity >= Decimal128(Some(100),15,2) AND 
#lineitem.l_quantity <= Decimal128(Some(1100),15,2) AND #part.p_size <= 
Int32(5) OR #part.p_brand = Utf8("Brand#23") AND #part.p_container IN 
([Utf8("MED BAG"), Utf8("MED BOX"), Utf8("MED PKG"), Utf8("MED PACK")]) AND 
#lineitem.l_quantity >= Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) AND #part.p_size <= Int32(10) OR #part.p_brand = 
Utf8("Brand#34") AND #part.p_container IN ([Utf8("LG CASE"), Utf8("LG BOX"), 
Utf8("LG PACK"), Utf8("LG PKG")]) AND #lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(3000),15,2) AND #part.p_size <= Int32(15)                       
               |
   |               |         Inner Join: #lineitem.l_partkey = #part.p_partkey  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
      |
   |               |           Filter: #lineitem.l_shipmode IN ([Utf8("AIR"), 
Utf8("AIR REG")]) AND #lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON") AND 
#lineitem.l_quantity >= Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(1100),15,2) OR #lineitem.l_quantity >= 
Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) OR #lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(3000),15,2)                                                     
                                                                                
                                                                                
                                                                                
                                                                                
                            |
   |               |             TableScan: lineitem projection=[l_partkey, 
l_quantity, l_extendedprice, l_discount, l_shipinstruct, l_shipmode], 
partial_filters=[#lineitem.l_shipmode IN ([Utf8("AIR"), Utf8("AIR REG")]), 
#lineitem.l_shipinstruct = Utf8("DELIVER IN PERSON"), #lineitem.l_quantity >= 
Decimal128(Some(100),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(1100),15,2) OR #lineitem.l_quantity >= 
Decimal128(Some(1000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(2000),15,2) OR #lineitem.l_quantity >= 
Decimal128(Some(2000),15,2) AND #lineitem.l_quantity <= 
Decimal128(Some(3000),15,2)]                                                    
                                                                                
                                                                                
                                                                      |
   |               |           Filter: #part.p_size >= Int32(1) AND 
#part.p_brand = Utf8("Brand#12") AND #part.p_container IN ([Utf8("SM CASE"), 
Utf8("SM BOX"), Utf8("SM PACK"), Utf8("SM PKG")]) AND #part.p_size <= Int32(5) 
OR #part.p_brand = Utf8("Brand#23") AND #part.p_container IN ([Utf8("MED BAG"), 
Utf8("MED BOX"), Utf8("MED PKG"), Utf8("MED PACK")]) AND #part.p_size <= 
Int32(10) OR #part.p_brand = Utf8("Brand#34") AND #part.p_container IN 
([Utf8("LG CASE"), Utf8("LG BOX"), Utf8("LG PACK"), Utf8("LG PKG")]) AND 
#part.p_size <= Int32(15)                                                       
                                                                                
                                                                                
                                                                                
                                             |
   |               |             TableScan: part projection=[p_partkey, 
p_brand, p_size, p_container], partial_filters=[#part.p_size >= Int32(1), 
#part.p_brand = Utf8("Brand#12") AND #part.p_container IN ([Utf8("SM CASE"), 
Utf8("SM BOX"), Utf8("SM PACK"), Utf8("SM PKG")]) AND #part.p_size <= Int32(5) 
OR #part.p_brand = Utf8("Brand#23") AND #part.p_container IN ([Utf8("MED BAG"), 
Utf8("MED BOX"), Utf8("MED PKG"), Utf8("MED PACK")]) AND #part.p_size <= 
Int32(10) OR #part.p_brand = Utf8("Brand#34") AND #part.p_container IN 
([Utf8("LG CASE"), Utf8("LG BOX"), Utf8("LG PACK"), Utf8("LG PKG")]) AND 
#part.p_size <= Int32(15)]
   
   we extract new predicate and push down to join's rel, this filters more rows 
before join , makes join more effective.
   
   
   # What changes are included in this PR?
   add extract OR clasue in datafusion/optimizer/src/filter_push_down.rs.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to