alamb commented on code in PR #3351:
URL: https://github.com/apache/arrow-datafusion/pull/3351#discussion_r962142247
##########
datafusion/optimizer/src/rewrite_disjunctive_predicate.rs:
##########
@@ -23,6 +23,152 @@ use datafusion_expr::Expr::BinaryExpr;
use datafusion_expr::{Expr, LogicalPlan, Operator};
use std::sync::Arc;
+/// Optimizer pass that rewrites predicates of the form
+///
+/// ```text
+/// (A = B AND <expr1>) OR (A = B AND <expr2>) OR ... (A = B AND <exprN>)
+/// ```
+///
+/// Into
+/// ```text
+/// (A = B) AND (<expr1> OR <expr2> OR ... <exprN> )
+/// ```
+///
+/// Predicates connected by `OR` typically cannot be broken down and
+/// distributed as well as those connected by `AND`.
+///
+/// The idea is to rewrite predicates into `good_predicate1 AND
+/// good_predicate2 AND ...` where `good_predicate` means the
+/// predicate has special support in the execution engine.
+///
+/// Equality join predicates (e.g. `col1 = col2`), or single column
+/// expressions (e.g. `col = 5`) are examples of predicates with
+/// special support.
+///
+/// # TPCH Q19
+///
+/// This optimization is admittedly somewhat of a niche use case. Its
+/// main use is that it appears in TPCH Q19 and is required to avoid a
+/// CROSS JOIN.
+///
+/// Specifically, Q19 has a WHERE clause that looks like
+///
+/// ```sql
+/// where
+///   (
+///     p_partkey = l_partkey
+///     and p_brand = '[BRAND1]'
+///     and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
+///     and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10
+///     and p_size between 1 and 5
+///     and l_shipmode in ('AIR', 'AIR REG')
+///     and l_shipinstruct = 'DELIVER IN PERSON'
+///   )
+///   or
+///   (
+///     p_partkey = l_partkey
+///     and p_brand = '[BRAND2]'
+///     and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
+///     and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10
+///     and p_size between 1 and 10
+///     and l_shipmode in ('AIR', 'AIR REG')
+///     and l_shipinstruct = 'DELIVER IN PERSON'
+///   )
+///   or
+///   (
+///     p_partkey = l_partkey
+///     and p_brand = '[BRAND3]'
+///     and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
+///     and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10
+///     and p_size between 1 and 15
+///     and l_shipmode in ('AIR', 'AIR REG')
+///     and l_shipinstruct = 'DELIVER IN PERSON'
+///   )
+/// ```
+///
+/// Naively planning this query will result in a CROSS JOIN with that
+/// single large OR filter. However, applying the rewrite in this pass
+/// results in a proper join predicate, `p_partkey = l_partkey`:
+///
+/// ```sql
+/// where
+///   p_partkey = l_partkey
+///   and l_shipmode in ('AIR', 'AIR REG')
+///   and l_shipinstruct = 'DELIVER IN PERSON'
+///   and (
+///     (
+///       p_brand = '[BRAND1]'
+///       and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
+///       and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10
+///       and p_size between 1 and 5
+///     )
+///     or
+///     (
+///       p_brand = '[BRAND2]'
+///       and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
+///       and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10
+///       and p_size between 1 and 10
+///     )
+///     or
+///     (
+///       p_brand = '[BRAND3]'
+///       and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
+///       and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10
+///       and p_size between 1 and 15
+///     )
+///   )
+/// ```
+///
+#[derive(Default)]
Review Comment:
I just moved this code to the top of the module so that the comments
describing what it does are attached to the structure as well as sitting at
the top of the file, for better discoverability.
No change in behavior is intended.
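
For anyone skimming this thread, the rewrite that the doc comment describes can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not the code in this PR: `Pred`, `conjuncts`, `and_of`, `or_of`, `factor_common`, and `leaf` are hypothetical names, predicates are compared by plain structural equality, and DataFusion's `Expr` is not used at all.

```rust
// Toy predicate tree (hypothetical; this is NOT DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Pred {
    Leaf(String),
    And(Vec<Pred>),
    Or(Vec<Pred>),
}

// Convenience constructor for a leaf predicate such as `a = b`.
fn leaf(s: &str) -> Pred {
    Pred::Leaf(s.to_string())
}

// Flatten nested ANDs into a list of top-level conjuncts.
fn conjuncts(p: &Pred) -> Vec<Pred> {
    match p {
        Pred::And(items) => items.iter().flat_map(conjuncts).collect(),
        other => vec![other.clone()],
    }
}

// Build an AND, collapsing the single-element case.
fn and_of(mut parts: Vec<Pred>) -> Pred {
    if parts.len() == 1 { parts.remove(0) } else { Pred::And(parts) }
}

// Build an OR, collapsing the single-element case.
fn or_of(mut parts: Vec<Pred>) -> Pred {
    if parts.len() == 1 { parts.remove(0) } else { Pred::Or(parts) }
}

// Rewrite (C AND e1) OR (C AND e2) OR ... into C AND (e1 OR e2 OR ...),
// where C is the set of conjuncts shared by every OR branch.
fn factor_common(p: &Pred) -> Pred {
    let branches = match p {
        Pred::Or(b) if !b.is_empty() => b,
        _ => return p.clone(),
    };
    let split: Vec<Vec<Pred>> = branches.iter().map(conjuncts).collect();

    // Conjuncts of the first branch that also appear in every other branch.
    let mut common: Vec<Pred> = Vec::new();
    for c in &split[0] {
        if !common.contains(c) && split.iter().all(|b| b.contains(c)) {
            common.push(c.clone());
        }
    }
    if common.is_empty() {
        return p.clone(); // nothing to factor out
    }

    // Strip the shared conjuncts from each branch.
    let mut rest = Vec::new();
    for b in &split {
        let remaining: Vec<Pred> =
            b.iter().filter(|c| !common.contains(*c)).cloned().collect();
        if remaining.is_empty() {
            // Absorption: C OR (C AND x) is just C.
            return and_of(common);
        }
        rest.push(and_of(remaining));
    }
    common.push(or_of(rest));
    and_of(common)
}

fn main() {
    let join = leaf("p_partkey = l_partkey");
    let input = Pred::Or(vec![
        Pred::And(vec![join.clone(), leaf("p_brand = 'B1'")]),
        Pred::And(vec![join.clone(), leaf("p_brand = 'B2'")]),
    ]);
    let expected = Pred::And(vec![
        join.clone(),
        Pred::Or(vec![leaf("p_brand = 'B1'"), leaf("p_brand = 'B2'")]),
    ]);
    assert_eq!(factor_common(&input), expected);
    println!("{:?}", factor_common(&input));
}
```

Once the shared equality sits at the top level of the conjunction, a planner can pick it up as a join key instead of evaluating one large `OR` filter over a CROSS JOIN.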