[ 
https://issues.apache.org/jira/browse/SPARK-57194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emily Sun updated SPARK-57194:
------------------------------
    Description: 
h2. Problem

Custom optimizer rules injected via *SparkSessionExtensions.injectOptimizerRule*
run inside the fixed-point {*}Operator Optimization batch{*}, alongside built-in
rewriters like {_}FoldablePropagation, ConstantFolding, and 
PushDownPredicates{_}{*}.{*}
A custom rule that needs to observe the original plan shape (e.g. cross-side 
join
predicates before they are folded into single-side constants) can be silently
defeated when a built-in rule transforms the plan first within the same fixed 
point.

Existing extension points don't cover this:
 * *extendedOperatorOptimizationRules:* runs inside the same fixed-point batch
 * *extendedResolutionRules / postHocResolutionRules:* analyzer phase, too early
 * *earlyScanPushDownRules:* runs after optimization, scoped to scan pushdown

h2. Proposed change

Add *earlyOperatorOptimizationRules* on {*}Optimizer{*}, executed in a *Once*
batch named "Early Operator Optimization" placed between *Replace Operators*
and *Aggregate,* before the fixed-point Operator Optimization batch.
{code:java}
Batch("Replace Operators", fixedPoint,
  RewriteExceptAll,
  RewriteIntersectAll,
  ReplaceIntersectWithSemiJoin,
  ReplaceExceptWithFilter,
  ReplaceExceptWithAntiJoin,
  ReplaceDistinctWithAggregate,
  ReplaceDeduplicateWithAggregate) ::
Batch("Early Operator Optimization", Once,
  earlyOperatorOptimizationRules: _*) ::
Batch("Aggregate", fixedPoint,
  RemoveLiteralFromGroupExpressions,
  RemoveRepetitionFromGroupExpressions) :: Nil ++ {code}
Wired through *SparkSessionExtensions.injectEarlyOptimizerRule* and
{*}BaseSessionStateBuilder{*}. The batch is a no-op when no rule is registered.
h2. Use cases
 * Outer-join semantics rules that must inspect cross-side join predicates 
before *FoldablePropagation* rewrites them into single-side foldable 
expressions(the concrete motivation for this proposal).
 * Plan-shape analyses / tagging rules that must run on a canonical 
pre-optimization plan.

 * Any custom rule that conceptually belongs to a single Once-batch pass before 
the main optimizer iterates to fixed point.

h2. Compatibility

Purely additive: no existing API or batch ordering changes; default is 
{{{}Nil{}}}.

  was:
h2. Problem

Custom optimizer rules injected via *SparkSessionExtensions.injectOptimizerRule*
run inside the fixed-point {*}Operator Optimization batch{*}, alongside built-in
rewriters like {_}FoldablePropagation, ConstantFolding, and 
PushDownPredicates{_}{*}.{*}
A custom rule that needs to observe the original plan shape (e.g. cross-side 
join
predicates before they are folded into single-side constants) can be silently
defeated when a built-in rule transforms the plan first within the same fixed 
point.

Existing extension points don't cover this:
 * *extendedOperatorOptimizationRules:* runs inside the same fixed-point batch
 * *extendedResolutionRules / postHocResolutionRules:* analyzer phase, too early
 * *earlyScanPushDownRules:* runs after optimization, scoped to scan pushdown

h2. Proposed change

Add *earlyOperatorOptimizationRules* on {*}Optimizer{*}, executed in a *Once*
batch named "Early Operator Optimization" placed between *Replace Operators*
and *Aggregate,* before the fixed-point Operator Optimization batch.
{code:java}
Batch("Replace Operators", fixedPoint,
  RewriteExceptAll,
  RewriteIntersectAll,
  ReplaceIntersectWithSemiJoin,
  ReplaceExceptWithFilter,
  ReplaceExceptWithAntiJoin,
  ReplaceDistinctWithAggregate,
  ReplaceDeduplicateWithAggregate) ::
Batch("Early Operator Optimization", Once,
  earlyOperatorOptimizationRules: _*) ::
Batch("Aggregate", fixedPoint,
  RemoveLiteralFromGroupExpressions,
  RemoveRepetitionFromGroupExpressions) :: Nil ++ {code}
Wired through *SparkSessionExtensions.injectEarlyOptimizerRule* and
{*}BaseSessionStateBuilder{*}. The batch is a no-op when no rule is registered.
h2. Compatibility

Purely additive: no existing API or batch ordering changes; default is 
{{{}Nil{}}}.


> Add earlyOperatorOptimizationRules extension point to Optimizer
> ---------------------------------------------------------------
>
>                 Key: SPARK-57194
>                 URL: https://issues.apache.org/jira/browse/SPARK-57194
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.4.1
>            Reporter: Emily Sun
>            Priority: Major
>
> h2. Problem
> Custom optimizer rules injected via 
> *SparkSessionExtensions.injectOptimizerRule*
> run inside the fixed-point {*}Operator Optimization batch{*}, alongside 
> built-in
> rewriters like {_}FoldablePropagation, ConstantFolding, and 
> PushDownPredicates{_}{*}.{*}
> A custom rule that needs to observe the original plan shape (e.g. cross-side 
> join
> predicates before they are folded into single-side constants) can be silently
> defeated when a built-in rule transforms the plan first within the same fixed 
> point.
> Existing extension points don't cover this:
>  * *extendedOperatorOptimizationRules:* runs inside the same fixed-point batch
>  * *extendedResolutionRules / postHocResolutionRules:* analyzer phase, too 
> early
>  * *earlyScanPushDownRules:* runs after optimization, scoped to scan pushdown
> h2. Proposed change
> Add *earlyOperatorOptimizationRules* on {*}Optimizer{*}, executed in a *Once*
> batch named "Early Operator Optimization" placed between *Replace Operators*
> and *Aggregate,* before the fixed-point Operator Optimization batch.
> {code:java}
> Batch("Replace Operators", fixedPoint,
>   RewriteExceptAll,
>   RewriteIntersectAll,
>   ReplaceIntersectWithSemiJoin,
>   ReplaceExceptWithFilter,
>   ReplaceExceptWithAntiJoin,
>   ReplaceDistinctWithAggregate,
>   ReplaceDeduplicateWithAggregate) ::
> Batch("Early Operator Optimization", Once,
>   earlyOperatorOptimizationRules: _*) ::
> Batch("Aggregate", fixedPoint,
>   RemoveLiteralFromGroupExpressions,
>   RemoveRepetitionFromGroupExpressions) :: Nil ++ {code}
> Wired through *SparkSessionExtensions.injectEarlyOptimizerRule* and
> {*}BaseSessionStateBuilder{*}. The batch is a no-op when no rule is 
> registered.
> h2. Use cases
>  * Outer-join semantics rules that must inspect cross-side join predicates 
> before *FoldablePropagation* rewrites them into single-side foldable 
> expressions(the concrete motivation for this proposal).
>  * Plan-shape analyses / tagging rules that must run on a canonical 
> pre-optimization plan.
>  * Any custom rule that conceptually belongs to a single Once-batch pass 
> before the main optimizer iterates to fixed point.
> h2. Compatibility
> Purely additive: no existing API or batch ordering changes; default is 
> {{{}Nil{}}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to