[ https://issues.apache.org/jira/browse/SPARK-46240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiang13021 updated SPARK-46240:
-------------------------------
    Description: 
Some rules (Rule[SparkPlan]) are applied while preparing the executedPlan.
However, users currently have no way to inject their own rules at this stage.
{code:java}
// org.apache.spark.sql.execution.QueryExecution#preparations  
private[execution] def preparations(
    sparkSession: SparkSession,
    adaptiveExecutionRule: Option[InsertAdaptiveSparkPlan] = None,
    subquery: Boolean): Seq[Rule[SparkPlan]] = {
  // `AdaptiveSparkPlanExec` is a leaf node. If inserted, all the following rules will be no-op
  // as the original plan is hidden behind `AdaptiveSparkPlanExec`.
  adaptiveExecutionRule.toSeq ++
  Seq(
    CoalesceBucketsInJoin,
    PlanDynamicPruningFilters(sparkSession),
    PlanSubqueries(sparkSession),
    RemoveRedundantProjects,
    EnsureRequirements(),
    // `ReplaceHashWithSortAgg` needs to be added after `EnsureRequirements` to guarantee the
    // sort order of each node is checked to be valid.
    ReplaceHashWithSortAgg,
    // `RemoveRedundantSorts` needs to be added after `EnsureRequirements` to guarantee the same
    // number of partitions when instantiating PartitioningCollection.
    RemoveRedundantSorts,
    DisableUnnecessaryBucketedScan,
    ApplyColumnarRulesAndInsertTransitions(
      sparkSession.sessionState.columnarRules, outputsColumnar = false),
    CollapseCodegenStages()) ++
    (if (subquery) {
      Nil
    } else {
      Seq(ReuseExchangeAndSubquery)
    })
}{code}
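For context, SparkSessionExtensions already exposes injectQueryStagePrepRule, but that hook only runs inside AQE's query-stage preparation and does not reach the preparations() phase shown above. Below is a minimal sketch of how a physical-plan rule is written and registered through that existing hook; the rule and extension class names are illustrative only:
{code:scala}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// Illustrative no-op rule. Rule[SparkPlan] is the same type implemented by
// the rules returned from QueryExecution#preparations above.
case class NoopPhysicalRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}

// Registered through the existing AQE hook; note this does NOT run in the
// preparations() phase that this issue is about.
class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    extensions.injectQueryStagePrepRule(session => NoopPhysicalRule(session))
  }
}
{code}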
We could add a new extension point, "PrepExecutedPlanRule", to
SparkSessionExtensions, which would allow users to inject their own rules into this preparation phase.
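If the proposed extension point existed, user code might look like the following sketch. injectPrepExecutedPlanRule is hypothetical (it does not exist in SparkSessionExtensions today); its builder shape, SparkSession => Rule[SparkPlan], simply mirrors the existing injectQueryStagePrepRule:
{code:scala}
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}

class MyPrepExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Hypothetical API: injectPrepExecutedPlanRule does not exist yet.
    // NoopPhysicalRule is the illustrative rule from the sketch above.
    extensions.injectPrepExecutedPlanRule(session => NoopPhysicalRule(session))
  }
}

// Registration would use the standard mechanism that already works today:
object Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.extensions", "com.example.MyPrepExtensions")
      .getOrCreate()
  }
}
{code}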


> Add PrepExecutedPlanRule to SparkSessionExtensions
> --------------------------------------------------
>
>                 Key: SPARK-46240
>                 URL: https://issues.apache.org/jira/browse/SPARK-46240
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: jiang13021
>            Priority: Major


