[GitHub] [spark] cloud-fan commented on a change in pull request #33140: [SPARK-35881][SQL] Add support for columnar execution of final query stage in AdaptiveSparkPlanExec

2021-07-06 Thread GitBox


cloud-fan commented on a change in pull request #33140:
URL: https://github.com/apache/spark/pull/33140#discussion_r664757257



##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
##
@@ -110,9 +110,12 @@ class SparkSessionExtensions {
   type TableFunctionDescription = (FunctionIdentifier, ExpressionInfo, TableFunctionBuilder)
   type ColumnarRuleBuilder = SparkSession => ColumnarRule
   type QueryStagePrepRuleBuilder = SparkSession => Rule[SparkPlan]
+  type PostStageCreationRuleBuilder = SparkSession => Rule[SparkPlan]
 
   private[this] val columnarRuleBuilders = mutable.Buffer.empty[ColumnarRuleBuilder]
   private[this] val queryStagePrepRuleBuilders = mutable.Buffer.empty[QueryStagePrepRuleBuilder]

Review comment:
   Just out of curiosity: do you still need to add custom query stage preparation rules? We can't remove this API right away, but we can deprecate it if it's fully replaced by custom post-stage-creation rules.
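
For context, a minimal sketch of how an extension might register such rules. injectQueryStagePrepRule is the existing SparkSessionExtensions entry point for QueryStagePrepRuleBuilder; the injection method for the new PostStageCreationRuleBuilder is assumed here (only the builder type appears in the diff above) and may be named differently in the PR.

import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

// No-op placeholder standing in for a rule that swaps row-based operators
// for columnar ones.
case class MyColumnarRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Existing injection point built on QueryStagePrepRuleBuilder.
    extensions.injectQueryStagePrepRule(session => MyColumnarRule(session))

    // Hypothetical injection point for the new PostStageCreationRuleBuilder;
    // the method name is assumed, only the builder type comes from the diff.
    // extensions.injectPostStageCreationRule(session => MyColumnarRule(session))
  }
}

Such an extension class would then be registered through the spark.sql.extensions configuration, as with other SparkSessionExtensions implementations.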





[GitHub] [spark] cloud-fan commented on a change in pull request #33140: [SPARK-35881][SQL] Add support for columnar execution of final query stage in AdaptiveSparkPlanExec

2021-07-01 Thread GitBox


cloud-fan commented on a change in pull request #33140:
URL: https://github.com/apache/spark/pull/33140#discussion_r662754669



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -122,7 +123,8 @@ case class AdaptiveSparkPlanExec(
     val origins = inputPlan.collect {
       case s: ShuffleExchangeLike => s.shuffleOrigin
     }
-    val allRules = queryStageOptimizerRules ++ postStageCreationRules

Review comment:
   I think we need to revisit all the phases in the AQE loop and think about which phases need to accept custom rules for columnar execution.
   
   At the beginning, the input plan comes in and we run the `stage preparation rules` first, to get the initial plan which contains shuffles. Then we create query stages on the leaf shuffles, and submit them after running the `stage optimization rules` and `post stage creation rules`.
   
   When a query stage finishes, we start the loop:
   1. generate the logical plan with the query stage result
   2. re-optimize the logical plan by running the `AQEOptimizer`, the planner and the `stage preparation rules`
   3. compare the costs and pick either the re-optimized plan or the old plan, whichever is cheaper
   4. create more stages and submit them, then wait for the next query stage to finish
   
   At the end, we optimize the final stage by running the `stage optimization rules` and `post stage creation rules`.
   
   It looks to me that we can put the custom rules for columnar execution in the `post stage creation rules`?
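
A simplified, self-contained sketch of the loop described in this comment. The helper names below are placeholders rather than Spark's internal API (the actual logic lives in AdaptiveSparkPlanExec); the sketch is only meant to show where the stage preparation, stage optimization, and post stage creation rules run.

// Simplified sketch of the AQE loop; types and helpers are placeholders.
object AqeLoopSketch {
  type Plan = String // stand-in for a physical plan

  def stagePreparationRules(p: Plan): Plan = p   // run on the input plan and on every re-optimization
  def stageOptimizationRules(p: Plan): Plan = p  // run on each stage before submission, and on the final stage
  def postStageCreationRules(p: Plan): Plan = p  // proposed home for the columnar custom rules
  def createAndSubmitStages(p: Plan): Boolean = false // true while unfinished stages remain
  def reOptimizeWithStageResults(p: Plan): Plan = p   // AQEOptimizer + planner + stage preparation rules
  def pickCheaper(reOptimized: Plan, old: Plan): Plan = reOptimized // cost comparison

  def run(input: Plan): Plan = {
    // Initial plan: stage preparation rules produce the plan containing shuffles.
    var current = stagePreparationRules(input)
    // Create query stages on leaf shuffles; each stage goes through
    // stageOptimizationRules and postStageCreationRules before it is submitted.
    while (createAndSubmitStages(current)) {
      // Steps 1-3: fold finished stage results back into the logical plan,
      // re-optimize it, and keep the cheaper of the two plans.
      val reOptimized = reOptimizeWithStageResults(current)
      current = pickCheaper(reOptimized, current)
      // Step 4: the next iteration creates and submits more stages.
    }
    // Final stage: stage optimization rules + post stage creation rules.
    postStageCreationRules(stageOptimizationRules(current))
  }
}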



