[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315922832 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ## @@ -143,7 +143,8 @@ case class ExplainCommand( logicalPlan: LogicalPlan, extended: Boolean = false, Review comment: @cloud-fan There are quite a few callers of this method in test suites. Also, the way things are parsed, i have to put a if block in `SparkSqlParse.visitExplain` to populate the the enum value. If its ok, can i please take this as a small follow-up ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315923172 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( Review comment: @cloud-fan Sorry.. we need it as this calls append.. and it has the type scope..I get a compilation error if i remove it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315923172 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( Review comment: @cloud-fan Sorry.. we need it as this calls append.. and it has the type scope.. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315922832 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ## @@ -143,7 +143,8 @@ case class ExplainCommand( logicalPlan: LogicalPlan, extended: Boolean = false, Review comment: @cloud-fan There are quite a few callers of this method in test suites. Also, the way things are parsed, i have to put a if block in `SparkSqlParse.visitExplain` to populate the the enum value. If its ok, can take this as a small follow-up ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315804479 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ## @@ -682,6 +699,28 @@ abstract class BaseSubqueryExec extends SparkPlan { override def outputPartitioning: Partitioning = child.outputPartitioning override def outputOrdering: Seq[SortOrder] = child.outputOrdering + + override def generateTreeString( +depth: Int, Review comment: @cloud-fan oops.. some how need to set up my ide to catch this :-).. will change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315804272 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala ## @@ -19,6 +19,8 @@ package org.apache.spark.sql.execution.aggregate import java.util.concurrent.TimeUnit._ +import scala.collection.mutable Review comment: @cloud-fan will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315804244 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( + plan: => QueryPlan[T], + append: String => Unit, + startOperatorID: Int): Int = { + +// ReusedSubqueryExecs are skipped over +if (plan.isInstanceOf[BaseSubqueryExec]) { + return startOperatorID +} + +val operationIDs = new mutable.ArrayBuffer[(Int, QueryPlan[_])]() +var currentOperatorID = startOperatorID +try { + currentOperatorID = generateOperatorIDs(plan, currentOperatorID, operationIDs) + generateWholeStageCodegenIdMap(plan) + + QueryPlan.append( +plan, +append, +verbose = false, +addSuffix = false, +printOperatorId = true) + + append("\n") + var i: Integer = 0 + for ((opId, curPlan) <- operationIDs) { +append(curPlan.verboseStringWithOperatorId()) + } +} catch { + case e: AnalysisException => append(e.toString) +} +currentOperatorID + } + + /** + * Given a input physical plan, performs the following tasks. + * 1. Generates the explain output for the input plan excluding the subquery plans. + * 2. Generates the explain output for each subquery referenced in the plan. + */ + def processPlan[T <: QueryPlan[T]]( + plan: => QueryPlan[T], + append: String => Unit): Unit = { +try { + val subqueries = ArrayBuffer.empty[(SparkPlan, Expression, SparkPlan)] + var currentOperatorID = 0 + currentOperatorID = processPlanSkippingSubqueries(plan, append, currentOperatorID) + getSubqueries(plan, subqueries) + var i = 0 + + // Process all the subqueries in the plan. + for (sub <- subqueries) { +if (i == 0) { + append("\n= Subqueries =\n\n") +} +i = i + 1 +append(s"Subquery:$i Hosting operator id = " + + s"${getOpId(sub._1)} Hosting Expression = ${sub._2}\n") + +// For each subquery expression in the parent plan, process its child plan to compute +// the explain output. +currentOperatorID = processPlanSkippingSubqueries( + sub._3, + append, + currentOperatorID) +append("\n") + } +} finally { + removeTags(plan) +} + } + + /** + * Traverses the supplied input plan in a bottem-up fashion to produce the following two maps : + *1. operator -> operator identifier + *2. operator identifier -> operator + * Note : + *1. Operator such as WholeStageCodegenExec and InputAdapter are skipped as they don't + * appear in the explain output. + *2. operator identifier starts at startIdx + 1 + */ + private def generateOperatorIDs( + plan: QueryPlan[_], + startOperatorID: Int, + operatorIDs: mutable.ArrayBuffer[(Int, QueryPlan[_])]): Int = { +var currentOperationID = startOperatorID +// Skip the subqueries as they are not printed as part of main query block. +if (plan.isIn
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315803314 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( + plan: => QueryPlan[T], + append: String => Unit, + startOperatorID: Int): Int = { Review comment: @cloud-fan OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315803402 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( + plan: => QueryPlan[T], + append: String => Unit, + startOperatorID: Int): Int = { + +// ReusedSubqueryExecs are skipped over +if (plan.isInstanceOf[BaseSubqueryExec]) { + return startOperatorID +} + +val operationIDs = new mutable.ArrayBuffer[(Int, QueryPlan[_])]() +var currentOperatorID = startOperatorID +try { + currentOperatorID = generateOperatorIDs(plan, currentOperatorID, operationIDs) + generateWholeStageCodegenIdMap(plan) + + QueryPlan.append( +plan, +append, +verbose = false, +addSuffix = false, +printOperatorId = true) + + append("\n") + var i: Integer = 0 + for ((opId, curPlan) <- operationIDs) { +append(curPlan.verboseStringWithOperatorId()) + } +} catch { + case e: AnalysisException => append(e.toString) +} +currentOperatorID + } + + /** + * Given a input physical plan, performs the following tasks. + * 1. Generates the explain output for the input plan excluding the subquery plans. + * 2. Generates the explain output for each subquery referenced in the plan. + */ + def processPlan[T <: QueryPlan[T]]( + plan: => QueryPlan[T], + append: String => Unit): Unit = { +try { + val subqueries = ArrayBuffer.empty[(SparkPlan, Expression, SparkPlan)] + var currentOperatorID = 0 + currentOperatorID = processPlanSkippingSubqueries(plan, append, currentOperatorID) + getSubqueries(plan, subqueries) + var i = 0 + + // Process all the subqueries in the plan. + for (sub <- subqueries) { +if (i == 0) { + append("\n= Subqueries =\n\n") +} +i = i + 1 +append(s"Subquery:$i Hosting operator id = " + + s"${getOpId(sub._1)} Hosting Expression = ${sub._2}\n") + +// For each subquery expression in the parent plan, process its child plan to compute +// the explain output. +currentOperatorID = processPlanSkippingSubqueries( + sub._3, + append, + currentOperatorID) +append("\n") + } +} finally { + removeTags(plan) +} + } + + /** + * Traverses the supplied input plan in a bottem-up fashion to produce the following two maps : + *1. operator -> operator identifier + *2. operator identifier -> operator + * Note : + *1. Operator such as WholeStageCodegenExec and InputAdapter are skipped as they don't + * appear in the explain output. + *2. operator identifier starts at startIdx + 1 + */ + private def generateOperatorIDs( + plan: QueryPlan[_], + startOperatorID: Int, + operatorIDs: mutable.ArrayBuffer[(Int, QueryPlan[_])]): Int = { +var currentOperationID = startOperatorID +// Skip the subqueries as they are not printed as part of main query block. +if (plan.isIn
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315803233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala ## @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.{Expression, PlanExpression} +import org.apache.spark.sql.catalyst.plans.QueryPlan +import org.apache.spark.sql.catalyst.trees.TreeNodeTag + +object ExplainUtils { + /** + * Given a input physical plan, performs the following tasks. + * 1. Generate the plan -> operator id map. + * 2. Generate the plan -> codegen id map + * 3. Generate the two part explain output for this plan. + * 1. First part explains the operator tree with each operator tagged with an unique + * identifier. + * 2. Second part explans each operator in a verbose manner. + * + * Note : This function skips over subqueries. They are handled by its caller. + */ + private def processPlanSkippingSubqueries[T <: QueryPlan[T]]( Review comment: @cloud-fan No.. don't need it.. i probably copied from somewhere :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315802736 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ## @@ -273,6 +288,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } object QueryPlan extends PredicateHelper { + val opidTag = TreeNodeTag[Int]("operatorId") Review comment: @cloud-fan Will change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315802822 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ## @@ -273,6 +288,9 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT } object QueryPlan extends PredicateHelper { + val opidTag = TreeNodeTag[Int]("operatorId") + val codegenTag = new TreeNodeTag[Int]("wholeStageCodegenId") Review comment: @cloud-fan Will change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r315803003 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1512,6 +1512,16 @@ object SQLConf { "When this conf is not set, the value from `spark.redaction.string.regex` is used.") .fallbackConf(org.apache.spark.internal.config.STRING_REDACTION_PATTERN) + val SQL_EXPLAIN_LEGACY_FORMAT = Review comment: @cloud-fan Sounds good. Will remove the property. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314587245 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -64,6 +66,15 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ protected def sparkContext = sqlContext.sparkContext + protected def wholestageCodegenIdStr(codegenId: Option[Int]): String = { +codegenId.map("[codegen id : " + _ + "]").getOrElse("") + } + + protected def operatorIdStr( Review comment: @cloud-fan will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314587233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ## @@ -784,15 +787,17 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int) verbose: Boolean, prefix: String = "", addSuffix: Boolean = false, - maxFields: Int): Unit = { + maxFields: Int, + printNodeId: Boolean): Unit = { child.generateTreeString( depth, lastChildren, append, verbose, - s"*($codegenStageId) ", + if (printNodeId) "*" else s"*($codegenStageId) ", Review comment: @cloud-fan ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314587254 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -127,10 +131,14 @@ class QueryExecution( ReuseExchange(sparkSession.sessionState.conf), ReuseSubquery(sparkSession.sessionState.conf)) - def simpleString: String = withRedaction { + def simpleString[T <: QueryPlan[T]]: String = withRedaction { Review comment: @cloud-fan not required... thanks .. will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314587247 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -64,6 +66,15 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ protected def sparkContext = sqlContext.sparkContext + protected def wholestageCodegenIdStr(codegenId: Option[Int]): String = { Review comment: @cloud-fan yeah.. sorry.. will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314587222 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala ## @@ -17,6 +17,8 @@ package org.apache.spark.sql.execution.aggregate +import scala.collection.mutable Review comment: @cloud-fan will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314576051 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ## @@ -539,24 +544,27 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { override def toString: String = treeString /** Returns a string representation of the nodes in this tree */ + // final def treeString: String = treeString(verbose = true) + final def treeString: String = treeString(verbose = true) final def treeString( - verbose: Boolean, - addSuffix: Boolean = false, - maxFields: Int = SQLConf.get.maxToStringFields): String = { +verbose: Boolean, Review comment: @cloud-fan Will fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314576009 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ## @@ -526,9 +531,9 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { * @param maxFields Maximum number of fields that will be converted to strings. * Any elements beyond the limit will be dropped. */ - def simpleString(maxFields: Int): String = { -s"$nodeName ${argString(maxFields)}".trim - } + def simpleString(maxFields: Int): String = s"$nodeName ${argString(maxFields)}".trim + + def simpleStringWithNodeId(): String = simpleString(0) Review comment: @cloud-fan Good idea Wenchen. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r314567385 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ## @@ -179,6 +182,22 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT override def verboseString(maxFields: Int): String = simpleString(maxFields) + override def simpleStringWithNodeId(): String = { +val tag = new TreeNodeTag[Int]("operatorId") Review comment: @cloud-fan Thanks.. will do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r313532384 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ## @@ -530,7 +531,10 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { s"$nodeName ${argString(maxFields)}".trim } - /** ONE line description of this node with more information */ + def simpleString( + planToOperatorID: mutable.LinkedHashMap[QueryPlan[_], Int]): String = s"$nodeName".trim Review comment: @cloud-fan I agree. Actually i tried very hard to place the new methods in QueryPlan and keep TreeNode untouched. But couldn't do it as i was operating with a premise to not modify the plan in anyway. But your subsequent comment to use tagging to store the operatorid temporarily in the plan may help. Let me think about this a bit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r313532672 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -64,6 +66,22 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ protected def sparkContext = sqlContext.sparkContext + /** + * ONE line description of this node. + */ + override def simpleString(planLabelMap: mutable.LinkedHashMap[QueryPlan[_], Int]): String = { +s"$nodeName (${operatorIdStr(planLabelMap)})".trim Review comment: @cloud-fan Using your idea to use tagging to store operator id will help here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r313532384 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ## @@ -530,7 +531,10 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { s"$nodeName ${argString(maxFields)}".trim } - /** ONE line description of this node with more information */ + def simpleString( + planToOperatorID: mutable.LinkedHashMap[QueryPlan[_], Int]): String = s"$nodeName".trim Review comment: @cloud-fan I agree. Actually i tried very hard make place the new methods in QueryPlan and keep TreeNode untouched. But couldn't do it as i was operating with a premise to not modify the plan in anyway. But your subsequent comment to use tagging to store the operatorid temporarily in the plan may help. Let me think about this a bit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r310018626 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -516,11 +543,28 @@ trait UnaryExecNode extends SparkPlan { def child: SparkPlan override final def children: Seq[SparkPlan] = child :: Nil + override def verboseString( + planToOperatorID : mutable.LinkedHashMap[TreeNode[_], Int], + codegenId: Option[Int]): String = { +s""" + |(${operatorIdStr(planToOperatorID)}) $nodeName ${wholestageCodegenIdStr(codegenId)} + |Input: ${child.output.mkString("[", ", ", "]")} + """.stripMargin + } } trait BinaryExecNode extends SparkPlan { def left: SparkPlan def right: SparkPlan override final def children: Seq[SparkPlan] = Seq(left, right) + override def verboseString( + planToOperatorID: mutable.LinkedHashMap[TreeNode[_], Int], + codegenId: Option[Int]): String = { +s""" + |(${operatorIdStr(planToOperatorID)}) $nodeName ${wholestageCodegenIdStr(codegenId)} + |Left output: ${left.output.mkString("[", ", ", "]")} + |Right output: ${right.output.mkString("[", ", ", "]")} Review comment: @cloud-fan For binary exec, we again show the input which is basically left's output and right's output. ok ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r310018341 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -516,11 +543,28 @@ trait UnaryExecNode extends SparkPlan { def child: SparkPlan override final def children: Seq[SparkPlan] = child :: Nil + override def verboseString( + planToOperatorID : mutable.LinkedHashMap[TreeNode[_], Int], + codegenId: Option[Int]): String = { +s""" + |(${operatorIdStr(planToOperatorID)}) $nodeName ${wholestageCodegenIdStr(codegenId)} + |Input: ${child.output.mkString("[", ", ", "]")} Review comment: @cloud-fan For non-leaf node, we display the input here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r310018200 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -503,6 +522,14 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ trait LeafExecNode extends SparkPlan { override final def children: Seq[SparkPlan] = Nil override def producedAttributes: AttributeSet = outputSet + override def verboseString( + planToOperatorID: mutable.LinkedHashMap[TreeNode[_], Int], + codegenId: Option[Int]): String = { +s""" + |(${operatorIdStr(planToOperatorID)}) $nodeName ${wholestageCodegenIdStr(codegenId)} + |Output: ${producedAttributes.mkString("[", ", ", "]")} Review comment: @cloud-fan As per our simple rule, we display the output for leaf node. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command
dilipbiswal commented on a change in pull request #24759: [SPARK-27395][SQL][WIP] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#discussion_r309780131 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala ## @@ -530,12 +530,27 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { s"$nodeName ${argString(maxFields)}".trim } + def simpleString(planToOperatorID: mutable.LinkedHashMap[TreeNode[_], Int]): String = { +val operatorID = planToOperatorID.get(this).map(v => s"$v").getOrElse("unknown") +s"$nodeName ($operatorID)".trim + } + /** ONE line description of this node with more information */ def verboseString(maxFields: Int): String /** ONE line description of this node with some suffix information */ def verboseStringWithSuffix(maxFields: Int): String = verboseString(maxFields) + def verboseString( + planToOperatorID: mutable.LinkedHashMap[TreeNode[_], Int], Review comment: @cloud-fan Sounds reasonable. Its been a while :-) Let me try to look at the code later today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org