[GitHub] spark pull request #21668: [SPARK-24690][SQL] Add a new config to control pl...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21668#discussion_r199682495 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -375,16 +375,16 @@ case class CatalogStatistics( * Convert [[CatalogStatistics]] to [[Statistics]], and match column stats to attributes based * on column names. */ - def toPlanStats(planOutput: Seq[Attribute], cboEnabled: Boolean): Statistics = { -if (cboEnabled && rowCount.isDefined) { + def toPlanStats(planOutput: Seq[Attribute], planStatsEnabled: Boolean): Statistics = { +if (planStatsEnabled && rowCount.isDefined) { val attrStats = AttributeMap(planOutput .flatMap(a => colStats.get(a.name).map(a -> _.toPlanStat(a.name, a.dataType // Estimate size as number of rows * row size. val size = EstimationUtils.getOutputSize(planOutput, rowCount.get, attrStats) Statistics(sizeInBytes = size, rowCount = rowCount, attributeStats = attrStats) } else { - // When CBO is disabled or the table doesn't have other statistics, we apply the size-only - // estimation strategy and only propagate sizeInBytes in statistics. + // When plan statistics are disabled or the table doesn't have other statistics, + // we apply the size-only estimation strategy and only propagate sizeInBytes in statistics. Statistics(sizeInBytes = sizeInBytes) --- End diff -- yea, I see. We might do so. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21668: [SPARK-24690][SQL] Add a new config to control pl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21668#discussion_r199552405 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -375,16 +375,16 @@ case class CatalogStatistics( * Convert [[CatalogStatistics]] to [[Statistics]], and match column stats to attributes based * on column names. */ - def toPlanStats(planOutput: Seq[Attribute], cboEnabled: Boolean): Statistics = { -if (cboEnabled && rowCount.isDefined) { + def toPlanStats(planOutput: Seq[Attribute], planStatsEnabled: Boolean): Statistics = { +if (planStatsEnabled && rowCount.isDefined) { val attrStats = AttributeMap(planOutput .flatMap(a => colStats.get(a.name).map(a -> _.toPlanStat(a.name, a.dataType // Estimate size as number of rows * row size. val size = EstimationUtils.getOutputSize(planOutput, rowCount.get, attrStats) Statistics(sizeInBytes = size, rowCount = rowCount, attributeStats = attrStats) } else { - // When CBO is disabled or the table doesn't have other statistics, we apply the size-only - // estimation strategy and only propagate sizeInBytes in statistics. + // When plan statistics are disabled or the table doesn't have other statistics, + // we apply the size-only estimation strategy and only propagate sizeInBytes in statistics. Statistics(sizeInBytes = sizeInBytes) --- End diff -- cc @wzhfy --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21668: [SPARK-24690][SQL] Add a new config to control pl...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21668#discussion_r199552346 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -375,16 +375,16 @@ case class CatalogStatistics( * Convert [[CatalogStatistics]] to [[Statistics]], and match column stats to attributes based * on column names. */ - def toPlanStats(planOutput: Seq[Attribute], cboEnabled: Boolean): Statistics = { -if (cboEnabled && rowCount.isDefined) { + def toPlanStats(planOutput: Seq[Attribute], planStatsEnabled: Boolean): Statistics = { +if (planStatsEnabled && rowCount.isDefined) { val attrStats = AttributeMap(planOutput .flatMap(a => colStats.get(a.name).map(a -> _.toPlanStat(a.name, a.dataType // Estimate size as number of rows * row size. val size = EstimationUtils.getOutputSize(planOutput, rowCount.get, attrStats) Statistics(sizeInBytes = size, rowCount = rowCount, attributeStats = attrStats) } else { - // When CBO is disabled or the table doesn't have other statistics, we apply the size-only - // estimation strategy and only propagate sizeInBytes in statistics. + // When plan statistics are disabled or the table doesn't have other statistics, + // we apply the size-only estimation strategy and only propagate sizeInBytes in statistics. Statistics(sizeInBytes = sizeInBytes) --- End diff -- I think we should respect rowCount, if it is available, no matter whether CBO is on or off. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21668: [SPARK-24690][SQL] Add a new config to control pl...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21668 [SPARK-24690][SQL] Add a new config to control plan stats computation in LogicalRelation ## What changes were proposed in this pull request? This pr proposes to propose a new separate config so that `LogicalRelation` can use `rowCount` to compute data statistics in logical plans. In the master, we currently cannot enable `StarSchemaDetection.reorderStarJoins` because we need to turn off CBO to enable it but `StarSchemaDetection` internally references the `rowCount` that is used in LogicalRelation if CBO disabled. ## How was this patch tested? Added tests in `DataFrameJoinSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark PlanStatsConf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21668.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21668 commit f0db73bcad7741ec06560a1b6bb50bb49483b954 Author: Takeshi Yamamuro Date: 2018-06-29T06:22:16Z Fix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org