[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94859595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- OK that makes sense. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94774210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- The conf won't change in a query, but if we cache plans across queries, then we can't cache the calculated stats. And now I do find such plans e.g. `cachedDataSourceTables`. If we don't cache the stats, estimation can be a performance hit. Then we need to use lazy val and don't depend on the conf. In the current stage it's ok. I just thought in the future we may need some parameters in more complicated estimation. e.g. we may need a threshold to determine whether a column is a primary key. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94732192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- unless of course you want to tweak it depending on the value of the conf. but then you can't just simply cache it here can you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94731500 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- But estimatedStats can be lazy val, can't it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94714976 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- `estimatedStats` is a cache for the `def cboStatistics()` where stats is calculated by using column stats. And because I want to pass conf into cboStatistics(some parameters may be needed during estimation in the future), I can't use a lazy val, and I do this cache explicitly with `estimatedStats`. Yes, the naming causes ambiguity, can you give it a better name? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94708121 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + protected def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Option[Statistics] = None --- End diff -- we don't we just use a lazy val? Also the naming makes no sense since all statistics are estimated. What's the difference between statistics and estimatedStats? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16401 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94353251 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} --- End diff -- Here I want to set `estimatedStats`, such that it won't be calculated every time it is called. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94344029 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats.isEmpty) { +estimatedStats = Some(cboStatistics(conf)) + } + estimatedStats.get +} else { + statistics +} --- End diff -- The above logics can be shorten to one line: ```Scala if (conf.cboEnabled) estimatedStats.getOrElse(cboStatistics(conf)) else statistics ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94280370 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.estimation --- End diff -- `statsEstimation` looks fine to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94279993 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats == null) { +estimatedStats = cboStatistics(conf) + } + estimatedStats +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Statistics = _ --- End diff -- ok I'll use Option here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94279996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = +SQLConfigBuilder("spark.sql.cbo.enabled") + .internal() --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r9427 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.estimation --- End diff -- I'm not sure. statsEstimation? cardinalityEstimation? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94280002 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats == null) { +estimatedStats = cboStatistics(conf) + } + estimatedStats +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + def cboStatistics(conf: CatalystConf): Statistics = statistics --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94279929 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = --- End diff -- Although it seems to influence only how the plan stats are calculated, but that can lead to changes in output plans and thus execution procedure, so I think we can say it's meant for enabling the cbo framework. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94253091 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats == null) { +estimatedStats = cboStatistics(conf) + } + estimatedStats +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + def cboStatistics(conf: CatalystConf): Statistics = statistics --- End diff -- protected? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251780 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.estimation --- End diff -- estimation? Any better name? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251558 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = +SQLConfigBuilder("spark.sql.cbo.enabled") + .internal() --- End diff -- Internal? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94251460 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -95,6 +96,29 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the default statistics or statistics estimated by cbo based on configuration. + */ + final def planStats(conf: CatalystConf): Statistics = { +if (conf.cboEnabled) { + if (estimatedStats == null) { +estimatedStats = cboStatistics(conf) + } + estimatedStats +} else { + statistics +} + } + + /** + * Returns statistics estimated by cbo. If the plan doesn't override this, it returns the + * default statistics. + */ + def cboStatistics(conf: CatalystConf): Statistics = statistics + + /** A cache for the estimated statistics, such that it will only be computed once. */ + private var estimatedStats: Statistics = _ --- End diff -- Use `Option` here? Or use `@Nullable` to explicitly mark it nullable --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r94232049 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = --- End diff -- Is this meant for enabling the whole cbo framework or just for controlling how the plan statistics calculated? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r93846292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = +SQLConfigBuilder("spark.sql.cbo.enabled") + .internal() + .doc("Enables CBO for estimation of plan statistics when set true.") + .booleanConf + .createWithDefault(false) --- End diff -- shall we enable it by default? cc @hvanhovell @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16401 [SPARK-18998] [SQL] Add a cbo conf to switch between default statistics and estimated statistics ## What changes were proposed in this pull request? We add a cbo configuration to switch between default stats and estimated stats. We also define a new statistics method `planStats` in LogicalPlan with conf as its parameter, in order to pass the cbo switch and other estimation related configurations in the future. `planStats` is used on the caller sides (i.e. Optimizer and Strategies) to make transformation decisions based on stats. ## How was this patch tested? Add a test case using a dummy LogicalPlan. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark cboSwitch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16401.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16401 commit 53c1b26e9fc7c253b1654145f910e7881db34de7 Author: Zhenhua Wang Date: 2016-12-24T15:43:53Z add cbo switch --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org