[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94859595
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

OK that makes sense. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94774210
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

The conf won't change in a query, but if we cache plans across queries, 
then we can't cache the  calculated stats.  And now I do find such plans e.g. 
`cachedDataSourceTables`.
If we don't cache the stats, estimation can be a performance hit.
Then we need to use lazy val and don't depend on the conf. In the current 
stage it's ok. I just thought in the future we may need some parameters in more 
complicated estimation. e.g. we may need a threshold to determine whether a 
column is a primary key.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94732192
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

unless of course you want to tweak it depending on the value of the conf. 
but then you can't just simply cache it here can you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-05 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94731500
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

But estimatedStats can be lazy val, can't it?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-04 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94714976
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

`estimatedStats` is a cache for the `def cboStatistics()` where stats is 
calculated by using column stats. And because I want to pass conf into 
cboStatistics(some parameters may be needed during estimation in the future), I 
can't use a lazy val, and I do this cache explicitly with `estimatedStats`.
Yes, the naming causes ambiguity, can you give it a better name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-04 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94708121
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

we don't we just use a lazy val? Also the naming makes no sense since all 
statistics are estimated. What's the difference between statistics and 
estimatedStats?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16401


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-02 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94353251
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
--- End diff --

Here I want to set `estimatedStats`, such that it won't be calculated every 
time it is called.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94344029
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
--- End diff --

The above logics can be shorten to one line:
```Scala
if (conf.cboEnabled) estimatedStats.getOrElse(cboStatistics(conf)) else 
statistics
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94280370
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala 
---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.estimation
--- End diff --

`statsEstimation` looks fine to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94279993
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats == null) {
+estimatedStats = cboStatistics(conf)
+  }
+  estimatedStats
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Statistics = _
--- End diff --

ok I'll use Option here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94279996
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
+SQLConfigBuilder("spark.sql.cbo.enabled")
+  .internal()
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r9427
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala 
---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.estimation
--- End diff --

I'm not sure. statsEstimation? cardinalityEstimation?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94280002
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats == null) {
+estimatedStats = cboStatistics(conf)
+  }
+  estimatedStats
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-31 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94279929
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
--- End diff --

Although it seems to influence only how the plan stats are calculated, but 
that can lead to changes in output plans and thus execution procedure, so I 
think we can say it's meant for enabling the cbo framework.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94253091
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats == null) {
+estimatedStats = cboStatistics(conf)
+  }
+  estimatedStats
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
--- End diff --

protected?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94251780
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/estimation/EstimationSuite.scala 
---
@@ -0,0 +1,67 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.estimation
--- End diff --

estimation? Any better name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94251558
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
+SQLConfigBuilder("spark.sql.cbo.enabled")
+  .internal()
--- End diff --

Internal?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94251460
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats == null) {
+estimatedStats = cboStatistics(conf)
+  }
+  estimatedStats
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Statistics = _
--- End diff --

Use `Option` here? Or use `@Nullable` to explicitly mark it nullable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94232049
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
--- End diff --

Is this meant for enabling the whole cbo framework or just for controlling 
how the plan statistics calculated?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r93846292
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -642,6 +642,13 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val CBO_ENABLED =
+SQLConfigBuilder("spark.sql.cbo.enabled")
+  .internal()
+  .doc("Enables CBO for estimation of plan statistics when set true.")
+  .booleanConf
+  .createWithDefault(false)
--- End diff --

shall we enable it by default? cc @hvanhovell @rxin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2016-12-25 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16401

[SPARK-18998] [SQL] Add a cbo conf to switch between default statistics and 
estimated statistics

## What changes were proposed in this pull request?

We add a cbo configuration to switch between default stats and estimated 
stats.
We also define a new statistics method `planStats` in LogicalPlan with conf 
as its parameter, in order to pass the cbo switch and other estimation related 
configurations in the future. `planStats` is used on the caller sides (i.e. 
Optimizer and Strategies) to make transformation decisions based on stats.

## How was this patch tested?

Add a test case using a dummy LogicalPlan.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark cboSwitch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16401.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16401


commit 53c1b26e9fc7c253b1654145f910e7881db34de7
Author: Zhenhua Wang 
Date:   2016-12-24T15:43:53Z

add cbo switch




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org