[GitHub] [spark] HyukjinKwon commented on a change in pull request #27499: [SPARK-30590][SQL] Untyped select API cannot take typed column expression
HyukjinKwon commented on a change in pull request #27499: [SPARK-30590][SQL] Untyped select API cannot take typed column expression URL: https://github.com/apache/spark/pull/27499#discussion_r380507471
## File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ##
@@ -394,4 +403,21 @@ class DatasetAggregatorSuite extends QueryTest with SharedSparkSession {
     checkAnswer(group, Row("bob", Row(true, 3)) :: Nil)
     checkDataset(group.as[OptionBooleanIntData], OptionBooleanIntData("bob", Some((true, 3))))
   }
+
+  test("SPARK-30590: select multiple typed column expressions") {
+    val df = Seq((1, 2, 3, 4, 5, 6)).toDF("a", "b", "c", "d", "e", "f")
+    val fooAgg = (i: Int) => FooAgg(i).toColumn.name(s"foo_agg_$i")
+
+    val agg1 = df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5))
+    checkDataset(agg1, (3, 5, 7, 9, 11))
+
+    val agg2 = df.selectUntyped(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5), fooAgg(6))
+      .asInstanceOf[Dataset[(Int, Int, Int, Int, Int, Int)]]
+    checkDataset(agg2, (3, 5, 7, 9, 11, 13))
+
+    val err = intercept[AnalysisException] {
+      df.select(fooAgg(1), fooAgg(2), fooAgg(3), fooAgg(4), fooAgg(5), fooAgg(6))
Review comment: The current behaviour seems to be throwing an exception (as described in the JIRA).
```
scala> df.select(fooAgg(1),fooAgg(2),fooAgg(3),fooAgg(4),fooAgg(5),fooAgg(6)).show
org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS foo_agg_6#141];;
'Aggregate [fooagg(FooAgg(1), None, None, None, input[0, int, false] AS value#114, assertnotnull(cast(value#114 as int)), input[0, int, false] AS value#113, IntegerType, IntegerType, false) AS foo_agg_1#116, fooagg(FooAgg(2), None, None, None, input[0, int, false] AS value#119, assertnotnull(cast(value#119 as int)), input[0, int, false] AS value#118, IntegerType, IntegerType, false) AS foo_agg_2#121, fooagg(FooAgg(3), None, None, None, input[0, int, false] AS value#124, assertnotnull(cast(value#124 as int)), input[0, int, false] AS value#123, IntegerType, IntegerType, false) AS foo_agg_3#126, fooagg(FooAgg(4), None, None, None, input[0, int, false] AS value#129, assertnotnull(cast(value#129 as int)), input[0, int, false] AS value#128, IntegerType, IntegerType, false) AS foo_agg_4#131, fooagg(FooAgg(5), None, None, None, input[0, int, false] AS value#134, assertnotnull(cast(value#134 as int)), input[0, int, false] AS value#133, IntegerType, IntegerType, false) AS foo_agg_5#136, fooagg(FooAgg(6), None, None, None, input[0, int, false] AS value#139, assertnotnull(cast(value#139 as int)), input[0, int, false] AS value#138, IntegerType, IntegerType, false) AS foo_agg_6#141]
+- Project [_1#6 AS a#13, _2#7 AS b#14, _3#8 AS c#15, _4#9 AS d#16, _5#10 AS e#17, _6#11 AS F#18]
   +- LocalRelation [_1#6, _2#7, _3#8, _4#9, _5#10, _6#11]

  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:431)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:430)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:430)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyz
```
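The fallback the JIRA describes comes from ordinary Scala overload resolution: `Dataset.select` has typed overloads taking one to five `TypedColumn` arguments plus an untyped `select(cols: Column*)`, and because `TypedColumn` extends `Column`, a call with six typed columns can only match the untyped variant. The dependency-free sketch below mimics that mechanism; `SelectOverloadSketch`, its `Column`/`TypedColumn` classes and the string results are hypothetical stand-ins (with two typed overloads instead of five), not Spark's classes.

```scala
object SelectOverloadSketch {
  // Hypothetical stand-ins for Spark's hierarchy: a typed column is also a plain column.
  class Column(val name: String)
  class TypedColumn[U](name: String) extends Column(name)

  // Untyped catch-all, analogous to Dataset.select(cols: Column*).
  def select(cols: Column*): String = s"untyped(${cols.length})"

  // Typed overloads for a fixed arity (Spark stops at five; this sketch stops at two).
  def select[U1](c1: TypedColumn[U1]): String = "typed(1)"
  def select[U1, U2](c1: TypedColumn[U1], c2: TypedColumn[U2]): String = "typed(2)"
}
```

With `val c = new SelectOverloadSketch.TypedColumn[Int]("c")`, `select(c, c)` picks the more specific typed overload, while `select(c, c, c)` has no typed match and silently resolves to the varargs method, returning `"untyped(3)"`.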
[GitHub] [spark] gatorsmile commented on issue #27477: [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators
gatorsmile commented on issue #27477: [SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators URL: https://github.com/apache/spark/pull/27477#issuecomment-587328326 This would be good to have since both Teradata and Snowflake support it. cc @maryannxue @hvanhovell @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
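In Teradata and Snowflake, `x LIKE ANY (p1, p2, ...)` is true when `x` matches at least one of the patterns, and `x LIKE ALL (...)` is true when it matches every pattern. The plain-Scala sketch below models only that non-null semantics; escape characters and SQL NULL handling are deliberately left out, and `LikeAnyAllSketch` and its methods are hypothetical names, not the API proposed in the PR.

```scala
import java.util.regex.Pattern

object LikeAnyAllSketch {
  // Translate a SQL LIKE pattern into a Java regex: '%' matches any sequence,
  // '_' matches exactly one character, everything else is matched literally.
  // Escape sequences such as '\%' are not handled in this sketch.
  def likeToRegex(pattern: String): String = {
    val sb = new StringBuilder
    pattern.foreach {
      case '%' => sb.append(".*")
      case '_' => sb.append('.')
      case c   => sb.append(Pattern.quote(c.toString))
    }
    sb.toString
  }

  // DOTALL lets '%' and '_' also match line terminators.
  def like(value: String, pattern: String): Boolean =
    Pattern.compile(likeToRegex(pattern), Pattern.DOTALL).matcher(value).matches()

  // LIKE ANY: at least one pattern matches; LIKE ALL: every pattern matches.
  def likeAny(value: String, patterns: Seq[String]): Boolean = patterns.exists(like(value, _))
  def likeAll(value: String, patterns: Seq[String]): Boolean = patterns.forall(like(value, _))
}
```

For example, `"spark"` satisfies `LIKE ANY ('sp%', 'x%')` and `LIKE ALL ('sp%', '%rk')`, but not `LIKE ALL ('sp%', 'x%')`.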
[GitHub] [spark] cloud-fan commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
cloud-fan commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#discussion_r380504333
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##
@@ -251,7 +251,10 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
   def dataType: DataType
-  override def toString: String = s"cast($child as ${dataType.simpleString})"
+  override def toString: String = {
+    val ansi = if (ansiEnabled) "ansi_" else ""
Review comment: This follows the existing SQL function naming convention, e.g. `FromUTCTimestamp`
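The diff shows only the first line of the new `toString` body. Assuming the prefix is simply prepended to the existing `cast(... as ...)` rendering (which matches the `ansi_cast` spelling discussed in this thread), the resulting format can be sketched in plain Scala. `CastToStringSketch` and its parameters are hypothetical stand-ins for `CastBase`, `child` and `dataType.simpleString`.

```scala
object CastToStringSketch {
  // Hypothetical stand-in for CastBase.toString: `child` and `simpleTypeName` play the
  // roles of the child expression and dataType.simpleString. Prepending `ansi` to the
  // old "cast(...)" string is an assumption about the part of the method not shown.
  def castToString(child: String, simpleTypeName: String, ansiEnabled: Boolean): String = {
    val ansi = if (ansiEnabled) "ansi_" else ""
    s"${ansi}cast($child as $simpleTypeName)"
  }
}
```

Under this assumption a plan string would show `ansi_cast(a as int)` in ANSI mode and `cast(a as int)` otherwise, which is the spelling the two review comments are debating.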
[GitHub] [spark] HyukjinKwon commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.
HyukjinKwon commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation. URL: https://github.com/apache/spark/pull/27613#issuecomment-587325164 Merged to master and branch-3.0.
[GitHub] [spark] HyukjinKwon closed pull request #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.
HyukjinKwon closed pull request #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation. URL: https://github.com/apache/spark/pull/27613
[GitHub] [spark] gengliangwang commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
gengliangwang commented on a change in pull request #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#discussion_r380501956
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##
@@ -251,7 +251,10 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
   def dataType: DataType
-  override def toString: String = s"cast($child as ${dataType.simpleString})"
+  override def toString: String = {
+    val ansi = if (ansiEnabled) "ansi_" else ""
Review comment: I think it should be "ansiCast" or "ANSICast" instead of "ansi_cast"
[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587322132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23372/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587322122 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587322132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23372/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
AmplabJenkins commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587322122 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
SparkQA commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587321807 **[Test build #118619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118619/testReport)** for PR 27402 at commit [`647851c`](https://github.com/apache/spark/commit/647851c3b6a76f49f9546bbd364202b9fa2d9228).
[GitHub] [spark] ScrapCodes closed pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount.
ScrapCodes closed pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount. URL: https://github.com/apache/spark/pull/27520
[GitHub] [spark] ScrapCodes commented on a change in pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount.
ScrapCodes commented on a change in pull request #27520: [SPARK-30771][K8S] Avoid failed mount warning from kubernetes and support the optional mount. URL: https://github.com/apache/spark/pull/27520#discussion_r380498148
## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala ##
@@ -127,15 +127,18 @@ private[spark] class Client(
       .pods()
       .withName(driverPodName)
       .watch(watcher)) { _ =>
-      val createdDriverPod = kubernetesClient.pods().create(resolvedDriverPod)
+      var createdDriverPod: Option[Pod] = None
       try {
         val otherKubernetesResources = resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
-        addDriverOwnerReference(createdDriverPod, otherKubernetesResources)
         kubernetesClient.resourceList(otherKubernetesResources: _*).createOrReplace()
+        createdDriverPod = Some(kubernetesClient.pods().create(resolvedDriverPod))
+        addDriverOwnerReference(createdDriverPod.get, otherKubernetesResources)
Review comment: @liyinan926 Thanks for helping me think through this. So far, with every solution I could think of, I have found no way to solve this problem without introducing another problem or risk. I will keep thinking, and if you get any hints, please help me with it. In the meantime, I am closing this PR.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
AmplabJenkins removed a comment on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-581236680 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/117758/ Test FAILed.
[GitHub] [spark] HyukjinKwon commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause
HyukjinKwon commented on issue #27402: [SPARK-30679][SQL] REPLACE TABLE can omit the USING clause URL: https://github.com/apache/spark/pull/27402#issuecomment-587319958 retest this please
[GitHub] [spark] HyukjinKwon commented on a change in pull request #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComparison behavior
HyukjinKwon commented on a change in pull request #22038: [SPARK-25056][SQL] Unify the InConversion and BinaryComparison behavior URL: https://github.com/apache/spark/pull/22038#discussion_r380497149
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ##
@@ -491,9 +491,13 @@ object TypeCoercion {
           i
         }
-      case i @ In(a, b) if b.exists(_.dataType != a.dataType) =>
-        findWiderCommonType(i.children.map(_.dataType)) match {
-          case Some(finalDataType) => i.withNewChildren(i.children.map(Cast(_, finalDataType)))
+      case i @ In(value, list) if list.exists(_.dataType != value.dataType) =>
Review comment: @wangyum, are we able to add a legacy configuration? With it, I think it's good to go.
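The rule in the diff rewrites an `In` expression only when some list element's type differs from the value's type, and then casts the children to their wider common type. A toy pure-Scala model of that shape follows; the three-type numeric lattice, `DType`, `Expr` and the helper names are hypothetical and far simpler than Spark's `TypeCoercion`, which handles many more types and cases.

```scala
object InCoercionSketch {
  // Hypothetical, much-simplified type lattice: a higher rank means a wider type.
  sealed abstract class DType(val rank: Int)
  case object IntT extends DType(0)
  case object LongT extends DType(1)
  case object DoubleT extends DType(2)

  // A hypothetical expression: a printable form plus its data type.
  final case class Expr(repr: String, dataType: DType)

  // Widest type among the inputs (None for an empty input).
  def findWiderCommonType(types: Seq[DType]): Option[DType] =
    if (types.isEmpty) None else Some(types.maxBy(_.rank))

  // Wraps an expression in a cast; unlike the diff, no-op casts are skipped here.
  def cast(e: Expr, to: DType): Expr =
    if (e.dataType == to) e else Expr(s"cast(${e.repr} as $to)", to)

  // Same shape as the rule in the diff: rewrite only when some list element's type
  // differs from the value's type, then cast value and list to the common type.
  def coerceIn(value: Expr, list: Seq[Expr]): (Expr, Seq[Expr]) =
    if (list.exists(_.dataType != value.dataType)) {
      findWiderCommonType((value +: list).map(_.dataType)) match {
        case Some(t) => (cast(value, t), list.map(cast(_, t)))
        case None    => (value, list)
      }
    } else (value, list)
}
```

For `a IN (1L, 2)` with `a` an `IntT`, the sketch widens everything to `LongT`; when all types already agree, the expression is left untouched.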
[GitHub] [spark] SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587314887 **[Test build #118618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)** for PR 27616 at commit [`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587319486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587319486 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587319492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118618/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587319348 **[Test build #118618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)** for PR 27616 at commit [`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587319492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118618/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587318920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118607/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
AmplabJenkins removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587318915 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587318920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118607/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
AmplabJenkins commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587318915 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
SparkQA removed a comment on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587250020 **[Test build #118607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118607/testReport)** for PR 27608 at commit [`4896fb5`](https://github.com/apache/spark/commit/4896fb56a4fa600fdba3e9c3600a9adb2effc792).
[GitHub] [spark] SparkQA commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString
SparkQA commented on issue #27608: [SPARK-30863][SQL] Distinguish Cast and AnsiCast in toString URL: https://github.com/apache/spark/pull/27608#issuecomment-587318252 **[Test build #118607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118607/testReport)** for PR 27608 at commit [`4896fb5`](https://github.com/apache/spark/commit/4896fb56a4fa600fdba3e9c3600a9adb2effc792).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380492640
## File path: docs/sql-performance-tuning.md
## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is
SELECT /*+ REPARTITION(3, c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Num
+ This feature coalesces the post shuffle partitions based on the map output statistics when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration properties are both enabled. There are four following sub-configurations in this optimization rule.
+ Property Name | Default | Meaning
+ spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled | true | When true and spark.sql.adaptive.enabled is enabled, spark will reduce the post shuffle partitions number based on the map output statistics.
+ spark.sql.adaptive.shuffle.minNumPostShufflePartitions | 1 | The advisory minimum number of post-shuffle partitions used when spark.sql.adaptive.enabled and spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are both enabled. It is suggested to be almost 2~3x of the parallelism when doing benchmark.
+ spark.sql.adaptive.shuffle.maxNumPostShufflePartitions | Int.MaxValue | The advisory maximum number of post-shuffle partitions used in adaptive execution. This is used as the initial number of pre-shuffle partitions. By default it equals to spark.sql.shuffle.partitions.
+ spark.sql.adaptive.shuffle.targetPostShuffleInputSize | 67108864 (64 MB) | The target post-shuffle input size in bytes of a task when spark.sql.adaptive.enabled and spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are both enabled.
+ ### Optimize Local Shuffle Reader
+ This feature optimize the shuffle reader to local shuffle reader when converting the sort merge join to broadcast hash join in runtime and no additional shuffle introduced. It takes effect when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.shuffle.localShuffleReader.enabled` configuration properties are both enabled.
Review comment:
Add the performance data for all three features.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
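To illustrate the coalescing rule quoted in the hunk, here is a minimal Scala sketch of greedy post-shuffle partition coalescing. It assumes the per-reducer byte counts from the map output statistics are already summed into one array, and it packs contiguous partitions until adding the next one would exceed a target size (the role `spark.sql.adaptive.shuffle.targetPostShuffleInputSize` plays in the doc text). `CoalesceSketch` and `coalesce` are hypothetical names; this is not Spark's implementation and it ignores the min/max partition-number settings.

```scala
import scala.collection.mutable.ArrayBuffer

/** Hypothetical sketch of greedy post-shuffle partition coalescing (not Spark's code). */
object CoalesceSketch {
  /**
   * Given the total bytes each reducer partition will read (summed over all map outputs),
   * returns the index of the first original partition in each coalesced partition.
   */
  def coalesce(partitionSizes: Array[Long], targetSize: Long): Seq[Int] =
    if (partitionSizes.isEmpty) Seq.empty
    else {
      val starts = ArrayBuffer(0) // the first coalesced partition always starts at index 0
      var currentSize = 0L
      for (i <- partitionSizes.indices) {
        val size = partitionSizes(i)
        if (currentSize > 0 && currentSize + size > targetSize) {
          // Adding partition i would overshoot the target: start a new group at i.
          starts += i
          currentSize = size
        } else {
          currentSize += size
        }
      }
      starts.toSeq
    }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val sizes = Array(10 * mb, 20 * mb, 40 * mb, 5 * mb, 70 * mb, 1 * mb)
    // 64 MB matches the 67108864-byte default listed for targetPostShuffleInputSize.
    println(coalesce(sizes, 64 * mb).mkString(", ")) // prints 0, 2, 4, 5
  }
}
```

Grouping only contiguous partitions keeps each coalesced task reading a simple range of reducer IDs; a single partition larger than the target still becomes its own group rather than being split.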
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587315297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23371/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587315290 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587315297 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23371/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587315290 Merged build finished. Test PASSed.
[GitHub] [spark] JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
JkSelf commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r380492530
## File path: docs/sql-performance-tuning.md
## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is
SELECT /*+ REPARTITION(3, c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join.
Review comment:
updated.
[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587314887 **[Test build #118618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118618/testReport)** for PR 27616 at commit [`727f57f`](https://github.com/apache/spark/commit/727f57f1bfba53a486f87776646d160eb8061258).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27129: [SPARK-30427][SQL] Add config item for limiting partition number when calculating statistics through File System
HyukjinKwon commented on a change in pull request #27129: [SPARK-30427][SQL] Add config item for limiting partition number when calculating statistics through File System
URL: https://github.com/apache/spark/pull/27129#discussion_r380486918
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala
## @@ -73,19 +74,30 @@ private[sql] class PruneHiveTablePartitions(session: SparkSession)
private def updateTableMeta(
tableMeta: CatalogTable,
prunedPartitions: Seq[CatalogTablePartition]): CatalogTable = {
-val sizeOfPartitions = prunedPartitions.map { partition =>
+val partitionsWithSize = prunedPartitions.map { partition =>
val rawDataSize = partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong)
val totalSize = partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong)
if (rawDataSize.isDefined && rawDataSize.get > 0) {
-rawDataSize.get
+(partition, rawDataSize.get)
} else if (totalSize.isDefined && totalSize.get > 0L) {
-totalSize.get
+(partition, totalSize.get)
} else {
-0L
+(partition, 0L)
}
}
-if (sizeOfPartitions.forall(_ > 0)) {
- val sizeInBytes = sizeOfPartitions.sum
+if (partitionsWithSize.forall(_._2 > 0)) {
+ val sizeInBytes = partitionsWithSize.map(_._2).sum
+ tableMeta.copy(stats = Some(CatalogStatistics(sizeInBytes = BigInt(sizeInBytes
+} else if (partitionsWithSize.count(_._2 == 0) <= conf.maxPartNumForStatsCalculateViaFS) {
Review comment:
@fuwhu, are you proposing a configuration to automatically calculate the size? Why don't you just manually run the ANALYZE command to calculate the stats? It's weird to do this based on the number of partitions.
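To make the reviewer's point concrete, the decision in the quoted hunk can be modeled without Spark. This sketch is hypothetical: partition parameters are a plain `Map[String, String]`, the keys `rawDataSize` and `totalSize` are assumed to match Hive's `StatsSetupConst.RAW_DATA_SIZE` and `TOTAL_SIZE`, and `maxPartsViaFs` stands in for `conf.maxPartNumForStatsCalculateViaFS`. The hunk is cut off after the `else if`, so the final `KeepOriginalStats` branch is an assumption, not the PR's code.

```scala
/** Hypothetical, Spark-free model of the partition-statistics decision in the hunk. */
object PartitionStatsSketch {
  sealed trait Decision
  final case class UseMetastoreSum(sizeInBytes: BigInt) extends Decision
  final case class ComputeViaFileSystem(partitionsWithoutStats: Int) extends Decision
  case object KeepOriginalStats extends Decision // assumed: the quoted hunk is truncated here

  /** Prefer a positive rawDataSize, then a positive totalSize; 0 means "no usable statistic". */
  def partitionSize(params: Map[String, String]): Long = {
    val rawDataSize = params.get("rawDataSize").map(_.toLong) // assumed key name
    val totalSize = params.get("totalSize").map(_.toLong)     // assumed key name
    rawDataSize.filter(_ > 0).orElse(totalSize.filter(_ > 0)).getOrElse(0L)
  }

  /** The branch taken depends only on how many partitions lack statistics. */
  def decide(partitions: Seq[Map[String, String]], maxPartsViaFs: Int): Decision = {
    val sizes = partitions.map(partitionSize)
    val missing = sizes.count(_ == 0)
    if (missing == 0) UseMetastoreSum(BigInt(sizes.sum))
    else if (missing <= maxPartsViaFs) ComputeViaFileSystem(missing)
    else KeepOriginalStats
  }
}
```

Written this way, it is visible that the file-system path is gated purely by a partition count, which is the heuristic the review comment questions compared with computing the stats manually.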
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587306310 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118606/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587306306 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587306316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23370/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587306311 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587306310 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118606/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
AmplabJenkins commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587306306 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587306316 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23370/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587306311 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
SparkQA removed a comment on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587239864 **[Test build #118606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118606/testReport)** for PR 27495 at commit [`25d0863`](https://github.com/apache/spark/commit/25d0863015e881819c67fdeb2e85c47dfb08f098).
[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587306011 **[Test build #118617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118617/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).
[GitHub] [spark] SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
SparkQA commented on issue #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#issuecomment-587305720 **[Test build #118606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118606/testReport)** for PR 27495 at commit [`25d0863`](https://github.com/apache/spark/commit/25d0863015e881819c67fdeb2e85c47dfb08f098).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
cloud-fan commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587305011 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils URL: https://github.com/apache/spark/pull/27617#issuecomment-587304335 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23368/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-587304339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23369/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
AmplabJenkins removed a comment on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-587304332 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
AmplabJenkins removed a comment on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils URL: https://github.com/apache/spark/pull/27617#issuecomment-587304327 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils URL: https://github.com/apache/spark/pull/27617#issuecomment-587304327 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
AmplabJenkins commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils URL: https://github.com/apache/spark/pull/27617#issuecomment-587304335 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23368/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-587304339 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23369/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
AmplabJenkins commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-587304332 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
SparkQA commented on issue #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#issuecomment-587303954 **[Test build #118616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118616/testReport)** for PR 27592 at commit [`6d8eb75`](https://github.com/apache/spark/commit/6d8eb75f0c29962962f994d8f212fafae8577cfc).
[GitHub] [spark] SparkQA commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
SparkQA commented on issue #27617: [SPARK-30865][SQL] Refactor DateTimeUtils URL: https://github.com/apache/spark/pull/27617#issuecomment-587303945 **[Test build #118615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118615/testReport)** for PR 27617 at commit [`0b5711e`](https://github.com/apache/spark/commit/0b5711e0e332817f6cff28f79ccffeaca304).
[GitHub] [spark] cozos commented on issue #25899: [SPARK-29089][SQL] Parallelize blocking FileSystem calls in DataSource#checkAndGlobPathIfNecessary
cozos commented on issue #25899: [SPARK-29089][SQL] Parallelize blocking FileSystem calls in DataSource#checkAndGlobPathIfNecessary URL: https://github.com/apache/spark/pull/25899#issuecomment-587303126 Thank you everybody!
[GitHub] [spark] beliefer commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
beliefer commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
URL: https://github.com/apache/spark/pull/27495#discussion_r380478627
## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -1797,11 +1797,11 @@ SIMPLE_COMMENT
;
BRACKETED_EMPTY_COMMENT
-: '/**/' -> channel(HIDDEN)
+: '/*' BRACKETED_EMPTY_COMMENT? '*/' -> channel(HIDDEN)
;
BRACKETED_COMMENT
Review comment:
If this problem could be solved in g4, things would be even simpler. Let me do some hard work and try.
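Nested bracketed comments need the lexer to remember how many comments are still open, which is why the rule in the hunk references itself. As a hedged illustration of that idea (not Spark's parser, which handles this in `SqlBase.g4`), the hypothetical Scala helper below strips nested `/* ... */` comments with a depth counter. It assumes plain SQL text and deliberately ignores string literals and `--` comments.

```scala
/** Hypothetical depth-counting sketch of ANSI nested bracketed comments (not Spark's lexer). */
object NestedCommentSketch {
  /** Removes (possibly nested) bracketed comments; throws if a comment is left unclosed. */
  def stripBracketedComments(sql: String): String = {
    val out = new StringBuilder
    var depth = 0
    var i = 0
    while (i < sql.length) {
      if (sql.startsWith("/*", i)) {
        depth += 1 // open a comment, possibly inside another one
        i += 2
      } else if (depth > 0 && sql.startsWith("*/", i)) {
        depth -= 1 // close the innermost open comment
        i += 2
      } else {
        if (depth == 0) out.append(sql.charAt(i)) // keep only text outside comments
        i += 1
      }
    }
    require(depth == 0, s"unclosed bracketed comment (depth $depth)")
    out.toString
  }

  def main(args: Array[String]): Unit =
    println(stripBracketedComments("SELECT /* outer /* inner */ still outer */ 1"))
}
```

A non-nesting scanner would end the comment at the first `*/` and leave ` still outer */` in the query text; the counter keeps skipping until every opened comment is closed.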
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587302311 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118614/ Test FAILed.
[GitHub] [spark] MaxGekk opened a new pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
MaxGekk opened a new pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils
URL: https://github.com/apache/spark/pull/27617
### What changes were proposed in this pull request?
1. Move TimeZoneUTC and TimeZoneGMT to DateTimeTestUtils
2. Remove TimeZoneGMT
3. Use ZoneId.systemDefault() instead of defaultTimeZone().toZoneId
4. Alias SQLDate & SQLTimestamp to internal types of DateType and TimestampType
### Why are the changes needed?
1. TimeZoneUTC and TimeZoneGMT are moved to DateTimeTestUtils because they are used only in tests.
2. TimeZoneGMT can be removed because it is equal to TimeZoneUTC.
3. After the PR #27494, Spark expressions and DateTimeUtils functions switched completely from TimeZone to ZoneId. `defaultTimeZone()` with `TimeZone` as return type is not needed anymore.
4. SQLDate and SQLTimestamp types can be explicitly aliased to internal types of DateType and TimestampType instead of declaring this in a comment.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By existing test suites
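Points 2 and 4 of the description can be illustrated with plain `java.time`. The sketch below assumes `SQLDate` is the `Int` count of days since 1970-01-01 and `SQLTimestamp` the `Long` count of microseconds since the epoch (the internal representations of `DateType` and `TimestampType`); `DateTimeAliasSketch` and its helpers are hypothetical names, not the code from the PR. It also checks that the `GMT` and `UTC` zone IDs normalize to the same fixed offset.

```scala
import java.time.{Instant, LocalDate, ZoneId}
import java.util.concurrent.TimeUnit

/** Hypothetical sketch of the type aliases and the UTC/GMT equivalence (not the PR's code). */
object DateTimeAliasSketch {
  type SQLDate = Int       // days since 1970-01-01
  type SQLTimestamp = Long // microseconds since 1970-01-01T00:00:00Z

  def toSQLDate(date: LocalDate): SQLDate = Math.toIntExact(date.toEpochDay)

  def toSQLTimestamp(instant: Instant): SQLTimestamp =
    Math.addExact(
      Math.multiplyExact(instant.getEpochSecond, 1000000L),   // whole seconds as micros
      TimeUnit.NANOSECONDS.toMicros(instant.getNano.toLong)) // sub-second part as micros

  /** True when both zone IDs resolve to the same fixed offset (e.g. "GMT" and "UTC"). */
  def sameFixedOffset(a: String, b: String): Boolean =
    ZoneId.of(a).normalized() == ZoneId.of(b).normalized()

  def main(args: Array[String]): Unit = {
    println(toSQLDate(LocalDate.of(1970, 1, 2)))               // prints 1
    println(toSQLTimestamp(Instant.ofEpochSecond(1, 500000)))  // prints 1000500
    println(sameFixedOffset("GMT", "UTC"))                     // prints true
  }
}
```

Because `normalized()` turns any fixed-offset region into a `ZoneOffset`, both IDs collapse to the zero offset, which is consistent with keeping a single UTC constant.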
[GitHub] [spark] DavidToneian commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation.
DavidToneian commented on issue #27613: [SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation. URL: https://github.com/apache/spark/pull/27613#issuecomment-587302624 @HyukjinKwon: These are the only instances I found when I searched for a space followed by a colon (" :") in the output HTML docs.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587302308 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587302311 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118614/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587298211 **[Test build #118614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).
[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587302299 **[Test build #118614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587302308 Merged build finished. Test FAILed.
[GitHub] [spark] beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
beliefer commented on a change in pull request #27592: [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder URL: https://github.com/apache/spark/pull/27592#discussion_r380477033 ## File path: sql/gen-sql-config-docs.py ## @@ -49,12 +50,13 @@ def generate_sql_configs_table(sql_configs, path): ```html -Property NameDefaultMeaning +Property NameDefaultMeaningVersion Review comment: OK
[GitHub] [spark] cloud-fan commented on issue #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash
cloud-fan commented on issue #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash URL: https://github.com/apache/spark/pull/27601#issuecomment-587300412 thanks, merging to master/3.0
[GitHub] [spark] cloud-fan closed pull request #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash
cloud-fan closed pull request #27601: [SPARK-30847][SQL] Take productPrefix into account in MurmurHash3.productHash URL: https://github.com/apache/spark/pull/27601
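To illustrate what "taking productPrefix into account" means for a product hash, here is a minimal sketch, not the patch merged in #27601. It seeds the hash with the product's prefix (the class name, for a case class) and then mixes in each element, using only the public `mix`, `finalizeHash` and `productSeed` members of `scala.util.hashing.MurmurHash3`. The object `PrefixedProductHash` and the case classes `Foo` and `Bar` are hypothetical. With the prefix mixed in, two case classes with identical fields no longer hash identically.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch; not Spark's or the Scala library's implementation.
object PrefixedProductHash {
  def prefixedHash(x: Product): Int = {
    // Start from the product seed mixed with the prefix's hash code.
    var h = MurmurHash3.mix(MurmurHash3.productSeed, x.productPrefix.hashCode)
    val it = x.productIterator
    // Mix in every element's hash in order.
    while (it.hasNext) h = MurmurHash3.mix(h, it.next().##)
    // Fold in the arity and apply the final avalanche step.
    MurmurHash3.finalizeHash(h, x.productArity)
  }
}

// Two hypothetical case classes with the same field layout.
case class Foo(a: Int, b: Int)
case class Bar(a: Int, b: Int)
```

Because each `mix` step is a bijection in its data argument for a fixed running hash, distinct prefix hash codes such as those of `"Foo"` and `"Bar"` always produce distinct results here.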
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587298472 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587298476 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118613/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587298472 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587298476 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118613/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587298385 **[Test build #118613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)** for PR 27616 at commit [`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294511 **[Test build #118613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)** for PR 27616 at commit [`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36).
[GitHub] [spark] SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
SparkQA commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587298211 **[Test build #118614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118614/testReport)** for PR 27563 at commit [`83fda3c`](https://github.com/apache/spark/commit/83fda3c90dd7cea7db4353be881965c8aa9e12ac).
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380471798 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. + ### Coalescing Post Shuffle Partition Num + This feature coalesces the post shuffle partitions based on the map output statistics when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration properties are both enabled. There are four following sub-configurations in this optimization rule. + + Property NameDefaultMeaning + + spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled + true + + When true and spark.sql.adaptive.enabled is enabled, spark will reduce the post shuffle partitions number based on the map output statistics. + + + + spark.sql.adaptive.shuffle.minNumPostShufflePartitions + 1 + + The advisory minimum number of post-shuffle partitions used when spark.sql.adaptive.enabled and spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are both enabled. It is suggested to be almost 2~3x of the parallelism when doing benchmark. 
+ + + + spark.sql.adaptive.shuffle.maxNumPostShufflePartitions + Int.MaxValue + + The advisory maximum number of post-shuffle partitions used in adaptive execution. This is used as the initial number of pre-shuffle partitions. By default it equals to spark.sql.shuffle.partitions. + + + + spark.sql.adaptive.shuffle.targetPostShuffleInputSize + 67108864 (64 MB) + + The target post-shuffle input size in bytes of a task when spark.sql.adaptive.enabled and spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled are both enabled. + + + + + ### Optimize Local Shuffle Reader + This feature optimize the shuffle reader to local shuffle reader when converting the sort merge join to broadcast hash join in runtime and no additional shuffle introduced. It takes effect when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.shuffle.localShuffleReader.enabled` configuration properties are both enabled. Review comment: ditto, users care more about the benefit
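The partition-coalescing feature described in the quoted docs groups small post-shuffle partitions so that each task reads about `targetPostShuffleInputSize` bytes. Below is a simplified sketch of that idea, not Spark's actual implementation. It assumes the per-reducer byte sizes have already been summed from the map output statistics and ignores the min/max partition-number settings. `CoalesceSketch.coalesce` is a hypothetical helper that packs adjacent partitions greedily and returns the start index of each coalesced partition.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of greedy adjacent-partition coalescing.
object CoalesceSketch {
  def coalesce(partitionSizes: Seq[Long], targetSize: Long): Seq[Int] = {
    require(targetSize > 0, "targetSize must be positive")
    if (partitionSizes.isEmpty) Nil
    else {
      val starts = ArrayBuffer(0) // the first coalesced partition starts at index 0
      var current = 0L            // bytes accumulated in the current group
      partitionSizes.zipWithIndex.foreach { case (size, i) =>
        // Start a new group when adding this partition would exceed the target.
        if (i > 0 && current + size > targetSize) {
          starts += i
          current = 0L
        }
        current += size
      }
      starts.toList
    }
  }
}
```

For example, sizes of 10, 20, 30, 50 and 5 (in MB) with a 64 MB target produce two coalesced partitions starting at indices 0 and 3. A single partition that is already larger than the target stays on its own rather than being split.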
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587296563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23367/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587296551 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins removed a comment on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587296563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23367/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
AmplabJenkins commented on issue #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#issuecomment-587296551 Merged build finished. Test PASSed.
[GitHub] [spark] zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite
zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite URL: https://github.com/apache/spark/pull/27555#discussion_r380469360 ## File path: mllib/src/main/scala/org/apache/spark/ml/stat/MultiClassSummarizer.scala ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.stat + +import scala.collection.mutable + + +/** + * MultiClassSummarizer computes the number of distinct labels and corresponding counts, + * and validates the data to see if the labels used for k class multi-label classification + * are in the range of {0, 1, ..., k - 1} in an online fashion. + * + * Two MultilabelSummarizer can be merged together to have a statistical summary of the + * corresponding joint dataset. + */ +private[ml] class MultiClassSummarizer extends Serializable { + // The first element of value in distinctMap is the actually number of instances, + // and the second element of value is sum of the weights. 
+ private val distinctMap = new mutable.HashMap[Int, (Long, Double)] + private var totalInvalidCnt: Long = 0L + + /** + * Add a new label into this MultilabelSummarizer, and update the distinct map. + * + * @param label The label for this data point. + * @param weight The weight of this instances. + * @return This MultilabelSummarizer + */ + def add(label: Double, weight: Double = 1.0): MultiClassSummarizer = { +require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0") + +if (weight == 0.0) return this + +if (label - label.toInt != 0.0 || label < 0) { + totalInvalidCnt += 1 + this +} +else { + val (counts: Long, weightSum: Double) = distinctMap.getOrElse(label.toInt, (0L, 0.0)) + distinctMap.put(label.toInt, (counts + 1L, weightSum + weight)) + this +} + } + + /** + * Merge another MultilabelSummarizer, and update the distinct map. + * (Note that it will merge the smaller distinct map into the larger one using in-place + * merging, so either `this` or `other` object will be modified and returned.) + * + * @param other The other MultilabelSummarizer to be merged. + * @return Merged MultilabelSummarizer object. + */ + def merge(other: MultiClassSummarizer): MultiClassSummarizer = { +val (largeMap, smallMap) = if (this.distinctMap.size > other.distinctMap.size) { + (this, other) +} else { + (other, this) +} +smallMap.distinctMap.foreach { + case (key, value) => +val (counts: Long, weightSum: Double) = largeMap.distinctMap.getOrElse(key, (0L, 0.0)) +largeMap.distinctMap.put(key, (counts + value._1, weightSum + value._2)) +} +largeMap.totalInvalidCnt += smallMap.totalInvalidCnt +largeMap + } + + /** @return The total invalid input counts. */ + def countInvalid: Long = totalInvalidCnt + + /** @return The number of distinct labels in the input dataset. */ + def numClasses: Int = if (distinctMap.isEmpty) 0 else distinctMap.keySet.max + 1 Review comment: nit: `distinctMap.keysIterator.max + 1`
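A standalone sketch of the two pieces of `MultiClassSummarizer` discussed in this review: merging the smaller label map into the larger one in place, and computing `numClasses` with `keysIterator` as the reviewer suggests, which avoids building an intermediate key set. `LabelMapMergeSketch` and its `Counts` alias are hypothetical. They only mirror the `label -> (instance count, weight sum)` layout of `distinctMap` from the diff.

```scala
import scala.collection.mutable

// Hypothetical sketch mirroring distinctMap's layout; not the code under review.
object LabelMapMergeSketch {
  // label -> (number of instances, sum of weights)
  type Counts = mutable.HashMap[Int, (Long, Double)]

  // Folds the smaller map into the larger one in place and returns the larger map,
  // so one of the two inputs is mutated, as the diff's doc comment warns.
  def merge(a: Counts, b: Counts): Counts = {
    val (large, small) = if (a.size > b.size) (a, b) else (b, a)
    small.foreach { case (label, (cnt, w)) =>
      val (c0, w0) = large.getOrElse(label, (0L, 0.0))
      large.put(label, (c0 + cnt, w0 + w))
    }
    large
  }

  // Reviewer's suggested form: take the max over a key iterator.
  def numClasses(m: Counts): Int = if (m.isEmpty) 0 else m.keysIterator.max + 1
}
```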
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380470558 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. + ### Coalescing Post Shuffle Partition Num + This feature coalesces the post shuffle partitions based on the map output statistics when `spark.sql.adaptive.enabled` and `spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration properties are both enabled. There are four following sub-configurations in this optimization rule. Review comment: shall we introduce the benefits of this feature?
[GitHub] [spark] Ngone51 commented on a change in pull request #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy
Ngone51 commented on a change in pull request #27563: [SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy URL: https://github.com/apache/spark/pull/27563#discussion_r380470606 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -1629,7 +1629,7 @@ object SQLConf { .createWithDefault(true) val PANDAS_ARROW_SAFE_TYPE_CONVERSION = -buildConf("spark.sql.execution.pandas.arrowSafeTypeConversion") +buildConf("spark.sql.execution.pandas.arrowSafeTypeConversion.enabled") Review comment: updated with `convertToArrowArraySafely`.
[GitHub] [spark] zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite
zhengruifeng commented on a change in pull request #27555: [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite URL: https://github.com/apache/spark/pull/27555#discussion_r380468433 ## File path: mllib/src/main/scala/org/apache/spark/ml/stat/MultiClassSummarizer.scala ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.stat + +import scala.collection.mutable + + +/** + * MultiClassSummarizer computes the number of distinct labels and corresponding counts, + * and validates the data to see if the labels used for k class multi-label classification + * are in the range of {0, 1, ..., k - 1} in an online fashion. + * + * Two MultilabelSummarizer can be merged together to have a statistical summary of the + * corresponding joint dataset. + */ +private[ml] class MultiClassSummarizer extends Serializable { + // The first element of value in distinctMap is the actually number of instances, + // and the second element of value is sum of the weights. 
+ private val distinctMap = new mutable.HashMap[Int, (Long, Double)] + private var totalInvalidCnt: Long = 0L + + /** + * Add a new label into this MultilabelSummarizer, and update the distinct map. + * + * @param label The label for this data point. + * @param weight The weight of this instances. + * @return This MultilabelSummarizer + */ + def add(label: Double, weight: Double = 1.0): MultiClassSummarizer = { +require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0") + +if (weight == 0.0) return this + +if (label - label.toInt != 0.0 || label < 0) { Review comment: ```scala if (label - label.toInt != 0.0 || label < 0) { totalInvalidCnt += 1 } else { val (counts: Long, weightSum: Double) = distinctMap.getOrElse(label.toInt, (0L, 0.0)) distinctMap.put(label.toInt, (counts + 1L, weightSum + weight)) } this ```
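A compilable version of the restructuring suggested in this review comment, placed in a trimmed-down stand-in class. Both branches only update state, and the method returns `this` once at the end instead of in each branch. `LabelCounterSketch` and its `weightSum` accessor are hypothetical additions for illustration. They are not part of the PR.

```scala
import scala.collection.mutable

// Hypothetical, trimmed-down stand-in for MultiClassSummarizer.
class LabelCounterSketch extends Serializable {
  // label -> (number of instances, sum of weights)
  private val distinctMap = new mutable.HashMap[Int, (Long, Double)]
  private var totalInvalidCnt: Long = 0L

  def add(label: Double, weight: Double = 1.0): LabelCounterSketch = {
    require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
    if (weight == 0.0) return this // zero-weight instances are ignored entirely
    // Non-integral or negative labels are counted as invalid.
    if (label - label.toInt != 0.0 || label < 0) {
      totalInvalidCnt += 1
    } else {
      val (counts, weightSum) = distinctMap.getOrElse(label.toInt, (0L, 0.0))
      distinctMap.put(label.toInt, (counts + 1L, weightSum + weight))
    }
    this // single return point after either branch, as the reviewer suggests
  }

  def countInvalid: Long = totalInvalidCnt

  // Hypothetical accessor for the accumulated weight of one label (0.0 if unseen).
  def weightSum(label: Int): Double = distinctMap.get(label).map(_._2).getOrElse(0.0)
}
```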
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380469954 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. + ### Coalescing Post Shuffle Partition Num Review comment: Number
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380469743 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. Review comment: post-shuffle partition number
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380469743 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. Review comment: post-shuffle partitions number
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380469633 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. Review comment: `As of Spark 3.0, there are three major features ...`
[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#discussion_r380469476 ## File path: docs/sql-performance-tuning.md ## @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is SELECT /*+ REPARTITION(3, c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t + +## Adaptive Query Execution +Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that make use of the runtime statistics to choose the most efficient query execution plan. AQE is disabled by default. Spark SQL can use the umbrella configuration of `spark.sql.adaptive.enabled` to control whether turn it on/off. There are three mainly feature in AQE, including coalescing post partition number, optimizing local shuffle reader and optimizing skewed join. Review comment: makes use
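The umbrella switch named in the paragraph under review, `spark.sql.adaptive.enabled`, can be set when a session is built. The sketch below assumes a Spark 3.0 build with `spark-sql` on the classpath; only the configuration key comes from the PR, while the object name `AqeConfigExample`, the helper `buildSession`, the app name and the `local[*]` master are illustrative choices.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: builds a local session with AQE switched on or off.
object AqeConfigExample {
  def buildSession(enableAqe: Boolean): SparkSession =
    SparkSession.builder()
      .appName("aqe-config-example") // hypothetical app name
      .master("local[*]")            // local mode, for illustration
      // Umbrella flag from the docs under review; AQE is off by default.
      .config("spark.sql.adaptive.enabled", enableAqe)
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession(enableAqe = true)
    println(spark.conf.get("spark.sql.adaptive.enabled"))
    // This is not a static conf, so it can also be changed at runtime
    // and takes effect for queries planned afterwards.
    spark.conf.set("spark.sql.adaptive.enabled", "false")
    spark.stop()
  }
}
```

Setting the flag on the builder keeps the choice in one place at startup, while `spark.conf.set` is handy for comparing plans with and without AQE in the same session.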
[GitHub] [spark] cloud-fan commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments
cloud-fan commented on a change in pull request #27495: [SPARK-28880][SQL] Support ANSI nested bracketed comments URL: https://github.com/apache/spark/pull/27495#discussion_r380469230 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -1797,11 +1797,11 @@ SIMPLE_COMMENT ; BRACKETED_EMPTY_COMMENT -: '/**/' -> channel(HIDDEN) +: '/*' BRACKETED_EMPTY_COMMENT? '*/' -> channel(HIDDEN) ; BRACKETED_COMMENT Review comment:
> we still need to distinguish hint and comment syntax

We can distinguish them better if they are both parser rules.
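In the diff, `BRACKETED_EMPTY_COMMENT` becomes recursive, so nested empty comments such as `/*/**/*/` are matched as a single comment. The depth-counting idea behind ANSI nested bracketed comments can be shown with a small hand-written scanner. This is a hypothetical sketch, not Spark's ANTLR-based lexer: `NestedComments.strip` is an illustrative name, and for brevity it does not special-case string literals or `/*+ ... */` hints (which, as the review notes, still have to be told apart from comments).

```scala
// Hypothetical helper, not Spark's parser: removes bracketed comments and
// lets them nest, so in "/* a /* b */ c */" the first "*/" closes only the
// inner comment. String literals and /*+ hints */ are not handled here.
object NestedComments {
  def strip(sql: String): String = {
    val out = new StringBuilder
    var depth = 0 // current comment nesting level
    var i = 0
    while (i < sql.length) {
      if (sql.startsWith("/*", i)) {
        depth += 1 // open a (possibly nested) comment
        i += 2
      } else if (depth > 0 && sql.startsWith("*/", i)) {
        depth -= 1 // close the innermost open comment
        i += 2
      } else {
        if (depth == 0) out += sql.charAt(i) // keep only text outside comments
        i += 1
      }
    }
    require(depth == 0, "unclosed bracketed comment")
    out.toString
  }
}
```

A lexer without nesting would end the comment at the first `*/` and leave `still comment */ 1` in `SELECT /* outer /* inner */ still comment */ 1`; with depth counting, only `SELECT` and `1` survive.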
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23366/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins removed a comment on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294911 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294911 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
AmplabJenkins commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294916 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23366/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
SparkQA commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587294511 **[Test build #118613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118613/testReport)** for PR 27616 at commit [`19a381b`](https://github.com/apache/spark/commit/19a381b2d5af5128e821d233f5c997730a6d8c36).
[GitHub] [spark] JkSelf commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
JkSelf commented on issue #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616#issuecomment-587293946 cc @cloud-fan
[GitHub] [spark] JkSelf opened a new pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution
JkSelf opened a new pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution URL: https://github.com/apache/spark/pull/27616 ### What changes were proposed in this pull request? This PR adds a user guide for AQE, including the detailed configurations of the three main features in AQE. ### Why are the changes needed? To document the detailed AQE configurations. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is a documentation-only change, so no unit tests are needed.