[GitHub] [spark] SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
SparkQA commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572437346 **[Test build #116362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116362/testReport)** for PR 27129 at commit [`abeb326`](https://github.com/apache/spark/commit/abeb32672b18690d58fa34c099845c6a777333ca). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong
sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong URL: https://github.com/apache/spark/pull/27135#discussion_r364597689 ## File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ## @@ -288,10 +288,10 @@ private[ui] class StagePage(parent: StagesTab, store: AppStatusStore) extends We val executorOverhead = serializationTime + deserializationTime val executorRunTime = if (taskInfo.duration.isDefined) { - totalExecutionTime - executorOverhead - gettingResultTime + math.max(totalExecutionTime - executorOverhead - gettingResultTime - schedulerDelay, 0) Review comment: It is impossible. I write math.max(xxx, 0) here, just be consistent with the calculation of the executorComputingTimeProportion L297: ``` val executorComputingTimeProportion = math.max(100 - schedulerDelayProportion - shuffleReadTimeProportion - shuffleWriteTimeProportion - serializationTimeProportion - deserializationTimeProportion - gettingResultTimeProportion, 0) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong
sddyljsx commented on a change in pull request #27135: [SPARK-30458][Web UI] The Executor Computing Time in Time Line of Stage Page is Wrong URL: https://github.com/apache/spark/pull/27135#discussion_r364597689 ## File path: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ## @@ -288,10 +288,10 @@ private[ui] class StagePage(parent: StagesTab, store: AppStatusStore) extends We val executorOverhead = serializationTime + deserializationTime val executorRunTime = if (taskInfo.duration.isDefined) { - totalExecutionTime - executorOverhead - gettingResultTime + math.max(totalExecutionTime - executorOverhead - gettingResultTime - schedulerDelay, 0) Review comment: It is impossible. I write math.max(xxx, 0) here, just be consistent with the calculation of the executorComputingTimeProportion L297: val executorComputingTimeProportion = math.max(100 - schedulerDelayProportion - shuffleReadTimeProportion - shuffleWriteTimeProportion - serializationTimeProportion - deserializationTimeProportion - gettingResultTimeProportion, 0) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364597264 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala ## @@ -35,21 +35,23 @@ case class DescribeNamespaceExec( namespace: Seq[String], isExtended: Boolean) extends V2CommandExec { private val encoder = RowEncoder(StructType.fromAttributes(output)).resolveAndBind() + import SupportsNamespaces._ override protected def run(): Seq[InternalRow] = { val rows = new ArrayBuffer[InternalRow]() val ns = namespace.toArray val metadata = catalog.loadNamespaceMetadata(ns) rows += toCatalystRow("Namespace Name", ns.last) -rows += toCatalystRow("Description", metadata.get(SupportsNamespaces.PROP_COMMENT)) -rows += toCatalystRow("Location", metadata.get(SupportsNamespaces.PROP_LOCATION)) +rows += toCatalystRow("Description", metadata.get(PROP_COMMENT)) +rows += toCatalystRow("Location", metadata.get(PROP_LOCATION)) +rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, "")) Review comment: we should do the same for other fields. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
yaooqinn commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364596755 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ## @@ -884,7 +885,8 @@ class DataSourceV2SQLSuite .isEmpty, s"$key is a reserved namespace property and ignored") val meta = catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest")) - assert(meta.get(key) === null, "reserved properties should not have side effects") + assert(!Option(meta.get(key)).exists(_.contains("foo")), Review comment: this can be fixed if we revert the change in Create syntax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572435529 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116361/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364596563 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala ## @@ -35,21 +35,23 @@ case class DescribeNamespaceExec( namespace: Seq[String], isExtended: Boolean) extends V2CommandExec { private val encoder = RowEncoder(StructType.fromAttributes(output)).resolveAndBind() + import SupportsNamespaces._ override protected def run(): Seq[InternalRow] = { val rows = new ArrayBuffer[InternalRow]() val ns = namespace.toArray val metadata = catalog.loadNamespaceMetadata(ns) rows += toCatalystRow("Namespace Name", ns.last) -rows += toCatalystRow("Description", metadata.get(SupportsNamespaces.PROP_COMMENT)) -rows += toCatalystRow("Location", metadata.get(SupportsNamespaces.PROP_LOCATION)) +rows += toCatalystRow("Description", metadata.get(PROP_COMMENT)) +rows += toCatalystRow("Location", metadata.get(PROP_LOCATION)) +rows += toCatalystRow("Owner Name", metadata.getOrDefault(PROP_OWNER_NAME, "")) Review comment: we should skip printing owner if the properties don't exist This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432308 **[Test build #116361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)** for PR 26813 at commit [`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572435523 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572435529 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116361/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572435523 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364596224 ## File path: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala ## @@ -884,7 +885,8 @@ class DataSourceV2SQLSuite .isEmpty, s"$key is a reserved namespace property and ignored") val meta = catalog("testcat").asNamespaceCatalog.loadNamespaceMetadata(Array("reservedTest")) - assert(meta.get(key) === null, "reserved properties should not have side effects") + assert(!Option(meta.get(key)).exists(_.contains("foo")), Review comment: what's wrong with `meta.get(key) === null`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572435500 **[Test build #116361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)** for PR 26813 at commit [`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572435187 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572435192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21153/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins removed a comment on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572435187 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System
AmplabJenkins commented on issue #27129: [SPARK-30427] Add config item for limiting partition number when calculating statistics through File System URL: https://github.com/apache/spark/pull/27129#issuecomment-572435192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21153/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364595504 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala ## @@ -35,11 +36,14 @@ case class CreateNamespaceExec( extends V2CommandExec { override protected def run(): Seq[InternalRow] = { import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ +import org.apache.spark.sql.connector.catalog.SupportsNamespaces._ val ns = namespace.toArray if (!catalog.namespaceExists(ns)) { try { -catalog.createNamespace(ns, properties.asJava) +val ownership = + Map(PROP_OWNER_NAME -> Utils.getCurrentUserName(), PROP_OWNER_TYPE -> "USER") Review comment: can we do it later? It's unclear to me that what we should do when CREATE TABLE without owner specified. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364594708 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2532,6 +2532,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case (PROP_COMMENT, _) => throw new ParseException(s"$PROP_COMMENT is a reserved namespace property, please use" + s" the COMMENT clause to specify it.", ctx) +case (ownership, _) if ownership == PROP_OWNER_NAME || ownership == PROP_OWNER_TYPE => + throw new ParseException(s"$ownership is a reserved namespace property , please use" + Review comment: is `ownership` guaranteed to be upper case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364594708 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ## @@ -2532,6 +2532,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging case (PROP_COMMENT, _) => throw new ParseException(s"$PROP_COMMENT is a reserved namespace property, please use" + s" the COMMENT clause to specify it.", ctx) +case (ownership, _) if ownership == PROP_OWNER_NAME || ownership == PROP_OWNER_TYPE => + throw new ParseException(s"$ownership is a reserved namespace property , please use" + Review comment: is `ownership` guaranteed to be upper case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364594386 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java ## @@ -64,9 +64,16 @@ String PROP_OWNER_TYPE = "ownerType"; /** - * The list of reserved namespace properties. + * The list of immutable namespace properties, which can not be removed or changed directly by + * the syntax: + * {{ + * ALTER (DATABASE|SCHEMA|NAMESPACE) SET DBPROPERTIES(...) Review comment: let's use the recommended syntax in examples ``` ALTER NAMESPACE ... SET PROPERTIES ... ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364594207 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java ## @@ -64,9 +64,16 @@ String PROP_OWNER_TYPE = "ownerType"; /** - * The list of reserved namespace properties. + * The list of immutable namespace properties, which can not be removed or changed directly by Review comment: immutable -> reserved This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
cloud-fan commented on a change in pull request #26775: [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax URL: https://github.com/apache/spark/pull/26775#discussion_r364593906 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -1347,6 +1349,7 @@ nonReserved | OVERLAPS | OVERLAY | OVERWRITE +| OWNER Review comment: this should also be put in `ansiNonReserved` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432869 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432882 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21152/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432882 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21152/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432869 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR edited a comment on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation
HeartSaVioR edited a comment on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation URL: https://github.com/apache/spark/pull/26889#issuecomment-572432167 Thanks for the contribution. @southernriver 1. Could you resolve the conflict? 2. Could we add UT for verifying the patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation
HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation URL: https://github.com/apache/spark/pull/26889#issuecomment-572432167 Thanks for the contribution. Could we add UT for verifying the patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation
HeartSaVioR commented on issue #26889: [SPARK-30261][SQL] Should not change owner of hive table for some commands like 'alter' operation URL: https://github.com/apache/spark/pull/26889#issuecomment-572432353 cc. @cloud-fan @gatorsmile @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572432308 **[Test build #116361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116361/testReport)** for PR 26813 at commit [`a8f5e03`](https://github.com/apache/spark/commit/a8f5e038e9c549f085cfe3ba5d89dc89ec64d40f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572429704 **[Test build #116360 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116360/testReport)** for PR 27136 at commit [`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gatorsmile commented on issue #26847: [SPARK-30214][SQL] A new framework to resolve v2 commands
gatorsmile commented on issue #26847: [SPARK-30214][SQL] A new framework to resolve v2 commands URL: https://github.com/apache/spark/pull/26847#issuecomment-572428660 @imback82 COMMENT ON is a good example, but the code changes for implementing COMMENT ON are not small. It might make the code review harder. I think we can follow what @rdblue and @brkyvz suggested above to separate the actual changes from COMMENT ON? How about reverting it and submitted two separate PRs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572427581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21150/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572427581 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21150/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572427046 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116358/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572427584 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21151/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572427575 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572427577 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572427584 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21151/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572427575 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572427577 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572427038 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#discussion_r364587799 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.IOException + +import org.apache.hadoop.hive.common.StatsSetupConst + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, HiveTableRelation} +import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, AttributeSet, Expression, PredicateHelper, SubqueryExpression} +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.CommandUtils + +/** + * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source. + */ +private[sql] class PruneHiveTablePartitions(session: SparkSession) + extends Rule[LogicalPlan] with PredicateHelper { + /** + * Extract the partition filters from the filters on the table. + */ + private def extractPartitionPruningFilters( + filters: Seq[Expression], + relation: HiveTableRelation): Seq[Expression] = { +val normalizedFilters = filters.map { e => + e transform { +case a: AttributeReference => + a.withName(relation.output.find(_.semanticEquals(a)).get.name) + } +} +val partitionSet = AttributeSet(relation.partitionCols) +normalizedFilters.filter { predicate => + !predicate.references.isEmpty && predicate.references.subsetOf(partitionSet) +} + } + + /** + * Prune the hive table using filters on the partitions of the table, + * and also update the statistics of the table. + */ + private def prunedHiveTableWithStats( + relation: HiveTableRelation, + partitionFilters: Seq[Expression]): HiveTableRelation = { +val conf = session.sessionState.conf +val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter( + relation.tableMeta.identifier, + partitionFilters) +val sizeInBytes = try { + val sizeOfPartitions = prunedPartitions.map { partition => +val rawDataSize = partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong) +val totalSize = partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong) +if (rawDataSize.isDefined && rawDataSize.get > 0) { + rawDataSize.get +} else if (totalSize.isDefined && totalSize.get > 0L) { + totalSize.get +} else { + CommandUtils.calculateLocationSize( +session.sessionState, relation.tableMeta.identifier, partition.storage.locationUri) +} + }.sum + // If size of partitions is zero fall back to the default size. + if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions +} catch { + case e: IOException => +logWarning("Failed to get table size from HDFS.", e) +conf.defaultSizeInBytes +} +val newTableMeta = + if (relation.tableMeta.stats.isDefined) { +relation.tableMeta.copy( + stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = BigInt(sizeInBytes Review comment: could be inconsistent, eg. rowCount and sizeInBytes may be inconsistent after this rule. Restored to creating new CatalogStatistics instance. But by doing so, some statistics may be lost which should not impact accuracy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#discussion_r364587799 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.IOException + +import org.apache.hadoop.hive.common.StatsSetupConst + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, HiveTableRelation} +import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, AttributeSet, Expression, PredicateHelper, SubqueryExpression} +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.CommandUtils + +/** + * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source. + */ +private[sql] class PruneHiveTablePartitions(session: SparkSession) + extends Rule[LogicalPlan] with PredicateHelper { + /** + * Extract the partition filters from the filters on the table. + */ + private def extractPartitionPruningFilters( + filters: Seq[Expression], + relation: HiveTableRelation): Seq[Expression] = { +val normalizedFilters = filters.map { e => + e transform { +case a: AttributeReference => + a.withName(relation.output.find(_.semanticEquals(a)).get.name) + } +} +val partitionSet = AttributeSet(relation.partitionCols) +normalizedFilters.filter { predicate => + !predicate.references.isEmpty && predicate.references.subsetOf(partitionSet) +} + } + + /** + * Prune the hive table using filters on the partitions of the table, + * and also update the statistics of the table. + */ + private def prunedHiveTableWithStats( + relation: HiveTableRelation, + partitionFilters: Seq[Expression]): HiveTableRelation = { +val conf = session.sessionState.conf +val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter( + relation.tableMeta.identifier, + partitionFilters) +val sizeInBytes = try { + val sizeOfPartitions = prunedPartitions.map { partition => +val rawDataSize = partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong) +val totalSize = partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong) +if (rawDataSize.isDefined && rawDataSize.get > 0) { + rawDataSize.get +} else if (totalSize.isDefined && totalSize.get > 0L) { + totalSize.get +} else { + CommandUtils.calculateLocationSize( +session.sessionState, relation.tableMeta.identifier, partition.storage.locationUri) +} + }.sum + // If size of partitions is zero fall back to the default size. + if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions +} catch { + case e: IOException => +logWarning("Failed to get table size from HDFS.", e) +conf.defaultSizeInBytes +} +val newTableMeta = + if (relation.tableMeta.stats.isDefined) { +relation.tableMeta.copy( + stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = BigInt(sizeInBytes Review comment: could be inconsistent, eg. rowCount and sizeInBytes may be inconsistent after this rule. So restored to creating new CatalogStatistics instance. But by doing so, some statistics may be lost which should not impact accuracy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services -
[GitHub] [spark] SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424510 **[Test build #116358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)** for PR 26813 at commit [`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572427046 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116358/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572427015 **[Test build #116358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)** for PR 26813 at commit [`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
cloud-fan commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572427078 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#discussion_r364587799 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.IOException + +import org.apache.hadoop.hive.common.StatsSetupConst + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, HiveTableRelation} +import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, AttributeSet, Expression, PredicateHelper, SubqueryExpression} +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.CommandUtils + +/** + * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source. + */ +private[sql] class PruneHiveTablePartitions(session: SparkSession) + extends Rule[LogicalPlan] with PredicateHelper { + /** + * Extract the partition filters from the filters on the table. + */ + private def extractPartitionPruningFilters( + filters: Seq[Expression], + relation: HiveTableRelation): Seq[Expression] = { +val normalizedFilters = filters.map { e => + e transform { +case a: AttributeReference => + a.withName(relation.output.find(_.semanticEquals(a)).get.name) + } +} +val partitionSet = AttributeSet(relation.partitionCols) +normalizedFilters.filter { predicate => + !predicate.references.isEmpty && predicate.references.subsetOf(partitionSet) +} + } + + /** + * Prune the hive table using filters on the partitions of the table, + * and also update the statistics of the table. + */ + private def prunedHiveTableWithStats( + relation: HiveTableRelation, + partitionFilters: Seq[Expression]): HiveTableRelation = { +val conf = session.sessionState.conf +val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter( + relation.tableMeta.identifier, + partitionFilters) +val sizeInBytes = try { + val sizeOfPartitions = prunedPartitions.map { partition => +val rawDataSize = partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong) +val totalSize = partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong) +if (rawDataSize.isDefined && rawDataSize.get > 0) { + rawDataSize.get +} else if (totalSize.isDefined && totalSize.get > 0L) { + totalSize.get +} else { + CommandUtils.calculateLocationSize( +session.sessionState, relation.tableMeta.identifier, partition.storage.locationUri) +} + }.sum + // If size of partitions is zero fall back to the default size. + if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions +} catch { + case e: IOException => +logWarning("Failed to get table size from HDFS.", e) +conf.defaultSizeInBytes +} +val newTableMeta = + if (relation.tableMeta.stats.isDefined) { +relation.tableMeta.copy( + stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = BigInt(sizeInBytes Review comment: could be inconsistent, eg. rowCount and sizeInBytes is inconsistent. Restored to creating new CatalogStatistics instance. But by doing so, some statistics may be lost which should not impact accuracy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail:
[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#discussion_r364588106 ## File path: sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala ## @@ -107,7 +111,8 @@ class DataFrameCallbackSuite extends QueryTest with SharedSparkSession { val df = Seq(1 -> "a").toDF("i", "j").groupBy("i").count() df.collect() -// Wait for the first `collect` to be caught by our listener. Otherwise the next `collect` will +// Wait for the first `collect` to be caught by our listener. +// Otherwise the next `collect` will Review comment: unnecessary change This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572427096 **[Test build #116359 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116359/testReport)** for PR 26805 at commit [`0218202`](https://github.com/apache/spark/commit/02182020fcfa8d6867d2bedf678b4fdb71ac04ff). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572427038 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#discussion_r364580910 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala ## @@ -308,4 +308,11 @@ class EliminateSortsSuite extends PlanTest { val correctAnswer = PushDownOptimizer.execute(noOrderByPlan.analyze) comparePlans(optimized, correctAnswer) } + + test("should not remove orderBy in sortBy clause") { +val plan = testRelation.orderBy('a.asc).sortBy('b.desc) +val optimized = Optimize.execute(plan.analyze) +val correctAnswer = testRelation.orderBy('a.asc).sortBy('b.desc).analyze +comparePlans(optimized, correctAnswer) Review comment: New UT added for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#discussion_r364580898 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -980,15 +980,25 @@ object EliminateSorts extends Rule[LogicalPlan] { if (newOrders.isEmpty) child else s.copy(order = newOrders) case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) => child -case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child)) +case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(s, child)) case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) => j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight)) case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) => g.copy(child = recursiveRemoveSort(originChild)) } + private def recursiveRemoveSort(parent: Sort, plan: LogicalPlan): LogicalPlan = plan match { +case s @ Sort(_, _, child) => + if (parent.global == s.global) { Review comment: New UT added for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL][WIP] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on a change in pull request #26805: [SPARK-15616][SQL][WIP] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#discussion_r364587799 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.execution + +import java.io.IOException + +import org.apache.hadoop.hive.common.StatsSetupConst + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.catalog.{CatalogStatistics, HiveTableRelation} +import org.apache.spark.sql.catalyst.expressions.{And, AttributeReference, AttributeSet, Expression, PredicateHelper, SubqueryExpression} +import org.apache.spark.sql.catalyst.planning.PhysicalOperation +import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.command.CommandUtils + +/** + * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source. + */ +private[sql] class PruneHiveTablePartitions(session: SparkSession) + extends Rule[LogicalPlan] with PredicateHelper { + /** + * Extract the partition filters from the filters on the table. + */ + private def extractPartitionPruningFilters( + filters: Seq[Expression], + relation: HiveTableRelation): Seq[Expression] = { +val normalizedFilters = filters.map { e => + e transform { +case a: AttributeReference => + a.withName(relation.output.find(_.semanticEquals(a)).get.name) + } +} +val partitionSet = AttributeSet(relation.partitionCols) +normalizedFilters.filter { predicate => + !predicate.references.isEmpty && predicate.references.subsetOf(partitionSet) +} + } + + /** + * Prune the hive table using filters on the partitions of the table, + * and also update the statistics of the table. + */ + private def prunedHiveTableWithStats( + relation: HiveTableRelation, + partitionFilters: Seq[Expression]): HiveTableRelation = { +val conf = session.sessionState.conf +val prunedPartitions = session.sessionState.catalog.listPartitionsByFilter( + relation.tableMeta.identifier, + partitionFilters) +val sizeInBytes = try { + val sizeOfPartitions = prunedPartitions.map { partition => +val rawDataSize = partition.parameters.get(StatsSetupConst.RAW_DATA_SIZE).map(_.toLong) +val totalSize = partition.parameters.get(StatsSetupConst.TOTAL_SIZE).map(_.toLong) +if (rawDataSize.isDefined && rawDataSize.get > 0) { + rawDataSize.get +} else if (totalSize.isDefined && totalSize.get > 0L) { + totalSize.get +} else { + CommandUtils.calculateLocationSize( +session.sessionState, relation.tableMeta.identifier, partition.storage.locationUri) +} + }.sum + // If size of partitions is zero fall back to the default size. + if (sizeOfPartitions == 0L) conf.defaultSizeInBytes else sizeOfPartitions +} catch { + case e: IOException => +logWarning("Failed to get table size from HDFS.", e) +conf.defaultSizeInBytes +} +val newTableMeta = + if (relation.tableMeta.stats.isDefined) { +relation.tableMeta.copy( + stats = Some(relation.tableMeta.stats.get.copy(sizeInBytes = BigInt(sizeInBytes Review comment: could be inconsistent, eg. rowCount and sizeInBytes is inconsistent. Restored to new CatalogStatistics instance. But by doing so, some statistics may be lost which should not impact accuracy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail:
[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#discussion_r364587953 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -602,9 +602,14 @@ class AdaptiveQueryExecSuite test("SPARK-30403: AQE should handle InSubquery") { withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") { + // If the subquery does not contain the shuffle node, + // it may get the "SubqueryAdaptiveNotSupportedException" + // and the main sql will also not be inserted the "AdaptiveSparkPlanExec" node. + // Here change the subquery to contain the exchange node. Review comment: we can revert the changes in this file now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fuwhu commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions
fuwhu commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions URL: https://github.com/apache/spark/pull/26805#issuecomment-572426719 Did some refining, @cloud-fan please help review again. thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572425310 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116338/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#discussion_r364587258 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala ## @@ -39,11 +41,27 @@ case class InsertAdaptiveSparkPlan( private val conf = adaptiveExecutionContext.session.sessionState.conf + def containShuffle(plan: SparkPlan): Boolean = { +plan.find { + case plan: SparkPlan => !plan.requiredChildDistribution.forall(_ == UnspecifiedDistribution) + case _: Exchange => true + case _ => false +}.isDefined + } + + def containSubQuery(plan: SparkPlan): Boolean = { +plan.find(_.expressions.exists(_.find { + case _: SubqueryExpression => true + case _ => false +}.isDefined)).isDefined + } + override def apply(plan: SparkPlan): SparkPlan = applyInternal(plan, false) private def applyInternal(plan: SparkPlan, isSubquery: Boolean): SparkPlan = plan match { case _: ExecutedCommandExec => plan -case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan) => +case _ if conf.adaptiveExecutionEnabled && supportAdaptive(plan) + && (containSubQuery(plan) || containShuffle(plan) || isSubquery) => Review comment: nit: we should put `isSubquery` at the beginning to save the checking time for subqueries. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572425299 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572425310 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116338/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
AmplabJenkins commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572425299 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424911 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572424945 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
AmplabJenkins removed a comment on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572424953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21148/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
SparkQA removed a comment on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572364940 **[Test build #116338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116338/testReport)** for PR 27136 at commit [`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins removed a comment on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21149/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2
SparkQA commented on issue #27136: [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2 URL: https://github.com/apache/spark/pull/27136#issuecomment-572424746 **[Test build #116338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116338/testReport)** for PR 27136 at commit [`afef6ad`](https://github.com/apache/spark/commit/afef6ad4119791eae2f889a478f1064752cac840). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424911 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572424945 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
AmplabJenkins commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572424953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21148/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
AmplabJenkins commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21149/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
SparkQA commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572424490 **[Test build #116357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116357/testReport)** for PR 27110 at commit [`0b047b5`](https://github.com/apache/spark/commit/0b047b5ac72a01ca0c120fb3b31321ddcd293a69). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
SparkQA commented on issue #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#issuecomment-572424510 **[Test build #116358 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116358/testReport)** for PR 26813 at commit [`c72c20d`](https://github.com/apache/spark/commit/c72c20d296a730fd2cd8f65c157c610017536b1f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE
cloud-fan commented on a change in pull request #26813: [SPARK-30188][SQL][WIP] Resolve the failed unit tests when enable AQE URL: https://github.com/apache/spark/pull/26813#discussion_r364586169 ## File path: mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala ## @@ -41,6 +41,7 @@ import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted} import org.apache.spark.sql.{DataFrame, Encoder, Row, SparkSession} import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.functions.{col, lit} +import org.apache.spark.sql.internal.SQLConf Review comment: unnecessary import This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE
cloud-fan commented on issue #27110: [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD COLUMN and ALTER TABLE URL: https://github.com/apache/spark/pull/27110#issuecomment-572423525 I checked and we don't have any document for ALTER COLUMN. Shall we add it in a different PR? It's more than just the new syntax added here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572422536 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572422536 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
AmplabJenkins commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572422540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21147/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
AmplabJenkins removed a comment on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572422540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21147/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode
SparkQA commented on issue #26881: [SPARK-30252][SQL] Disallow negative scale of Decimal under ansi mode URL: https://github.com/apache/spark/pull/26881#issuecomment-572422106 **[Test build #116356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116356/testReport)** for PR 26881 at commit [`9d41ea2`](https://github.com/apache/spark/commit/9d41ea22b2ec6eeb296c5e7f456cb95bba3e7818). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-572420194 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-572420203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21146/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-572420194 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-572420203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21146/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-572419843 **[Test build #116355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116355/testReport)** for PR 26973 at commit [`17e742f`](https://github.com/apache/spark/commit/17e742f59ecccd5ad76da69676c8faea8a0494ad). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#issuecomment-572417888 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
AmplabJenkins removed a comment on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#issuecomment-572417898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21145/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#issuecomment-572417888 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
AmplabJenkins commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#issuecomment-572417898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21145/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#discussion_r364580910 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala ## @@ -308,4 +308,11 @@ class EliminateSortsSuite extends PlanTest { val correctAnswer = PushDownOptimizer.execute(noOrderByPlan.analyze) comparePlans(optimized, correctAnswer) } + + test("should not remove orderBy in sortBy clause") { +val plan = testRelation.orderBy('a.asc).sortBy('b.desc) +val optimized = Optimize.execute(plan.analyze) +val correctAnswer = testRelation.orderBy('a.asc).sortBy('b.desc).analyze +comparePlans(optimized, correctAnswer) Review comment: I add new UT for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
apapi commented on a change in pull request #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#discussion_r364580898 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -980,15 +980,25 @@ object EliminateSorts extends Rule[LogicalPlan] { if (newOrders.isEmpty) child else s.copy(order = newOrders) case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) => child -case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child)) +case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(s, child)) case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) => j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight)) case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) => g.copy(child = recursiveRemoveSort(originChild)) } + private def recursiveRemoveSort(parent: Sort, plan: LogicalPlan): LogicalPlan = plan match { +case s @ Sort(_, _, child) => + if (parent.global == s.global) { Review comment: I add new UT for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer
SparkQA commented on issue #27077: [SPARK-30408][SQL] Should not remove orderBy in sortBy clause in Optimizer URL: https://github.com/apache/spark/pull/27077#issuecomment-572417353 **[Test build #116354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116354/testReport)** for PR 27077 at commit [`9d8b6ce`](https://github.com/apache/spark/commit/9d8b6cef5b5de527d6a3ffcf9d8de0f8ff99c407). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] nchammas commented on issue #26936: [SPARK-30296][SQL] Add Dataset diffing feature
nchammas commented on issue #26936: [SPARK-30296][SQL] Add Dataset diffing feature URL: https://github.com/apache/spark/pull/26936#issuecomment-572417229 Yes, my bad. An equality check can be built on the same components as diffing, but they are indeed different. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r364580011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) } else { - a + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +val phantomId = i + 1 Review comment: The illusionary mechanism may result in multiple distinct aggregations uses different column, so we still need to call `rewrite`. (e.g., count(distinct phantom1-a) and count(distinct a phantom2-a)) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r364580011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) } else { - a + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +val phantomId = i + 1 Review comment: The illusionary mechanism may result in multiple distinct aggregations uses different column, so we still need to call `rewrite`. (e.g., `count(distinct phantom1-a) and count(distinct a phantom2-a)`) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression
beliefer commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression allows the use of the FILTER clause URL: https://github.com/apache/spark/pull/27058#discussion_r364544828 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -317,7 +363,74 @@ object RewriteDistinctAggregates extends Rule[LogicalPlan] { } Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate) } else { - a + val (distinctAggExpressions, regularAggExpressions) = aggExpressions.partition(_.isDistinct) + if (distinctAggExpressions.exists(_.filter.isDefined)) { +val regularAggExprs = regularAggExpressions.filter(e => e.children.exists(!_.foldable)) +val regularFunChildren = regularAggExprs + .flatMap(_.aggregateFunction.children.filter(!_.foldable)) +val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes) +val regularAggChildren = (regularFunChildren ++ regularFilterAttrs).distinct +val regularAggChildAttrMap = regularAggChildren.map(expressionAttributePair) +val regularAggChildAttrLookup = regularAggChildAttrMap.toMap +val regularOperatorMap = regularAggExprs.map { + case ae @ AggregateExpression(af, _, _, filter, _) => +val newChildren = af.children.map(c => regularAggChildAttrLookup.getOrElse(c, c)) +val raf = af.withNewChildren(newChildren).asInstanceOf[AggregateFunction] +val filterOpt = filter.map(_.transform { + case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a) +}) +val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt) +(ae, aggExpr) +} +val distinctAggExprs = distinctAggExpressions.filter(e => e.children.exists(!_.foldable)) +val rewriteDistinctOperatorMap = distinctAggExprs.zipWithIndex.map { + case (ae @ AggregateExpression(af, _, _, filter, _), i) => +val phantomId = i + 1 Review comment: // Why do we need to construct the phantom id ? // First, In order to reduce costs, it is better to handle the filter clause locally. // e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate expression // If(id > 1) 'a else null first, and use the result as output. // Second, If more than one DISTINCT aggregate expression uses the same column, // We need to construct the phantom attributes so as the output not lost. // e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) will output // attribute 'a and attribute 'phantom1-a instead of two 'a. // Note: We just need to illusion the expression with filter clause. // The illusionary mechanism may result in multiple distinct aggregations uses // different column, so we still need to call `rewrite`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-572415419 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
AmplabJenkins removed a comment on issue #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size URL: https://github.com/apache/spark/pull/26434#issuecomment-572415425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21144/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org