[GitHub] [spark] viirya edited a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr
viirya edited a comment on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843769636 Please take another look. I found a corner case and added a test case. Thanks. cc @cloud-fan @maropu @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843769636 Please take another look. I found a corner case and added a test case. cc @cloud-fan @maropu @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
AmplabJenkins removed a comment on pull request #32570: URL: https://github.com/apache/spark/pull/32570#issuecomment-843765326 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43222/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr
AmplabJenkins removed a comment on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843765325 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43221/
[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr
SparkQA commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843766768 **[Test build #138703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138703/testReport)** for PR 32586 at commit [`f777855`](https://github.com/apache/spark/commit/f7778555f6b12264108d9f25a860bc8ff467ba59).
[GitHub] [spark] AmplabJenkins commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
AmplabJenkins commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843765325 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43221/
[GitHub] [spark] AmplabJenkins commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
AmplabJenkins commented on pull request #32570: URL: https://github.com/apache/spark/pull/32570#issuecomment-843765326 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43222/
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843764793 Hmm, I found a corner case where LinkedHashMap doesn't work here. I'm going to update the PR and add a test case.
[GitHub] [spark] SparkQA commented on pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
SparkQA commented on pull request #32587: URL: https://github.com/apache/spark/pull/32587#issuecomment-843764594 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43223/
[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
wangyum commented on a change in pull request #32563: URL: https://github.com/apache/spark/pull/32563#discussion_r634924338

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala ##
@@ -580,41 +587,52 @@ case class RenameTable(
  */
 case class ShowTables(
     namespace: LogicalPlan,
-    pattern: Option[String],
-    override val output: Seq[Attribute] = ShowTables.getOutputAttrs) extends UnaryCommand {
+    pattern: Option[String]) extends UnaryCommand {
+  override val output: Seq[Attribute] = {
+    val tableName = AttributeReference("tableName", StringType, nullable = false)()
+    val isTemporary = AttributeReference("isTemporary", BooleanType, nullable = false)()
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
+      Seq(AttributeReference("database", StringType, nullable = false)(), tableName, isTemporary)
+    } else {
+      Seq(AttributeReference("namespace", StringType, nullable = false)(), tableName, isTemporary)
+    }
+  }
+
   override def child: LogicalPlan = namespace
   override protected def withNewChildInternal(newChild: LogicalPlan): ShowTables =
     copy(namespace = newChild)
 }
 
-object ShowTables {
-  def getOutputAttrs: Seq[Attribute] = Seq(
-    AttributeReference("namespace", StringType, nullable = false)(),
-    AttributeReference("tableName", StringType, nullable = false)(),
-    AttributeReference("isTemporary", BooleanType, nullable = false)())
-}
-
 /**
  * The logical plan of the SHOW TABLE EXTENDED command.
  */
 case class ShowTableExtended(
     namespace: LogicalPlan,
     pattern: String,
-    partitionSpec: Option[PartitionSpec],
-    override val output: Seq[Attribute] = ShowTableExtended.getOutputAttrs) extends UnaryCommand {
+    partitionSpec: Option[PartitionSpec]) extends UnaryCommand {
+  override val output: Seq[Attribute] = {
+    val tableName = AttributeReference("tableName", StringType, nullable = false)()
+    val isTemporary = AttributeReference("isTemporary", BooleanType, nullable = false)()
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
+      Seq(
+        AttributeReference("database", StringType, nullable = false)(),
+        tableName,
+        isTemporary,
+        AttributeReference("information", StringType, nullable = false)())
+    } else {
+      Seq(
+        AttributeReference("namespace", StringType, nullable = false)(),
+        tableName,
+        isTemporary,
+        AttributeReference("information", MapType(StringType, StringType), nullable = false)())
+    }

Review comment: Move the `spark.sql.legacy.keepCommandOutputSchema` logic from `ResolveSessionCatalog` to `v2Commands`.
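The schema switch in the hunk above can be sketched without Spark on the classpath. This is only an illustration: `Attr`, the local `DataType` hierarchy, and the `legacyKeepSchema` parameter are hypothetical stand-ins for Catalyst's `AttributeReference`, its data types, and the `spark.sql.legacy.keepCommandOutputSchema` lookup, and nullability is left out.

```scala
// Hypothetical stand-ins for Catalyst types; these are NOT Spark's classes.
sealed trait DataType
case object StringType extends DataType
case object BooleanType extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Simplified attribute: only a name and a type.
final case class Attr(name: String, dataType: DataType)

object ShowTableExtendedSchema {
  // `legacyKeepSchema` stands in for reading spark.sql.legacy.keepCommandOutputSchema.
  def output(legacyKeepSchema: Boolean): Seq[Attr] = {
    val tableName = Attr("tableName", StringType)
    val isTemporary = Attr("isTemporary", BooleanType)
    if (legacyKeepSchema) {
      // Legacy schema: first column is `database`, `information` is a plain string.
      Seq(Attr("database", StringType), tableName, isTemporary,
        Attr("information", StringType))
    } else {
      // New schema: first column is `namespace`, `information` is a string-to-string map.
      Seq(Attr("namespace", StringType), tableName, isTemporary,
        Attr("information", MapType(StringType, StringType)))
    }
  }
}
```

Computing the schema inside the command keeps the legacy branch next to the column definitions, which appears to be the point of the review comment.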
[GitHub] [spark] LuciferYang commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method
LuciferYang commented on pull request #32536: URL: https://github.com/apache/spark/pull/32536#issuecomment-843759422 thx all ~
[GitHub] [spark] cloud-fan commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
cloud-fan commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-843759217 I see, then option 1 is not valid. Can you open a new PR to try option 2?
[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
SparkQA commented on pull request #32570: URL: https://github.com/apache/spark/pull/32570#issuecomment-843753601 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43222/
[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
SparkQA commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843751331 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43221/
[GitHub] [spark] SparkQA commented on pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
SparkQA commented on pull request #32587: URL: https://github.com/apache/spark/pull/32587#issuecomment-843746961 **[Test build #138702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138702/testReport)** for PR 32587 at commit [`3aa2124`](https://github.com/apache/spark/commit/3aa21241102526d9a8245047158f5eeb632f6faf).
[GitHub] [spark] linhongliu-db opened a new pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF
linhongliu-db opened a new pull request #32587: URL: https://github.com/apache/spark/pull/32587

### What changes were proposed in this pull request?
Add the language, such as "scala", "python", "java", "hive", "built-in" to the `ExpressionInfo` for UDF.

### Why are the changes needed?
Make the `ExpressionInfo` of UDF more meaningful

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing and newly added UT
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
AmplabJenkins removed a comment on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843745981 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138695/
[GitHub] [spark] AmplabJenkins commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
AmplabJenkins commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843745981 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138695/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics
AmplabJenkins removed a comment on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-843742972 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43220/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
AmplabJenkins removed a comment on pull request #32563: URL: https://github.com/apache/spark/pull/32563#issuecomment-843742973 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43219/
[GitHub] [spark] SparkQA removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
SparkQA removed a comment on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843663224 **[Test build #138695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)** for PR 32585 at commit [`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).
[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
SparkQA commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843745233 **[Test build #138695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)** for PR 32585 at commit [`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] viirya removed a comment on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya removed a comment on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300 Sorry, I misread the code. It looks like we add the parent expression to the map first and then traverse to its child expressions. Let me put it in draft first.
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843743329
> The fix looks fine. Is it difficult to add some tests for that case?

I couldn't come up with a test that fails before this change and succeeds after it. The retrieval order was fine in my tests, but it is not guaranteed.
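The ordering concern in this comment can be illustrated with Scala's standard mutable maps. The sketch below is not taken from the PR; `MapOrderDemo` and `iterationOrder` are hypothetical names. It relies only on the documented behavior that `mutable.LinkedHashMap` iterates in insertion order, whereas `mutable.HashMap` makes no ordering promise, so a test against it may pass by accident.

```scala
import scala.collection.mutable

object MapOrderDemo {
  // Insert the keys in the given order, then report the order the map iterates them in.
  def iterationOrder(map: mutable.Map[String, Int], keys: Seq[String]): Seq[String] = {
    keys.zipWithIndex.foreach { case (key, index) => map.put(key, index) }
    map.keysIterator.toList
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("c", "a", "b")
    // LinkedHashMap: iteration order is the insertion order.
    println(iterationOrder(mutable.LinkedHashMap.empty[String, Int], keys))
    // HashMap: the order is unspecified and may or may not match the insertion order.
    println(iterationOrder(mutable.HashMap.empty[String, Int], keys))
  }
}
```

Because the `HashMap` order happens to look stable on small inputs, a regression test that distinguishes the two implementations can be hard to construct, which matches the difficulty described in the comment.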
[GitHub] [spark] AmplabJenkins commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics
AmplabJenkins commented on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-843742972 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43220/
[GitHub] [spark] AmplabJenkins commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
AmplabJenkins commented on pull request #32563: URL: https://github.com/apache/spark/pull/32563#issuecomment-843742973 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43219/
[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
SparkQA commented on pull request #32570: URL: https://github.com/apache/spark/pull/32570#issuecomment-843742102 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43222/
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843741600 I figured out this change makes sense. But the description is not correct. I will update it later.
[GitHub] [spark] viirya edited a comment on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya edited a comment on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300 Sorry, I misread the code. It looks like we add the parent expression to the map first and then traverse to its child expressions. Let me put it in draft first.
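To see why parent-first insertion matters, consider a toy traversal that records each expression before descending into its children. Everything here is a hypothetical miniature (`Expr`, `Leaf`, `Add`, `SubexprOrder`), not Spark's `Expression` or `EquivalentExpressions`, and sorting by tree height is just one assumed way to put children ahead of parents, not necessarily what the PR does.

```scala
import scala.collection.mutable

// Hypothetical miniature expression tree; NOT Spark's Expression class.
sealed trait Expr { def children: Seq[Expr] }
final case class Leaf(name: String) extends Expr { val children: Seq[Expr] = Nil }
final case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

object SubexprOrder {
  // Height of the tree rooted at `e`; leaves have height 0.
  def height(e: Expr): Int =
    if (e.children.isEmpty) 0 else 1 + e.children.map(height).max

  // Count each node BEFORE visiting its children, so parents are inserted first.
  private def collect(e: Expr, seen: mutable.LinkedHashMap[Expr, Int]): Unit = {
    seen.put(e, seen.getOrElse(e, 0) + 1)
    e.children.foreach(collect(_, seen))
  }

  // Expressions seen more than once, in insertion (parent-first) order.
  def commonParentFirst(root: Expr): Seq[Expr] = {
    val seen = mutable.LinkedHashMap.empty[Expr, Int]
    collect(root, seen)
    seen.iterator.collect { case (expr, count) if count > 1 => expr }.toList
  }

  // Same expressions, reordered so that every child precedes its parent.
  def commonChildrenFirst(root: Expr): Seq[Expr] =
    commonParentFirst(root).sortBy(height)
}
```

With this traversal, an insertion-ordered map alone yields parents before children, so preserving insertion order is not enough when children must be processed first.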
[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
SparkQA commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843740202 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43221/
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300 Let me put it in draft first.
[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics
SparkQA commented on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-843732045 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43220/
[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
SparkQA commented on pull request #32563: URL: https://github.com/apache/spark/pull/32563#issuecomment-843731519 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/
[GitHub] [spark] maropu commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
maropu commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843730003 The fix looks fine. Is it difficult to add some tests for that case?
[GitHub] [spark] maropu commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method
maropu commented on pull request #32536: URL: https://github.com/apache/spark/pull/32536#issuecomment-843728916 Thank you, @LuciferYang . Merged to master.
[GitHub] [spark] maropu closed pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method
maropu closed pull request #32536: URL: https://github.com/apache/spark/pull/32536
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32584: Test pandas nondeterministic return values
AmplabJenkins removed a comment on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843724112 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43217/
[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
SparkQA commented on pull request #32570: URL: https://github.com/apache/spark/pull/32570#issuecomment-843725725 **[Test build #138701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138701/testReport)** for PR 32570 at commit [`4574440`](https://github.com/apache/spark/commit/4574440d134cf3ce0595b2afd12b47729749be97).
[GitHub] [spark] sarutak commented on a change in pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation
sarutak commented on a change in pull request #32570: URL: https://github.com/apache/spark/pull/32570#discussion_r634894308 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ##
@@ -120,20 +122,22 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
 scanExec
Review comment: Ah, I see, thanks. I've updated.
[GitHub] [spark] cfmcgrady commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate
cfmcgrady commented on a change in pull request #32488: URL: https://github.com/apache/spark/pull/32488#discussion_r634893418 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala ##
@@ -121,6 +129,24 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       if canImplicitlyCast(fromExp, toType, literalType) =>
       simplifyNumericComparison(be, fromExp, toType, value)
+    case in @ In(Cast(fromExp, toType: NumericType, _), list)
+      if list.forall(v =>
+        v.isInstanceOf[Literal] && canImplicitlyCast(fromExp, toType, v.dataType)
+      ) =>
+      val (newValueList, exp) =
+        list.map(lit => unwrapCast(EqualTo(in.value, lit)))
+          .partition {
Review comment: As @sunchao mentioned before, we `partition` the results because `unwrapCast` may convert an `EqualTo` into a different expression. For instance:
```
x = 1 => x = 1
```
```
x = 2147483648 => if(isnull(x), null, false)
```
if `x` is of integer type. Do you mean we can simply do `x in (1, 2147483648) => x in (1)`?
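The question in the comment above can be checked on a toy model. The following is only a sketch under stated assumptions: the column is a nullable `Int` modeled as `Option[Int]`, literals are `Long` values, and SQL's three-valued logic is modeled with `Option[Boolean]` (`None` standing for NULL). The names `fitsInInt`, `or3`, `inWithCast`, and `inUnwrapped` are hypothetical and are not Spark APIs.

```scala
// Hypothetical toy model, not Spark code.
// None stands for SQL NULL; literals are Long because the column was cast to a wider type.
def fitsInInt(v: Long): Boolean = v >= Int.MinValue && v <= Int.MaxValue

// SQL three-valued OR.
def or3(a: Option[Boolean], b: Option[Boolean]): Option[Boolean] = (a, b) match {
  case (Some(true), _) | (_, Some(true)) => Some(true)
  case (Some(false), Some(false)) => Some(false)
  case _ => None
}

// cast(x as bigint) IN (list), evaluated as a chain of ORed equalities.
def inWithCast(x: Option[Int], list: Seq[Long]): Option[Boolean] =
  list.map(lit => x.map(_.toLong == lit)).foldLeft(Option(false))(or3)

// Rewritten form: in-range literals are compared on the uncast column,
// each out-of-range literal contributes if(isnull(x), null, false).
def inUnwrapped(x: Option[Int], list: Seq[Long]): Option[Boolean] = {
  val (inRange, outOfRange) = list.partition(fitsInInt)
  val kept = inRange.map(lit => x.map(_ == lit.toInt)).foldLeft(Option(false))(or3)
  val dropped = outOfRange.map(_ => x.map(_ => false)).foldLeft(Option(false))(or3)
  or3(kept, dropped)
}

val literals = Seq(1L, 2147483648L)
val inputs: Seq[Option[Int]] = Seq(Some(1), Some(5), None)
val sameResults = inputs.forall(x => inWithCast(x, literals) == inUnwrapped(x, literals))
```

In this model the out-of-range branch never changes the result as long as at least one literal stays in range, which is what `x in (1)` relies on. When no literal stays in range, the `if(isnull(x), null, false)` part is the only thing left that produces NULL for a NULL input, so that case would still need separate handling.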
[GitHub] [spark] cfmcgrady commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate
cfmcgrady commented on a change in pull request #32488: URL: https://github.com/apache/spark/pull/32488#discussion_r634893337 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala ##
@@ -121,6 +129,24 @@ object UnwrapCastInBinaryComparison extends Rule[LogicalPlan] {
       if canImplicitlyCast(fromExp, toType, literalType) =>
       simplifyNumericComparison(be, fromExp, toType, value)
+    case in @ In(Cast(fromExp, toType: NumericType, _), list)
+      if list.forall(v =>
+        v.isInstanceOf[Literal] && canImplicitlyCast(fromExp, toType, v.dataType)
Review comment: updated.
[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
SparkQA commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843724540 **[Test build #138700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138700/testReport)** for PR 32586 at commit [`7b6b589`](https://github.com/apache/spark/commit/7b6b5894faa3220355e9ec1bf49b59081cdb7d83).
[GitHub] [spark] AmplabJenkins commented on pull request #32584: Test pandas nondeterministic return values
AmplabJenkins commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843724112 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43217/
[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya commented on pull request #32586: URL: https://github.com/apache/spark/pull/32586#issuecomment-843723648 cc @cloud-fan @maropu @dongjoon-hyun
[GitHub] [spark] viirya opened a new pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted
viirya opened a new pull request #32586: URL: https://github.com/apache/spark/pull/32586

### What changes were proposed in this pull request?

This patch replaces `HashMap` with `LinkedHashMap` as the map of equivalent expressions in `EquivalentExpressions` used for subexpression elimination.

### Why are the changes needed?

`EquivalentExpressions` maintains a map of equivalent expressions. It is currently a `HashMap`, so iteration is not guaranteed to follow the insertion order. Subexpression elimination relies on retrieving subexpressions from the map. If there are child-parent relationships among the subexpressions, we want the child expressions to come before the parent expressions, so that we can replace child expressions inside parent expressions with subexpression evaluation. Although we add expressions into the map recursively in a depth-first manner, the order is not guaranteed to be preserved when we retrieve the map values. We should use `LinkedHashMap` for this usage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.
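The ordering argument in the description can be illustrated with a small standalone sketch. It assumes a simplified expression tree and a post-order insertion (children registered before their parent); `Expr` and `addExprTree` are hypothetical names and do not mirror the actual `EquivalentExpressions` code.

```scala
import scala.collection.mutable

// Hypothetical, simplified expression node; not Spark's Expression class.
final case class Expr(name: String, children: Seq[Expr] = Nil)

// Assumed post-order traversal: children are inserted before their parent.
def addExprTree(e: Expr, seen: mutable.LinkedHashMap[String, Expr]): Unit = {
  e.children.foreach(addExprTree(_, seen))
  seen.getOrElseUpdate(e.name, e)
  ()
}

val a = Expr("a")
val b = Expr("b")
val sum = Expr("a + b", Seq(a, b))
val prod = Expr("(a + b) * 2", Seq(sum))

val seen = mutable.LinkedHashMap.empty[String, Expr]
addExprTree(prod, seen)

// LinkedHashMap iterates in insertion order, so children precede parents here.
val order = seen.keys.toList
```

With `mutable.HashMap` in place of `mutable.LinkedHashMap` the same code compiles, but the iteration order of `keys` is unspecified, which is the situation the description wants to avoid.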
[GitHub] [spark] Ngone51 commented on pull request #32385: [WIP][SPARK-35275][CORE] Add checksum for shuffle blocks and diagnose corruption
Ngone51 commented on pull request #32385: URL: https://github.com/apache/spark/pull/32385#issuecomment-843722683 @tgravescs Thanks for the good points! I did find some perf regression when benchmarking with the change. I'll double-check it for sure and try to get rid of it if possible.
> Also at the high level how does this affect other shuffle work going on - like merging and pluggable? Is it independent of that or would need to be implemented?

For merging, it needs an extension to send checksum values along with the block data while merging. The extension is also needed for the decommission feature. For pluggable, my current implementation is added at `LocalDiskShuffleMapOutputWriter`, which is supposed to be the default shuffle writer plugin for Spark. This means that other custom plugins need their own implementation for checksum support. I adopted that way because I found it easier and clearer to implement at the time. An alternative way to support checksums for all plugins, or to make it a built-in feature, may be to implement it in `DiskBlockObjectWriter`/`ShufflePartitionPairsWriter`, which is upstream of the shuffle I/O plugin. I need more investigation on this.
[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
SparkQA commented on pull request #32563: URL: https://github.com/apache/spark/pull/32563#issuecomment-843722159 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/
[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics
SparkQA commented on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-843721808 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43220/
[GitHub] [spark] mridulm commented on pull request #32389: [SPARK-35263] [TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code
mridulm commented on pull request #32389: URL: https://github.com/apache/spark/pull/32389#issuecomment-843721497 Merging to master, thanks @xkrogen. Thanks for the reviews @Ngone51 , @otterc !
[GitHub] [spark] asfgit closed pull request #32389: [SPARK-35263] [TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code
asfgit closed pull request #32389: URL: https://github.com/apache/spark/pull/32389
[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values
SparkQA commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843709302 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43217/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843708329 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43218/
[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843708329 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43218/
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843708314 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43218/
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634879276 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ##
@@ -17,19 +17,130 @@
 package org.apache.spark.sql.execution.datasources.v2
-import org.apache.spark.sql.catalyst.expressions.{And, Expression, NamedExpression, ProjectionOverSchema, SubqueryExpression}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
 import org.apache.spark.sql.catalyst.planning.ScanOperation
-import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, LogicalPlan, Project}
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.connector.read.{Scan, V1Scan}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
+import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownAggregates, V1Scan}
 import org.apache.spark.sql.execution.datasources.DataSourceStrategy
 import org.apache.spark.sql.sources
+import org.apache.spark.sql.sources.{AggregateFunc, Aggregation}
 import org.apache.spark.sql.types.StructType
-object V2ScanRelationPushDown extends Rule[LogicalPlan] {
+object V2ScanRelationPushDown extends Rule[LogicalPlan] with AliasHelper {
Review comment: I will redo this
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634879199 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala ##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.spark.sql.types.DataType
+
+// Aggregate Functions in SQL statement.
+// e.g. SELECT COUNT(EmployeeID), Max(salary), deptID FROM dept GROUP BY deptID
+// aggregateExpressions are (COUNT(EmployeeID), Max(salary)), groupByColumns are (deptID)
+case class Aggregation(aggregateExpressions: Seq[Seq[AggregateFunc]],
+    groupByColumns: Seq[String])
+
+abstract class AggregateFunc
+
+case class Min(column: String, dataType: DataType) extends AggregateFunc
+case class Max(column: String, dataType: DataType) extends AggregateFunc
Review comment: This is not supported by Parquet and ORC because they don't have this kind of statistics info. For JDBC, I treat this as one column. I have something like this:
```
private def columnAsString(e: Expression): String = e match {
  case AttributeReference(name, _, _, _) => name
  case Cast(child, _, _) => columnAsString(child)
  case Add(left, right, _) => columnAsString(left) + " + " + columnAsString(right)
  case Subtract(left, right, _) => columnAsString(left) + " - " + columnAsString(right)
  case Multiply(left, right, _) => columnAsString(left) + " * " + columnAsString(right)
  case Divide(left, right, _) => columnAsString(left) + " / " + columnAsString(right)
  case CheckOverflow(child, _, _) => columnAsString(child)
  case PromotePrecision(child) => columnAsString(child)
  case _ => ""
}
```
When we do `sql("SELECT Max(a+b) FROM t").show`, it has:
```
+------------+
|max((a + b))|
+------------+
+------------+
```
I guess it makes sense to treat this as one column?
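As a side note on the traversal in this comment: returning `""` for an unsupported node means an expression such as `a + f(b)` would be rendered as `"a + "`. One possible variant, sketched here on a hypothetical mini-AST (`E`, `Col`, `CastE`, `Add`, `Mul`, and `Udf` are stand-ins, not Catalyst classes), returns `Option[String]` so that any unsupported subtree makes the whole expression non-pushable. Parenthesizing binary operations is an extra assumption made here to keep operator precedence unambiguous.

```scala
// Hypothetical mini-AST standing in for Catalyst expressions.
sealed trait E
final case class Col(name: String) extends E
final case class CastE(child: E) extends E
final case class Add(left: E, right: E) extends E
final case class Mul(left: E, right: E) extends E
final case class Udf(child: E) extends E // stands in for anything unsupported

// Some(sql) if the whole tree can be rendered, None as soon as one node cannot.
def columnAsString(e: E): Option[String] = e match {
  case Col(name) => Some(name)
  case CastE(child) => columnAsString(child)
  case Add(l, r) => for (a <- columnAsString(l); b <- columnAsString(r)) yield s"($a + $b)"
  case Mul(l, r) => for (a <- columnAsString(l); b <- columnAsString(r)) yield s"($a * $b)"
  case _ => None
}

val supported = columnAsString(Add(Col("a"), CastE(Col("b"))))
val nested = columnAsString(Mul(Add(Col("a"), Col("b")), Col("c")))
val unsupported = columnAsString(Add(Col("a"), Udf(Col("b"))))
```

A caller could then skip the push-down whenever the result is `None`, instead of sending a partially rendered column string to the source.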
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
AmplabJenkins removed a comment on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843702426 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43216/
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634878055 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ##
@@ -102,6 +102,7 @@ case class RowDataSourceScanExec(
     requiredSchema: StructType,
     filters: Set[Filter],
     handledFilters: Set[Filter],
+    aggregation: Aggregation,
Review comment: Will do
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634877762 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala ##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.spark.sql.types.DataType
+
+// Aggregate Functions in SQL statement.
+// e.g. SELECT COUNT(EmployeeID), Max(salary), deptID FROM dept GROUP BY deptID
+// aggregateExpressions are (COUNT(EmployeeID), Max(salary)), groupByColumns are (deptID)
+case class Aggregation(aggregateExpressions: Seq[Seq[AggregateFunc]],
Review comment: If we want to support `Avg` later, `Avg` will be rewritten to `Sum`/`Count`. I am grouping all the aggregates in this pushed-down Aggregation class. For example, for `SELECT Max(c1), Sum(c1), Min(c2), Avg(c3) FROM test`, the pushed-down Aggregation has `[[Max], [Sum], [Min], [Sum, Count]]`, so it's a Seq of Seq.
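The grouping described in this comment can be sketched as follows. This is an illustration only: the case classes are simplified (no `dataType` field), and `Requested`, `Plain`, `Avg`, and `toPushed` are hypothetical names introduced for the example, not part of the PR.

```scala
// Simplified stand-ins for the PR's aggregate functions (the real ones also carry a DataType).
sealed trait AggregateFunc
final case class Min(column: String) extends AggregateFunc
final case class Max(column: String) extends AggregateFunc
final case class Sum(column: String) extends AggregateFunc
final case class Count(column: String) extends AggregateFunc

// Hypothetical: what the query asked for, before any rewriting.
sealed trait Requested
final case class Plain(func: AggregateFunc) extends Requested
final case class Avg(column: String) extends Requested

// One inner Seq per requested aggregate; Avg expands to Sum and Count.
def toPushed(requested: Seq[Requested]): Seq[Seq[AggregateFunc]] = requested.map {
  case Plain(func) => Seq(func)
  case Avg(column) => Seq(Sum(column), Count(column))
}

// SELECT Max(c1), Sum(c1), Min(c2), Avg(c3) FROM test
val pushed = toPushed(Seq(Plain(Max("c1")), Plain(Sum("c1")), Plain(Min("c2")), Avg("c3")))
```

Keeping one inner `Seq` per output column is what lets the `Sum` and `Count` results for `c3` be recombined into a single average afterwards.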
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634877439 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.sources.Aggregation;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data source can implement this interface to
+ * push down aggregates to the data source.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsPushDownAggregates extends ScanBuilder {
+
+  /**
+   * Pushes down Aggregation to datasource.
+   * The Aggregation can be pushed down only if all the Aggregate Functions can
+   * be pushed down.
+   */
+  void pushAggregation(Aggregation aggregation);
+
+  /**
+   * Returns the aggregation that are pushed to the data source via
+   * {@link #pushAggregation(Aggregation aggregation)}.
+   */
+  Aggregation pushedAggregation();
+
+  /**
+   * Returns the schema of the pushed down aggregates
+   */
+  StructType getPushDownAggSchema();
+
+  /**
+   * Indicate if the data source only supports global aggregated push down
+   */
+  boolean supportsGlobalAggregatePushDownOnly();
+
+  /**
+   * Indicate if the data source supports push down aggregates along with filters
Review comment: I mean whether we can push down the aggregate and the filter together. For example, for `SELECT Max(c1) FROM t WHERE c2 >1`, we can push down both the aggregate and the filter for JDBC. But I am not sure about Parquet and ORC. If the filter doesn't affect the footer's Max/Min/Count values, we can push down both the aggregate and the filter; otherwise, we can't. I am not sure how to check whether the filter affects the footer's Max/Min/Count values. Currently I only push down Max/Min/Count if there is no filter. If a filter is present, I only push down the filter, not Max/Min/Count.
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala ##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
Review comment: Yes. Will change
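The rule stated in this comment (push Max/Min/Count only when no filter is present, unless the source can evaluate both together, as the comment says JDBC can) can be written down as a small decision function. This is only a sketch; `PushDownDecision`, `decide`, and the `canCombine` flag are hypothetical names, not part of the PR.

```scala
// Hypothetical result type: which parts of the query are handed to the source.
final case class PushDownDecision(pushFilters: Boolean, pushAggregates: Boolean)

// canCombine is an assumption: true when the source can evaluate filters and
// aggregates together (JDBC in the comment), false for footer-statistics sources.
def decide(hasFilters: Boolean, hasAggregates: Boolean, canCombine: Boolean): PushDownDecision =
  PushDownDecision(
    pushFilters = hasFilters,
    pushAggregates = hasAggregates && (!hasFilters || canCombine)
  )

val footerNoFilter = decide(hasFilters = false, hasAggregates = true, canCombine = false)
val footerWithFilter = decide(hasFilters = true, hasAggregates = true, canCombine = false)
val jdbcWithFilter = decide(hasFilters = true, hasAggregates = true, canCombine = true)
```

The open question in the comment, whether a given filter leaves the footer statistics valid, would refine `canCombine` per query rather than per source; the sketch does not attempt that.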
[GitHub] [spark] Ngone51 commented on pull request #31876: [SPARK-34942][API][CORE] Abstract Location in MapStatus to enable support for custom storage
Ngone51 commented on pull request #31876: URL: https://github.com/apache/spark/pull/31876#issuecomment-843704131 @mridulm Sorry, missed your last comment... I think we can go ahead to update to the latest as long as we can put SPARK-35188 aside for now.
[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
huaxingao commented on a change in pull request #32049: URL: https://github.com/apache/spark/pull/32049#discussion_r634877198
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.sources.Aggregation;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data source can implement this interface to
+ * push down aggregates to the data source.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsPushDownAggregates extends ScanBuilder {
+
+  /**
+   * Pushes down Aggregation to datasource.
+   * The Aggregation can be pushed down only if all the Aggregate Functions can
+   * be pushed down.
+   */
+  void pushAggregation(Aggregation aggregation);
+
+  /**
+   * Returns the aggregation that are pushed to the data source via
+   * {@link #pushAggregation(Aggregation aggregation)}.
+   */
+  Aggregation pushedAggregation();
Review comment: This is either the input of `pushAggregation` or empty (we don't push down anything unless all of the aggregate functions in the input of `pushAggregation` can be pushed down).
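The all-or-nothing contract described in the review comment can be sketched in plain Java. This is only an illustration under stated assumptions: `PushDownSketch`, its nested `Aggregation` and `SketchScanBuilder`, and the supported-function set are hypothetical stand-ins, not the Spark classes from the PR.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not the Spark classes from the PR.
public class PushDownSketch {

    /** Stand-in for an aggregation: just the names of its aggregate functions. */
    public static final class Aggregation {
        public final List<String> functions;

        public Aggregation(List<String> functions) {
            this.functions = List.copyOf(functions);
        }

        public static Aggregation empty() {
            return new Aggregation(List.of());
        }

        public boolean isEmpty() {
            return functions.isEmpty();
        }
    }

    /** A scan builder that only understands min, max and count (an assumed set). */
    public static final class SketchScanBuilder {
        private static final Set<String> SUPPORTED = Set.of("min", "max", "count");
        private Aggregation pushed = Aggregation.empty();

        /** Accepts the aggregation only if every function in it is supported. */
        public void pushAggregation(Aggregation aggregation) {
            boolean allSupported =
                aggregation.functions.stream().allMatch(SUPPORTED::contains);
            pushed = allSupported ? aggregation : Aggregation.empty();
        }

        /** Returns either the exact input of pushAggregation or an empty aggregation. */
        public Aggregation pushedAggregation() {
            return pushed;
        }
    }
}
```

With this shape, pushing `min` and `count` leaves `pushedAggregation()` equal to the input, while pushing `min` and `avg` leaves it empty, so a caller never sees a partially pushed aggregation.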
[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command
SparkQA commented on pull request #32563: URL: https://github.com/apache/spark/pull/32563#issuecomment-843703128 **[Test build #138698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138698/testReport)** for PR 32563 at commit [`8c1e5ce`](https://github.com/apache/spark/commit/8c1e5cef9da041835eaa84f1bd38fde9abfa363b).
[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics
SparkQA commented on pull request #32552: URL: https://github.com/apache/spark/pull/32552#issuecomment-843703118 **[Test build #138699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138699/testReport)** for PR 32552 at commit [`877af88`](https://github.com/apache/spark/commit/877af88115058542c5be17c525403aa6b4368513).
[GitHub] [spark] AngersZhuuuu commented on pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform
AngersZhuuuu commented on pull request #32365: URL: https://github.com/apache/spark/pull/32365#issuecomment-843702945 ping @MaxGekk @cloud-fan
[GitHub] [spark] imback82 commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
imback82 commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-843702839
> 1. Have a single `AlterTable` logical plan, which has 2 members: `table: LogicalPlan` and `changes: Seq[TableChange]`. We also have a single `AlterTableExec` physical plan.
Do you mean generating `AlterTable` in `AstBuilder`? One issue with this approach is v1 commands (`ResolveSessionCatalog`), because there could be an ambiguity between `ALTER TABLE ADD COLUMN` and `ALTER TABLE REPLACE COLUMN`. That's why this PR created different `AlterTableXXX` plans with a base `AlterTable`.
> 2. Have various `AlterTableXXX` logical plans, which are similar to the existing `AlterTableXXXStatement`. Update the code that resolve/check `Seq[TableChange]` to deal with `AlterTableXXX` logical plans instead. We still have a single `AlterTableExec` physical plan, the planner will convert all `AlterTableXXX` logical plans to it.
I can try this approach and create a separate PR for comparison. Let me know if I misunderstood option 1 above.
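Option 2 from the quoted proposal (several per-command logical plans that all lower to one physical plan) might look roughly like the following. This is a plain-Java sketch of the shape only; every class and method name here is a hypothetical stand-in chosen for illustration, and none of it is Catalyst code.

```java
import java.util.List;

// Hypothetical Java stand-ins for the concepts discussed above; not Spark code.
public class AlterTableSketch {

    /** Stand-in for a single table change. */
    public static final class TableChange {
        public final String kind;
        public final String column;

        public TableChange(String kind, String column) {
            this.kind = kind;
            this.column = column;
        }
    }

    /** Base type for the per-command logical plans (the AlterTableXXX family). */
    public interface AlterTableCommand {
        String table();
        List<TableChange> changes();
    }

    /** Logical plan for ADD COLUMN; the command type itself removes the ambiguity. */
    public static final class AlterTableAddColumn implements AlterTableCommand {
        private final String table;
        private final String column;

        public AlterTableAddColumn(String table, String column) {
            this.table = table;
            this.column = column;
        }

        public String table() { return table; }

        public List<TableChange> changes() {
            return List.of(new TableChange("add", column));
        }
    }

    /** Logical plan for REPLACE COLUMN. */
    public static final class AlterTableReplaceColumn implements AlterTableCommand {
        private final String table;
        private final String column;

        public AlterTableReplaceColumn(String table, String column) {
            this.table = table;
            this.column = column;
        }

        public String table() { return table; }

        public List<TableChange> changes() {
            return List.of(new TableChange("replace", column));
        }
    }

    /** The single physical plan: it only sees a table and a list of changes. */
    public static final class AlterTableExec {
        public final String table;
        public final List<TableChange> changes;

        public AlterTableExec(String table, List<TableChange> changes) {
            this.table = table;
            this.changes = changes;
        }
    }

    /** Planner step: every AlterTableXXX plan converts to the same exec node. */
    public static AlterTableExec plan(AlterTableCommand command) {
        return new AlterTableExec(command.table(), command.changes());
    }
}
```

The design point is that the distinction between commands lives in the logical plan type, while the physical side stays a single node that consumes a flat list of changes.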
[GitHub] [spark] AngersZhuuuu removed a comment on pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform
AngersZhuuuu removed a comment on pull request #32365: URL: https://github.com/apache/spark/pull/32365#issuecomment-842043224 gentle ping @cloud-fan
[GitHub] [spark] AmplabJenkins commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator
AmplabJenkins commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843702426 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43216/
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843697380 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43218/
[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator
SparkQA commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843691109 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43216/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-843687426 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138693/
[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-843687426 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138693/
[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-843586584 **[Test build #138693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138693/testReport)** for PR 32498 at commit [`bb69f84`](https://github.com/apache/spark/commit/bb69f84a3d05acec239db571afbfe2f41007d9ce).
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-843686748 **[Test build #138693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138693/testReport)** for PR 32498 at commit [`bb69f84`](https://github.com/apache/spark/commit/bb69f84a3d05acec239db571afbfe2f41007d9ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843682411 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138691/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32584: Test pandas nondeterministic return values
AmplabJenkins removed a comment on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843682409 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138696/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins removed a comment on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843682410 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138694/
[GitHub] [spark] zhouyejoe commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data
zhouyejoe commented on a change in pull request #32007: URL: https://github.com/apache/spark/pull/32007#discussion_r634860230
## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FinalizeShuffleMerge.java ##
@@ -30,12 +30,15 @@
  */
 public class FinalizeShuffleMerge extends BlockTransferMessage {
   public final String appId;
+  public final int attemptId;
Review comment: In rare cases, the driver container from the previous attempt may still be running in the cluster after the driver for the next attempt has been launched. It is better to also carry the attemptId in FinalizeShuffleMerge.
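One way a receiver could use such an `attemptId` is to ignore finalize requests from any attempt other than the newest one it has seen for the application. The sketch below assumes that policy; `AttemptGuardSketch` and its methods are hypothetical names for illustration and do not come from the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard for illustration; not the shuffle service code from the PR.
public class AttemptGuardSketch {
    // Highest attempt id seen so far for each application id.
    private final Map<String, Integer> latestAttempt = new ConcurrentHashMap<>();

    /** Records an attempt, keeping only the highest attempt id per application. */
    public void registerAttempt(String appId, int attemptId) {
        latestAttempt.merge(appId, attemptId, Math::max);
    }

    /** Accepts a finalize request only if it comes from the latest known attempt. */
    public boolean acceptFinalize(String appId, int attemptId) {
        Integer latest = latestAttempt.get(appId);
        return latest != null && latest == attemptId;
    }
}
```

If attempts 1 and 2 of the same application have both registered, a finalize message from the lingering attempt 1 driver is rejected, while the one from attempt 2 is accepted. Without the attempt id in the message, the two would be indistinguishable.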
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843683480 **[Test build #138697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138697/testReport)** for PR 32494 at commit [`3afaf32`](https://github.com/apache/spark/commit/3afaf329a16727a3929331b65cb06543d11e44c5).
[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843682411 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138691/
[GitHub] [spark] AmplabJenkins commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins commented on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843682410 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138694/
[GitHub] [spark] AmplabJenkins commented on pull request #32584: Test pandas nondeterministic return values
AmplabJenkins commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843682409 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138696/
[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values
SparkQA commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843679971 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43217/
[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator
SparkQA commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843679640 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43216/
[GitHub] [spark] YuzhouSun commented on a change in pull request #32530: [SPARK-35106][Core][SQL] Avoid failing rename caused by destination directory not exist
YuzhouSun commented on a change in pull request #32530: URL: https://github.com/apache/spark/pull/32530#discussion_r634855854
## File path: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ##
@@ -188,13 +188,18 @@ class HadoopMapReduceCommitProtocol(
       val filesToMove = allAbsPathFiles.foldLeft(Map[String, String]())(_ ++ _)
       logDebug(s"Committing files staged for absolute locations $filesToMove")
+      val absParentPaths = filesToMove.values.map(new Path(_).getParent).toSet
       if (dynamicPartitionOverwrite) {
-        val absPartitionPaths = filesToMove.values.map(new Path(_).getParent).toSet
-        logDebug(s"Clean up absolute partition directories for overwriting: $absPartitionPaths")
-        absPartitionPaths.foreach(fs.delete(_, true))
+        logDebug(s"Clean up absolute partition directories for overwriting: $absParentPaths")
+        absParentPaths.foreach(fs.delete(_, true))
       }
+      logDebug(s"Create absolute parent directories: $absParentPaths")
+      absParentPaths.foreach(fs.mkdirs)
Review comment: This covers the case where the directories in `absParentPaths` were never created before the job.
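The same idea (create the destination's parent before moving a staged file into it) can be shown on a local filesystem. Note the swap: this sketch uses `java.nio.file` rather than the Hadoop `FileSystem` API from the diff, and `RenameSketch` and `moveCreatingParent` are hypothetical names. `Files.move` does not create missing parent directories on its own, which makes it a reasonable local analogue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem analogue for illustration; the PR itself uses Hadoop's FileSystem.
public class RenameSketch {

    /** Moves src to dst, first creating dst's parent directories if they are missing. */
    public static void moveCreatingParent(Path src, Path dst) throws IOException {
        Path parent = dst.getParent();
        if (parent != null) {
            // No-op when the directory already exists.
            Files.createDirectories(parent);
        }
        Files.move(src, dst);
    }
}
```

Calling `moveCreatingParent` with a destination such as `root/a/b/part-0000` succeeds even when `root/a/b` has never been created, which is the situation the review comment describes.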
[GitHub] [spark] SparkQA removed a comment on pull request #32584: Test pandas nondeterministic return values
SparkQA removed a comment on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843663335 **[Test build #138696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)** for PR 32584 at commit [`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).
[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values
SparkQA commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843673362 **[Test build #138696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)** for PR 32584 at commit [`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA removed a comment on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843618436 **[Test build #138694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138694/testReport)** for PR 32469 at commit [`9cfb617`](https://github.com/apache/spark/commit/9cfb617919318b646c071cb19157ffabab1fe14b).
[GitHub] [spark] SparkQA commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA commented on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843672761 **[Test build #138694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138694/testReport)** for PR 32469 at commit [`9cfb617`](https://github.com/apache/spark/commit/9cfb617919318b646c071cb19157ffabab1fe14b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843551331 **[Test build #138691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138691/testReport)** for PR 32494 at commit [`3d38208`](https://github.com/apache/spark/commit/3d38208bb00ae9ffaef80d244983fa63cba14303).
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-843668909 **[Test build #138691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138691/testReport)** for PR 32494 at commit [`3d38208`](https://github.com/apache/spark/commit/3d38208bb00ae9ffaef80d244983fa63cba14303).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins removed a comment on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843662034 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43215/
[GitHub] [spark] Kimahriman commented on pull request #32559: [SPARK-35410][SQL] SubExpr elimination should not include redundant children exprs in conditional expression
Kimahriman commented on pull request #32559: URL: https://github.com/apache/spark/pull/32559#issuecomment-843663478
> > I did actually hit a bug today where the when value was being evaluated even though the condition was false. I wasn't able to find the exact root cause yet but turning off subexpression elimination fixed the issue. It was basically `when(col.rlike(...), udf(col))`, but more complex on both sides so somehow the UDF was getting subexpression eval'd early and failed because it didn't match the regular expression
> I see. Normally, for catalyst expressions, it is performance issue only. But for UDF, we cannot expect the logic put in UDF by users. It is possibly the UDF fails unexpectedly in such cases. So looks like it is better to backport it as a bug fix.
I'll try to see if I can get a real breaking example.
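The failure mode described above (a guarded value being computed before its guard is checked) can be reproduced in miniature without Spark. The sketch below is an analogy only: `ConditionalEvalSketch`, `whenLazy` and `whenEager` are hypothetical names, and `whenEager` merely imitates what hoisting the value expression ahead of the branch would do. It is not how Spark's subexpression elimination is implemented.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Miniature analogy for illustration; not Spark's subexpression elimination.
public class ConditionalEvalSketch {

    /** Evaluates the value only when the guard holds, like a well-behaved when(). */
    public static <T, R> R whenLazy(T input, Predicate<T> guard, Function<T, R> value) {
        return guard.test(input) ? value.apply(input) : null;
    }

    /** Computes the value first and checks the guard afterwards, imitating early hoisting. */
    public static <T, R> R whenEager(T input, Predicate<T> guard, Function<T, R> value) {
        R precomputed = value.apply(input); // runs even for inputs the guard excludes
        return guard.test(input) ? precomputed : null;
    }
}
```

With a guard of `s -> s.matches("\\d+")` and a value function of `Integer::parseInt`, `whenLazy("abc", ...)` returns `null`, whereas `whenEager("abc", ...)` throws a `NumberFormatException`. A user function that assumes the guard has already filtered its input behaves the same way when it is evaluated early.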
[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values
SparkQA commented on pull request #32584: URL: https://github.com/apache/spark/pull/32584#issuecomment-843663335 **[Test build #138696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)** for PR 32584 at commit [`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).
[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator
SparkQA commented on pull request #32585: URL: https://github.com/apache/spark/pull/32585#issuecomment-843663224 **[Test build #138695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)** for PR 32585 at commit [`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).
[GitHub] [spark] AmplabJenkins commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
AmplabJenkins commented on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843662034 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43215/
[GitHub] [spark] c21 opened a new pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator
c21 opened a new pull request #32585: URL: https://github.com/apache/spark/pull/32585
### What changes were proposed in this pull request?
As title. Fixed two places where the documentation for the window operator has errors.
### Why are the changes needed?
Help people read the code for the window operator more easily in the future.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
[GitHub] [spark] SparkQA commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures
SparkQA commented on pull request #32469: URL: https://github.com/apache/spark/pull/32469#issuecomment-843650789 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43215/
[GitHub] [spark] PerilousApricot commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
PerilousApricot commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-843650178 Hello, this looks like very good work. I'm having some trouble reading the code -- is there a possibility that these UDTs could leverage https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extension-types when they're in Pandas/Python to skip a costly conversion to Object that currently happens?