date:20210518

[GitHub] [spark] viirya edited a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-18 Thread GitBox



viirya edited a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843769636


   Please take another look. I found a corner case and added a test case. Thanks.
cc @cloud-fan @maropu @dongjoon-hyun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843769636


   Please take another look. I found a corner case and added a test case.
cc @cloud-fan @maropu @dongjoon-hyun



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32570:
URL: https://github.com/apache/spark/pull/32570#issuecomment-843765326


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43222/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843765325


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43221/
   



[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Children subexpr should come first than parent subexpr

2021-05-18 Thread GitBox



SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843766768


   **[Test build #138703 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138703/testReport)**
 for PR 32586 at commit 
[`f777855`](https://github.com/apache/spark/commit/f7778555f6b12264108d9f25a860bc8ff467ba59).



[GitHub] [spark] AmplabJenkins commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843765325


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43221/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32570:
URL: https://github.com/apache/spark/pull/32570#issuecomment-843765326


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43222/
   



[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843764793


   Hmm, I found a corner case where LinkedHashMap doesn't work here. I'm going
to update the PR and add a test case.



[GitHub] [spark] SparkQA commented on pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF

2021-05-18 Thread GitBox



SparkQA commented on pull request #32587:
URL: https://github.com/apache/spark/pull/32587#issuecomment-843764594


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43223/
   



[GitHub] [spark] wangyum commented on a change in pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



wangyum commented on a change in pull request #32563:
URL: https://github.com/apache/spark/pull/32563#discussion_r634924338



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
##
@@ -580,41 +587,52 @@ case class RenameTable(
  */
 case class ShowTables(
     namespace: LogicalPlan,
-    pattern: Option[String],
-    override val output: Seq[Attribute] = ShowTables.getOutputAttrs) extends UnaryCommand {
+    pattern: Option[String]) extends UnaryCommand {
+  override val output: Seq[Attribute] = {
+    val tableName = AttributeReference("tableName", StringType, nullable = false)()
+    val isTemporary = AttributeReference("isTemporary", BooleanType, nullable = false)()
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
+      Seq(AttributeReference("database", StringType, nullable = false)(), tableName, isTemporary)
+    } else {
+      Seq(AttributeReference("namespace", StringType, nullable = false)(), tableName, isTemporary)
+    }
+  }
+
   override def child: LogicalPlan = namespace
   override protected def withNewChildInternal(newChild: LogicalPlan): ShowTables =
     copy(namespace = newChild)
 }
 
-object ShowTables {
-  def getOutputAttrs: Seq[Attribute] = Seq(
-    AttributeReference("namespace", StringType, nullable = false)(),
-    AttributeReference("tableName", StringType, nullable = false)(),
-    AttributeReference("isTemporary", BooleanType, nullable = false)())
-}
-
 /**
  * The logical plan of the SHOW TABLE EXTENDED command.
  */
 case class ShowTableExtended(
     namespace: LogicalPlan,
     pattern: String,
-    partitionSpec: Option[PartitionSpec],
-    override val output: Seq[Attribute] = ShowTableExtended.getOutputAttrs) extends UnaryCommand {
+    partitionSpec: Option[PartitionSpec]) extends UnaryCommand {
+  override val output: Seq[Attribute] = {
+    val tableName = AttributeReference("tableName", StringType, nullable = false)()
+    val isTemporary = AttributeReference("isTemporary", BooleanType, nullable = false)()
+    if (conf.getConf(SQLConf.LEGACY_KEEP_COMMAND_OUTPUT_SCHEMA)) {
+      Seq(
+        AttributeReference("database", StringType, nullable = false)(),
+        tableName,
+        isTemporary,
+        AttributeReference("information", StringType, nullable = false)())
+    } else {
+      Seq(
+        AttributeReference("namespace", StringType, nullable = false)(),
+        tableName,
+        isTemporary,
+        AttributeReference("information", MapType(StringType, StringType), nullable = false)())
+    }

Review comment:
   Move the `spark.sql.legacy.keepCommandOutputSchema` logic from 
`ResolveSessionCatalog`  to `v2Commands`.
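
   For readers without the Spark sources at hand, the branch in the quoted diff
can be sketched in plain Scala roughly as below. This is only an illustration:
`Attr`, `OutputSchemaDemo`, and the `keepLegacySchema` parameter are
hypothetical stand-ins for `AttributeReference` and the
`spark.sql.legacy.keepCommandOutputSchema` lookup, and the type names are plain
strings rather than Spark `DataType` objects.

```scala
object OutputSchemaDemo {
  // Hypothetical, simplified stand-in for Spark's AttributeReference.
  final case class Attr(name: String, dataType: String, nullable: Boolean = false)

  // Mirrors the if/else in the quoted diff: the legacy flag keeps the old
  // "database" column name and the string-typed "information" column.
  def showTableExtendedOutput(keepLegacySchema: Boolean): Seq[Attr] = {
    val tableName = Attr("tableName", "string")
    val isTemporary = Attr("isTemporary", "boolean")
    if (keepLegacySchema) {
      Seq(Attr("database", "string"), tableName, isTemporary,
        Attr("information", "string"))
    } else {
      Seq(Attr("namespace", "string"), tableName, isTemporary,
        Attr("information", "map<string,string>"))
    }
  }
}
```

   Calling `OutputSchemaDemo.showTableExtendedOutput(true)` yields the legacy
column names, and passing `false` yields the new ones with a map-typed
`information` column.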





[GitHub] [spark] LuciferYang commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-18 Thread GitBox



LuciferYang commented on pull request #32536:
URL: https://github.com/apache/spark/pull/32536#issuecomment-843759422


   thx all ~



[GitHub] [spark] cloud-fan commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-18 Thread GitBox



cloud-fan commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-843759217


   I see, then option 1 is not valid. Can you open a new PR to try option 2?



[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32570:
URL: https://github.com/apache/spark/pull/32570#issuecomment-843753601


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43222/
   



[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843751331


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43221/
   



[GitHub] [spark] SparkQA commented on pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF

2021-05-18 Thread GitBox



SparkQA commented on pull request #32587:
URL: https://github.com/apache/spark/pull/32587#issuecomment-843746961


   **[Test build #138702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138702/testReport)**
 for PR 32587 at commit 
[`3aa2124`](https://github.com/apache/spark/commit/3aa21241102526d9a8245047158f5eeb632f6faf).



[GitHub] [spark] linhongliu-db opened a new pull request #32587: [SPARK-35440][SQL] Add language type to `ExpressionInfo` for UDF

2021-05-18 Thread GitBox



linhongliu-db opened a new pull request #32587:
URL: https://github.com/apache/spark/pull/32587


   ### What changes were proposed in this pull request?
   Add the language (such as "scala", "python", "java", "hive", or "built-in")
to the `ExpressionInfo` for UDFs.
   
   ### Why are the changes needed?
   Make the `ExpressionInfo` of UDF more meaningful
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   existing and newly added UT
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843745981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138695/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843745981


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138695/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32552:
URL: https://github.com/apache/spark/pull/32552#issuecomment-843742972


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43220/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-843742973


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43219/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



SparkQA removed a comment on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843663224


   **[Test build #138695 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)**
 for PR 32585 at commit 
[`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).



[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



SparkQA commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843745233


   **[Test build #138695 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)**
 for PR 32585 at commit 
[`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] viirya removed a comment on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya removed a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300


   Sorry, I misread the code. It looks like we add the parent expression to
the map first and then traverse its child expressions. Let me put this in
draft first.



[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843743329


   > The fix looks fine. Is it difficult to add some tests for that case?
   
   I couldn't come up with a test that fails before this change but succeeds
after it. The retrieval order was fine in my tests, but it is not guaranteed.
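
   To illustrate the "okay in practice but not guaranteed" point, here is a
minimal standalone sketch using Scala's mutable collections. It is not code
from the PR; `OrderDemo` and its members are hypothetical names. The
`LinkedHashMap` result is fixed by the collection's contract, while the
`HashMap` result depends on hash codes and the library implementation.

```scala
import scala.collection.mutable

object OrderDemo {
  // Keys inserted in a fixed, known order.
  val keys: List[String] = List("parent", "child1", "child2", "grandchild")

  // LinkedHashMap iterates in insertion order by contract.
  def linkedOrder: List[String] = {
    val m = mutable.LinkedHashMap.empty[String, Int]
    keys.zipWithIndex.foreach { case (k, i) => m.put(k, i) }
    m.keysIterator.toList
  }

  // HashMap makes no ordering promise: the result may or may not happen
  // to match insertion order, so code must not rely on it.
  def hashOrder: List[String] = {
    val m = mutable.HashMap.empty[String, Int]
    keys.zipWithIndex.foreach { case (k, i) => m.put(k, i) }
    m.keysIterator.toList
  }
}
```

   A test that compares `hashOrder` with `keys` can pass by accident, which
would explain why a failing-before test is hard to write for this kind of fix.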



[GitHub] [spark] AmplabJenkins commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32552:
URL: https://github.com/apache/spark/pull/32552#issuecomment-843742972


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43220/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-843742973


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43219/
   



[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32570:
URL: https://github.com/apache/spark/pull/32570#issuecomment-843742102


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43222/
   



[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843741600


   I figured out that this change makes sense, but the description is not
correct. I will update it later.



[GitHub] [spark] viirya edited a comment on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya edited a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300


   Sorry, I misread the code. It looks like we add the parent expression to
the map first and then traverse its child expressions. Let me put this in
draft first.
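
   The parent-first insertion described here can be sketched with a toy
expression tree. Everything below is hypothetical (`Expr`, `collect`, and the
sample expressions are not Spark classes), and sorting by height is just one
possible way to get children ahead of parents; I am not claiming it is what
the PR ends up doing.

```scala
import scala.collection.mutable

object SubexprOrderDemo {
  // Hypothetical stand-in for an expression tree; not Spark's Expression.
  final case class Expr(name: String, children: List[Expr] = Nil) {
    // Height of the subtree rooted here: leaves have height 0.
    def height: Int =
      if (children.isEmpty) 0 else children.map(_.height).max + 1
  }

  // Pre-order walk: record the parent, then descend into its children.
  def collect(e: Expr, seen: mutable.LinkedHashMap[Expr, Int]): Unit = {
    seen.put(e, seen.getOrElse(e, 0) + 1)
    e.children.foreach(collect(_, seen))
  }

  val a = Expr("a")
  val b = Expr("b")
  val add = Expr("a + b", List(a, b))
  val mul = Expr("(a + b) * a", List(add, a))

  val seen = mutable.LinkedHashMap.empty[Expr, Int]
  collect(mul, seen)

  // Insertion order: every parent appears before its children.
  val insertionOrder: List[String] = seen.keysIterator.map(_.name).toList

  // A stable sort by height puts every child before its parent.
  val childrenFirst: List[String] =
    seen.keysIterator.toList.sortBy(_.height).map(_.name)
}
```

   With this walk, an insertion-ordered map alone yields the parent
`(a + b) * a` before `a + b`, so preserving insertion order is not enough
when children must be handled first.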



[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843740202


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43221/
   



[GitHub] [spark] viirya edited a comment on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya edited a comment on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300


   Sorry, I misread the code. It looks like we add the parent expression to
the map first and then traverse its child expressions. Let me put this in
draft first.



[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843733300


   Let me put it in draft first.



[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-05-18 Thread GitBox



SparkQA commented on pull request #32552:
URL: https://github.com/apache/spark/pull/32552#issuecomment-843732045


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43220/
   



[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



SparkQA commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-843731519


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/
   



[GitHub] [spark] maropu commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



maropu commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843730003


   The fix looks fine. Is it difficult to add some tests for that case?



[GitHub] [spark] maropu commented on pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-18 Thread GitBox



maropu commented on pull request #32536:
URL: https://github.com/apache/spark/pull/32536#issuecomment-843728916


   Thank you, @LuciferYang . Merged to master.



[GitHub] [spark] maropu closed pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` method

2021-05-18 Thread GitBox



maropu closed pull request #32536:
URL: https://github.com/apache/spark/pull/32536


   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843724112


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43217/
   



[GitHub] [spark] SparkQA commented on pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32570:
URL: https://github.com/apache/spark/pull/32570#issuecomment-843725725


   **[Test build #138701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138701/testReport)**
 for PR 32570 at commit 
[`4574440`](https://github.com/apache/spark/commit/4574440d134cf3ce0595b2afd12b47729749be97).



[GitHub] [spark] sarutak commented on a change in pull request #32570: [SPARK-35421][SS] Remove redundant ProjectExec from streaming queries with V2Relation

2021-05-18 Thread GitBox



sarutak commented on a change in pull request #32570:
URL: https://github.com/apache/spark/pull/32570#discussion_r634894308



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
##
@@ -120,20 +122,22 @@ class DataSourceV2Strategy(session: SparkSession) extends 
Strategy with Predicat
 scanExec

Review comment:
   Ah, I see, thanks. I've updated.





[GitHub] [spark] cfmcgrady commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate

2021-05-18 Thread GitBox



cfmcgrady commented on a change in pull request #32488:
URL: https://github.com/apache/spark/pull/32488#discussion_r634893418



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##
@@ -121,6 +129,24 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 if canImplicitlyCast(fromExp, toType, literalType) =>
   simplifyNumericComparison(be, fromExp, toType, value)
 
+case in @ In(Cast(fromExp, toType: NumericType, _), list)
+  if list.forall(v =>
+v.isInstanceOf[Literal] && canImplicitlyCast(fromExp, toType, 
v.dataType)
+  ) =>
+  val (newValueList, exp) =
+list.map(lit => unwrapCast(EqualTo(in.value, lit)))
+  .partition {

Review comment:
   As @sunchao mentioned before, we `partition` the results because 
`unwrapCast` can convert an `EqualTo` into a different expression. 
   
   For instance:
   
   ```
   x = 1 => x = 1
   ```
   ```
   x = 2147483648 => if(isnull(x), null, false)
   ```
   
   if `x` is of integer type.
   
   Do you mean we can simply do `x in (1, 2147483648) => x in (1)`?
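
A minimal, Spark-free sketch of the partitioning idea discussed above, assuming a toy model in which `x` is an `Int` column and each `Long` literal of the `IN` list is unwrapped on its own. The names `InListSketch`, `Kept`, `NullOrFalse`, `unwrapEq` and `splitInList` are hypothetical and are not Catalyst classes or methods.

```scala
object InListSketch {
  // Toy stand-ins for the two shapes that unwrapping `cast(x as long) = lit`
  // can produce when `x` is an Int column (hypothetical, not Catalyst types).
  sealed trait Unwrapped
  final case class Kept(value: Int) extends Unwrapped // x = value
  case object NullOrFalse extends Unwrapped           // if(isnull(x), null, false)

  // A literal inside the Int range stays an equality; anything else can never match.
  def unwrapEq(lit: Long): Unwrapped =
    if (lit >= Int.MinValue && lit <= Int.MaxValue) Kept(lit.toInt) else NullOrFalse

  // Splits an IN list into the literals that survive unwrapping and a flag
  // telling whether at least one literal fell outside the Int range.
  def splitInList(list: Seq[Long]): (Seq[Int], Boolean) = {
    val (kept, other) = list.map(unwrapEq).partition(_.isInstanceOf[Kept])
    (kept.collect { case Kept(v) => v }, other.nonEmpty)
  }
}
```

For `Seq(1L, 2147483648L)` the sketch keeps `1` and flags the second literal as out of range. It does not settle whether the out-of-range part can simply be dropped, which is the question raised in this comment.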
   





[GitHub] [spark] cfmcgrady commented on a change in pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In predicate

2021-05-18 Thread GitBox



cfmcgrady commented on a change in pull request #32488:
URL: https://github.com/apache/spark/pull/32488#discussion_r634893337



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala
##
@@ -121,6 +129,24 @@ object UnwrapCastInBinaryComparison extends 
Rule[LogicalPlan] {
 if canImplicitlyCast(fromExp, toType, literalType) =>
   simplifyNumericComparison(be, fromExp, toType, value)
 
+case in @ In(Cast(fromExp, toType: NumericType, _), list)
+  if list.forall(v =>
+v.isInstanceOf[Literal] && canImplicitlyCast(fromExp, toType, 
v.dataType)

Review comment:
   updated.





[GitHub] [spark] SparkQA commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



SparkQA commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843724540


   **[Test build #138700 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138700/testReport)**
 for PR 32586 at commit 
[`7b6b589`](https://github.com/apache/spark/commit/7b6b5894faa3220355e9ec1bf49b59081cdb7d83).



[GitHub] [spark] AmplabJenkins commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843724112


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43217/
   



[GitHub] [spark] viirya commented on pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya commented on pull request #32586:
URL: https://github.com/apache/spark/pull/32586#issuecomment-843723648


   cc @cloud-fan @maropu @dongjoon-hyun 



[GitHub] [spark] viirya opened a new pull request #32586: [SPARK-35439][SQL] Use LinkedHashMap to guarantee traversing with the order they were inserted

2021-05-18 Thread GitBox



viirya opened a new pull request #32586:
URL: https://github.com/apache/spark/pull/32586


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch replaces `HashMap` with `LinkedHashMap` as the map of equivalent 
expressions in `EquivalentExpressions` used for subexpression elimination.
   
   ### Why are the changes needed?
   
   
   `EquivalentExpressions` maintains a map of equivalent expressions. It is 
currently a `HashMap`, so insertion order is not guaranteed to be preserved on 
iteration. Subexpression elimination relies on retrieving subexpressions from 
the map. If there are child-parent relationships among the subexpressions, we 
want child expressions to come before parent expressions, so that we can 
replace child expressions in parent expressions with subexpression evaluation.
   
   Although we add expressions to the map recursively in a depth-first 
manner, the order is not guaranteed to be preserved when we retrieve the map 
values. We should use `LinkedHashMap` for this usage.
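
A minimal sketch of the ordering property described above, using only the Scala standard library. The object name `InsertionOrderSketch`, the helper `keysInIterationOrder` and the string keys are hypothetical stand-ins for expressions. This is not the `EquivalentExpressions` code.

```scala
import scala.collection.mutable

object InsertionOrderSketch {
  // Inserts the keys in the given order and returns the order in which
  // the map hands them back on iteration.
  def keysInIterationOrder(map: mutable.Map[String, Int], keys: Seq[String]): Seq[String] = {
    keys.zipWithIndex.foreach { case (k, i) => map.put(k, i) }
    map.keys.toSeq
  }

  // Depth-first insertion: children first, then the parent that contains them.
  val depthFirst: Seq[String] = Seq("a + b", "c * d", "(a + b) + (c * d)")

  // LinkedHashMap iterates in insertion order, so children stay ahead of the parent.
  val linkedOrder: Seq[String] =
    keysInIterationOrder(mutable.LinkedHashMap.empty[String, Int], depthFirst)

  // HashMap makes no ordering promise; the same keys may come back in any order.
  val hashOrder: Seq[String] =
    keysInIterationOrder(mutable.HashMap.empty[String, Int], depthFirst)
}
```

Only `linkedOrder` is guaranteed to equal `depthFirst`. `hashOrder` contains the same keys, but their order depends on hashing.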
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.
   



[GitHub] [spark] Ngone51 commented on pull request #32385: [WIP][SPARK-35275][CORE] Add checksum for shuffle blocks and diagnose corruption

2021-05-18 Thread GitBox



Ngone51 commented on pull request #32385:
URL: https://github.com/apache/spark/pull/32385#issuecomment-843722683


   @tgravescs Thanks for the good points!
   
   I did find some perf regression by benchmarking with the change. I'll 
double-check it for sure and try to get rid of it if possible.
   
   > Also at the high level how does this affect other shuffle work going on - 
like merging and pluggable? Is it independent of that or would need to be 
implemented?
   
   For merging, an extension is needed to send checksum values along with the 
block data while merging. The same extension is also needed for the 
decommission feature. 
   
   For pluggable, my current implementation is added in 
`LocalDiskShuffleMapOutputWriter`, which is supposed to be the default shuffle 
writer plugin for Spark. This means that other custom plugins need their own 
implementation for checksum support. I adopted this approach because I found 
it easier and clearer to implement at the time.
   
   An alternative way to support checksums for all plugins, that is, to make it 
a built-in feature, may be to implement it in 
`DiskBlockObjectWriter`/`ShufflePartitionPairsWriter`, which is upstream of 
the shuffle I/O plugin. I need to investigate this further.
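
A rough sketch of what computing a checksum per partition upstream of the writer plugin could look like, using only `java.util.zip`. It is not the implementation in this PR. The choice of `Adler32`, the object name `ChecksumSketch` and both helper names are assumptions made for illustration.

```scala
import java.io.OutputStream
import java.util.zip.{Adler32, CheckedOutputStream}

object ChecksumSketch {
  // Writes each partition's bytes through a checksumming wrapper around the
  // sink and returns one checksum per partition. The sink stands in for
  // whatever stream the shuffle I/O plugin provides (an assumption).
  def writeWithChecksums(partitions: Seq[Array[Byte]], sink: OutputStream): Seq[Long] =
    partitions.map { bytes =>
      val checked = new CheckedOutputStream(sink, new Adler32())
      checked.write(bytes)
      checked.flush() // do not close: closing would also close the shared sink
      checked.getChecksum.getValue
    }

  // Recomputes the checksum of a fetched block so that it can be compared
  // with the value recorded at write time.
  def checksumOf(bytes: Array[Byte]): Long = {
    val adler = new Adler32()
    adler.update(bytes, 0, bytes.length)
    adler.getValue
  }
}
```

Because the wrapper sits in front of the sink, the plugin that owns the sink needs no checksum logic of its own in this sketch.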
   
   
   
   



[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



SparkQA commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-843722159


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43219/
   



[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-05-18 Thread GitBox



SparkQA commented on pull request #32552:
URL: https://github.com/apache/spark/pull/32552#issuecomment-843721808


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43220/
   



[GitHub] [spark] mridulm commented on pull request #32389: [SPARK-35263] [TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code

2021-05-18 Thread GitBox



mridulm commented on pull request #32389:
URL: https://github.com/apache/spark/pull/32389#issuecomment-843721497


   Merging to master, thanks @xkrogen.
   Thanks for the reviews @Ngone51 , @otterc !



[GitHub] [spark] asfgit closed pull request #32389: [SPARK-35263] [TEST] Refactor ShuffleBlockFetcherIteratorSuite to reduce duplicated code

2021-05-18 Thread GitBox



asfgit closed pull request #32389:
URL: https://github.com/apache/spark/pull/32389


   



[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



SparkQA commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843709302


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43217/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843708329


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43218/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843708329


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43218/
   



[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843708314


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43218/
   



[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634879276



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala
##
@@ -17,19 +17,130 @@
 
 package org.apache.spark.sql.execution.datasources.v2
 
-import org.apache.spark.sql.catalyst.expressions.{And, Expression, 
NamedExpression, ProjectionOverSchema, SubqueryExpression}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
 import org.apache.spark.sql.catalyst.planning.ScanOperation
-import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, 
Project}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Filter, 
LogicalPlan, Project}
 import org.apache.spark.sql.catalyst.rules.Rule
-import org.apache.spark.sql.connector.read.{Scan, V1Scan}
+import org.apache.spark.sql.catalyst.util.toPrettySQL
+import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownAggregates, 
V1Scan}
 import org.apache.spark.sql.execution.datasources.DataSourceStrategy
 import org.apache.spark.sql.sources
+import org.apache.spark.sql.sources.{AggregateFunc, Aggregation}
 import org.apache.spark.sql.types.StructType
 
-object V2ScanRelationPushDown extends Rule[LogicalPlan] {
+object V2ScanRelationPushDown extends Rule[LogicalPlan] with AliasHelper {

Review comment:
   I will redo this





[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634879199



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.spark.sql.types.DataType
+
+// Aggregate Functions in SQL statement.
+// e.g. SELECT COUNT(EmployeeID), Max(salary), deptID FROM dept GROUP BY deptID
+// aggregateExpressions are (COUNT(EmployeeID), Max(salary)), groupByColumns 
are (deptID)
+case class Aggregation(aggregateExpressions: Seq[Seq[AggregateFunc]],
+   groupByColumns: Seq[String])
+
+abstract class AggregateFunc
+
+case class Min(column: String, dataType: DataType) extends AggregateFunc
+case class Max(column: String, dataType: DataType) extends AggregateFunc

Review comment:
   This is not supported by Parquet and ORC because they don't have this 
kind of statistics info. For JDBC, I treat this as one column. I have 
something like this: 
   ```
   private def columnAsString(e: Expression): String = e match {
     case AttributeReference(name, _, _, _) => name
     case Cast(child, _, _) => columnAsString(child)
     case Add(left, right, _) =>
       columnAsString(left) + " + " + columnAsString(right)
     case Subtract(left, right, _) =>
       columnAsString(left) + " - " + columnAsString(right)
     case Multiply(left, right, _) =>
       columnAsString(left) + " * " + columnAsString(right)
     case Divide(left, right, _) =>
       columnAsString(left) + " / " + columnAsString(right)
     case CheckOverflow(child, _, _) => columnAsString(child)
     case PromotePrecision(child) => columnAsString(child)
     case _ => ""
   }
   ```
   When we do sql("SELECT Max(a+b) FROM t").show, it has:
   ```
   +------------+
   |max((a + b))|
   +------------+
   +------------+
   ```
   I guess it makes sense to treat this as one column?
   
   
   
   
   
   
   
   
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843702426


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43216/
   



[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634878055



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
##
@@ -102,6 +102,7 @@ case class RowDataSourceScanExec(
 requiredSchema: StructType,
 filters: Set[Filter],
 handledFilters: Set[Filter],
+aggregation: Aggregation,

Review comment:
   Will do





[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634877762



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.spark.sql.types.DataType
+
+// Aggregate Functions in SQL statement.
+// e.g. SELECT COUNT(EmployeeID), Max(salary), deptID FROM dept GROUP BY deptID
+// aggregateExpressions are (COUNT(EmployeeID), Max(salary)), groupByColumns 
are (deptID)
+case class Aggregation(aggregateExpressions: Seq[Seq[AggregateFunc]],

Review comment:
   If we want to support `Avg` later, `Avg` will be rewritten to 
`Sum`/`Count`. I am grouping all the aggregates in this pushed-down 
`Aggregation` class. For example, for `SELECT Max(c1), Sum(c1), Min(c2), 
Avg(c3) FROM test`, the pushed-down `Aggregation` has `[[Max], [Sum], [Min], 
[Sum, Count]]`, so it's a Seq of Seq.
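
A small standalone sketch of the grouping described above. The case classes are simplified stand-ins: the `dataType` field from the diff is left out, and `Requested`, `Simple`, `Avg` and `toPushedDown` are hypothetical names introduced only for this example.

```scala
object AggregationSketch {
  // Simplified versions of the pushed-down aggregate functions.
  sealed trait AggregateFunc
  final case class Min(column: String) extends AggregateFunc
  final case class Max(column: String) extends AggregateFunc
  final case class Sum(column: String) extends AggregateFunc
  final case class Count(column: String) extends AggregateFunc

  // What the query asked for: either a function that maps one-to-one,
  // or an Avg that has to be rewritten (hypothetical model).
  sealed trait Requested
  final case class Simple(func: AggregateFunc) extends Requested
  final case class Avg(column: String) extends Requested

  // Each requested aggregate becomes one inner Seq, so an Avg can expand
  // to Sum and Count while staying grouped together.
  def toPushedDown(requested: Seq[Requested]): Seq[Seq[AggregateFunc]] =
    requested.map {
      case Simple(f) => Seq(f)
      case Avg(c)    => Seq(Sum(c), Count(c))
    }
}
```

With the four aggregates from the example query, the outer sequence has four entries and only the last one holds two functions.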





[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634877439



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.sources.Aggregation;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data source can implement this 
interface to
+ * push down aggregates to the data source.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsPushDownAggregates extends ScanBuilder {
+
+  /**
+   * Pushes down Aggregation to datasource.
+   * The Aggregation can be pushed down only if all the Aggregate Functions can
+   * be pushed down.
+   */
+  void pushAggregation(Aggregation aggregation);
+
+  /**
+   * Returns the aggregation that are pushed to the data source via
+   * {@link #pushAggregation(Aggregation aggregation)}.
+   */
+  Aggregation pushedAggregation();
+
+  /**
+   * Returns the schema of the pushed down aggregates
+   */
+  StructType getPushDownAggSchema();
+
+  /**
+   * Indicate if the data source only supports global aggregated push down
+   */
+  boolean supportsGlobalAggregatePushDownOnly();
+
+  /**
+   * Indicate if the data source supports push down aggregates along with 
filters

Review comment:
   I mean whether we can push down the aggregate and the filter together. For 
example, for `SELECT Max(c1) FROM t WHERE c2 > 1`, we can push down both the 
aggregate and the filter for JDBC. But I am not sure about Parquet and ORC. If 
the filter doesn't affect the footer's Max/Min/Count value, we can push down 
both aggregate and filter; otherwise, we can't. I am not sure how to check 
whether the filter affects the footer's Max/Min/Count value. Currently I only 
push down Max/Min/Count if there is no filter. If a filter is present, I only 
push down the filter, not Max/Min/Count.
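
The current behaviour described above can be summarised in a small decision function. This is only a sketch: `Source`, `Jdbc`, `FooterStats`, `Decision` and `decide` are hypothetical names and are not part of the proposed `SupportsPushDownAggregates` interface.

```scala
object PushDownDecisionSketch {
  // Two kinds of sources, as distinguished in the comment above.
  sealed trait Source
  case object Jdbc extends Source        // can evaluate filter and aggregate together
  case object FooterStats extends Source // e.g. Parquet/ORC, answering from footer statistics

  final case class Decision(pushFilters: Boolean, pushAggregates: Boolean)

  def decide(source: Source, hasFilters: Boolean, hasAggregates: Boolean): Decision =
    source match {
      // JDBC: both can be pushed down in the same query.
      case Jdbc => Decision(hasFilters, hasAggregates)
      // Footer statistics: aggregates are pushed only when no filter is present;
      // otherwise only the filter is pushed.
      case FooterStats => Decision(hasFilters, hasAggregates && !hasFilters)
    }
}
```

The conservative `FooterStats` branch reflects the open question of how to tell whether a filter changes the footer's Max/Min/Count values.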

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/sources/aggregates.scala
##
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources

Review comment:
   Yes. Will change





[GitHub] [spark] Ngone51 commented on pull request #31876: [SPARK-34942][API][CORE] Abstract Location in MapStatus to enable support for custom storage

2021-05-18 Thread GitBox



Ngone51 commented on pull request #31876:
URL: https://github.com/apache/spark/pull/31876#issuecomment-843704131


   @mridulm Sorry, I missed your last comment... I think we can go ahead and 
update to the latest as long as we can put SPARK-35188 aside for now.



[GitHub] [spark] huaxingao commented on a change in pull request #32049: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet

2021-05-18 Thread GitBox



huaxingao commented on a change in pull request #32049:
URL: https://github.com/apache/spark/pull/32049#discussion_r634877198



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java
##
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.sources.Aggregation;
+import org.apache.spark.sql.types.StructType;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data source can implement this interface to
+ * push down aggregates to the data source.
+ *
+ * @since 3.2.0
+ */
+@Evolving
+public interface SupportsPushDownAggregates extends ScanBuilder {
+
+  /**
+   * Pushes down Aggregation to datasource.
+   * The Aggregation can be pushed down only if all the Aggregate Functions can
+   * be pushed down.
+   */
+  void pushAggregation(Aggregation aggregation);
+
+  /**
+   * Returns the aggregation that are pushed to the data source via
+   * {@link #pushAggregation(Aggregation aggregation)}.
+   */
+  Aggregation pushedAggregation();

Review comment:
   This is either the input of `pushAggregation` or empty: we don't push 
anything down unless every aggregate function in the input of 
`pushAggregation` can be pushed down.
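
   A minimal sketch of that all-or-nothing contract, assuming aggregate 
functions are represented as plain strings. `AllOrNothingAggregatePushDown` 
and its `SUPPORTED` set are hypothetical stand-ins; the class does not 
implement the real `SupportsPushDownAggregates` interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for a scan builder; it does not implement the real Spark interfaces.
final class AllOrNothingAggregatePushDown {
  // Assumed set of functions this hypothetical source can evaluate.
  private static final Set<String> SUPPORTED =
      new HashSet<>(Arrays.asList("MIN", "MAX", "COUNT"));

  private List<String> pushed = Collections.emptyList();

  // Accepts the whole request only if every function is supported.
  void pushAggregation(List<String> functions) {
    if (SUPPORTED.containsAll(functions)) {
      pushed = new ArrayList<>(functions);
    } else {
      pushed = Collections.emptyList();
    }
  }

  // Either exactly what was passed to pushAggregation, or empty.
  List<String> pushedAggregation() {
    return pushed;
  }
}
```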





[GitHub] [spark] SparkQA commented on pull request #32563: [SPARK-35415][SQL] Change `information` to map type for SHOW TABLE EXTENDED command

2021-05-18 Thread GitBox



SparkQA commented on pull request #32563:
URL: https://github.com/apache/spark/pull/32563#issuecomment-843703128


   **[Test build #138698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138698/testReport)**
 for PR 32563 at commit 
[`8c1e5ce`](https://github.com/apache/spark/commit/8c1e5cef9da041835eaa84f1bd38fde9abfa363b).



[GitHub] [spark] SparkQA commented on pull request #32552: [SPARK-34819][SQL] MapType supports comparable semantics

2021-05-18 Thread GitBox



SparkQA commented on pull request #32552:
URL: https://github.com/apache/spark/pull/32552#issuecomment-843703118


   **[Test build #138699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138699/testReport)**
 for PR 32552 at commit 
[`877af88`](https://github.com/apache/spark/commit/877af88115058542c5be17c525403aa6b4368513).



[GitHub] [spark] AngersZhuuuu commented on pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform

2021-05-18 Thread GitBox



AngersZhuuuu commented on pull request #32365:
URL: https://github.com/apache/spark/pull/32365#issuecomment-843702945


   ping @MaxGekk @cloud-fan 



[GitHub] [spark] imback82 commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-18 Thread GitBox



imback82 commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-843702839


   > 1. Have a single `AlterTable` logical plan, which has 2 members: `table: 
LogicalPlan` and `changes: Seq[TableChange]`. We also have a single 
`AlterTableExec` physical plan.
   
   Do you mean generating `AlterTable` in `AstBuilder`? One issue with this 
approach is for v1 commands (`ResolveSessionCatalog`), because there could be an 
ambiguity between `ALTER TABLE ADD COLUMN` and `ALTER TABLE REPLACE COLUMN`. 
That's why this PR created different `AlterTableXXX` plans with a base 
`AlterTable`.
   
   > 2. Have various `AlterTableXXX` logical plans, which are similar to the 
existing `AlterTableXXXStatement`. Update the code that resolve/check 
`Seq[TableChange]` to deal with `AlterTableXXX` logical plans instead. We still 
have a single `AlterTableExec` physical plan, the planner will convert all 
`AlterTableXXX` logical plans to it.
   
   I can try this approach and create a separate PR for comparison. Let me know 
if I misunderstood option 1 above.
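
   To make option 2 concrete, here is a rough sketch in plain Java (the real 
plans are Scala classes in Catalyst). Every class name ending in `Sketch` or 
`Change` is hypothetical, and modelling REPLACE COLUMNS as "delete the old 
columns, then add the new ones" is an assumption made for illustration. The 
point is only that each command keeps its own logical class, so ADD and 
REPLACE stay distinguishable, while all of them are planned into one physical 
node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for TableChange.
interface TableChangeSketch {
  String describe();
}

final class AddColumnChange implements TableChangeSketch {
  private final String column;
  AddColumnChange(String column) { this.column = column; }
  @Override public String describe() { return "ADD " + column; }
}

final class DeleteColumnChange implements TableChangeSketch {
  private final String column;
  DeleteColumnChange(String column) { this.column = column; }
  @Override public String describe() { return "DELETE " + column; }
}

// Base logical plan shared by the per-command plans.
abstract class AlterTableCommandSketch {
  final String table;
  AlterTableCommandSketch(String table) { this.table = table; }
  abstract List<TableChangeSketch> changes();
}

final class AlterTableAddColumnsSketch extends AlterTableCommandSketch {
  private final List<String> columns;
  AlterTableAddColumnsSketch(String table, List<String> columns) {
    super(table);
    this.columns = columns;
  }
  @Override List<TableChangeSketch> changes() {
    List<TableChangeSketch> out = new ArrayList<>();
    for (String c : columns) out.add(new AddColumnChange(c));
    return out;
  }
}

final class AlterTableReplaceColumnsSketch extends AlterTableCommandSketch {
  private final List<String> oldColumns;
  private final List<String> newColumns;
  AlterTableReplaceColumnsSketch(String table, List<String> oldColumns, List<String> newColumns) {
    super(table);
    this.oldColumns = oldColumns;
    this.newColumns = newColumns;
  }
  @Override List<TableChangeSketch> changes() {
    // Assumed semantics: drop every existing column, then add the new ones.
    List<TableChangeSketch> out = new ArrayList<>();
    for (String c : oldColumns) out.add(new DeleteColumnChange(c));
    for (String c : newColumns) out.add(new AddColumnChange(c));
    return out;
  }
}

// The single physical node that every logical command is planned into.
final class AlterTableExecSketch {
  final String table;
  final List<TableChangeSketch> changes;

  private AlterTableExecSketch(String table, List<TableChangeSketch> changes) {
    this.table = table;
    this.changes = changes;
  }

  static AlterTableExecSketch plan(AlterTableCommandSketch command) {
    return new AlterTableExecSketch(command.table, command.changes());
  }

  List<String> describeChanges() {
    List<String> out = new ArrayList<>();
    for (TableChangeSketch c : changes) out.add(c.describe());
    return out;
  }
}
```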
   



[GitHub] [spark] AngersZhuuuu removed a comment on pull request #32365: [SPARK-35228][SQL] Add expression ToPrettyString for keep consistent between hive/spark format in df.show and transform

2021-05-18 Thread GitBox



AngersZhuuuu removed a comment on pull request #32365:
URL: https://github.com/apache/spark/pull/32365#issuecomment-842043224


   gentle ping @cloud-fan 



[GitHub] [spark] AmplabJenkins commented on pull request #32585: [SPARK-35438][SQL][DOCS] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843702426


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43216/
   



[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843697380


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43218/
   



[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



SparkQA commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843691109


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43216/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-843687426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138693/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-843687426


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138693/
   



[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-18 Thread GitBox



SparkQA removed a comment on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-843586584


   **[Test build #138693 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138693/testReport)**
 for PR 32498 at commit 
[`bb69f84`](https://github.com/apache/spark/commit/bb69f84a3d05acec239db571afbfe2f41007d9ce).



[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32498:
URL: https://github.com/apache/spark/pull/32498#issuecomment-843686748


   **[Test build #138693 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138693/testReport)**
 for PR 32498 at commit 
[`bb69f84`](https://github.com/apache/spark/commit/bb69f84a3d05acec239db571afbfe2f41007d9ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843682411


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138691/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843682409


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138696/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843682410


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138694/
   



[GitHub] [spark] zhouyejoe commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-18 Thread GitBox



zhouyejoe commented on a change in pull request #32007:
URL: https://github.com/apache/spark/pull/32007#discussion_r634860230



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FinalizeShuffleMerge.java
##
@@ -30,12 +30,15 @@
  */
 public class FinalizeShuffleMerge extends BlockTransferMessage {
   public final String appId;
+  public final int attemptId;

Review comment:
   In rare cases, the driver container from the previous attempt may still be 
running in the cluster while the driver for the next attempt has already been 
launched. It would be better to also have the attemptId in the 
`FinalizeShuffleMerge` message.
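
   One way a receiver could use the extra field, sketched in plain Java. 
`MergeFinalizeGuard`, `registerAttempt`, and `shouldAccept` are hypothetical 
names and not part of the shuffle service; the sketch only assumes that the 
receiver tracks the highest attempt id it has seen per application and ignores 
finalize requests from older attempts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical receiver-side bookkeeping, keyed by application id.
final class MergeFinalizeGuard {
  private final Map<String, Integer> latestAttempt = new HashMap<>();

  // Remembers the highest attempt id seen so far for this application.
  void registerAttempt(String appId, int attemptId) {
    latestAttempt.merge(appId, attemptId, Integer::max);
  }

  // A finalize request is honoured only if it comes from the latest known attempt.
  boolean shouldAccept(String appId, int attemptId) {
    Integer latest = latestAttempt.get(appId);
    return latest != null && latest == attemptId;
  }
}
```

   With this guard, a finalize request sent by a still-running driver from 
attempt 1 is dropped once attempt 2 has registered.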





[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843683480


   **[Test build #138697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138697/testReport)**
 for PR 32494 at commit 
[`3afaf32`](https://github.com/apache/spark/commit/3afaf329a16727a3929331b65cb06543d11e44c5).



[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843682411


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138691/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843682410


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138694/
   



[GitHub] [spark] AmplabJenkins commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843682409


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138696/
   



[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



SparkQA commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843679971


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43217/
   



[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



SparkQA commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843679640


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43216/
   



[GitHub] [spark] YuzhouSun commented on a change in pull request #32530: [SPARK-35106][Core][SQL] Avoid failing rename caused by destination directory not exist

2021-05-18 Thread GitBox



YuzhouSun commented on a change in pull request #32530:
URL: https://github.com/apache/spark/pull/32530#discussion_r634855854



##
File path: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
##
@@ -188,13 +188,18 @@ class HadoopMapReduceCommitProtocol(
 
   val filesToMove = allAbsPathFiles.foldLeft(Map[String, String]())(_ ++ _)
   logDebug(s"Committing files staged for absolute locations $filesToMove")
+  val absParentPaths = filesToMove.values.map(new Path(_).getParent).toSet
   if (dynamicPartitionOverwrite) {
-    val absPartitionPaths = filesToMove.values.map(new Path(_).getParent).toSet
-    logDebug(s"Clean up absolute partition directories for overwriting: $absPartitionPaths")
-    absPartitionPaths.foreach(fs.delete(_, true))
+    logDebug(s"Clean up absolute partition directories for overwriting: $absParentPaths")
+    absParentPaths.foreach(fs.delete(_, true))
   }
+  logDebug(s"Create absolute parent directories: $absParentPaths")
+  absParentPaths.foreach(fs.mkdirs)

Review comment:
   This covers the case where the directories in `absParentPaths` were never 
created before the job.
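
   The same idea, sketched with `java.nio.file` instead of the Hadoop 
`FileSystem` API that the patch uses, so this is an analogy and not the PR's 
code. `RenameWithParents` and `moveCreatingParent` are hypothetical names. The 
sketch creates the destination's parent directory before moving the staged 
file, so the move does not fail just because the parent is missing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class RenameWithParents {
  // Moves src to dst, creating dst's parent directories first if needed.
  static void moveCreatingParent(Path src, Path dst) throws IOException {
    Path parent = dst.getParent();
    if (parent != null) {
      // createDirectories does nothing if the directory already exists.
      Files.createDirectories(parent);
    }
    Files.move(src, dst);
  }
}
```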





[GitHub] [spark] SparkQA removed a comment on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



SparkQA removed a comment on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843663335


   **[Test build #138696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)**
 for PR 32584 at commit 
[`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).



[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



SparkQA commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843673362


   **[Test build #138696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)**
 for PR 32584 at commit 
[`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



SparkQA removed a comment on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843618436


   **[Test build #138694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138694/testReport)**
 for PR 32469 at commit 
[`9cfb617`](https://github.com/apache/spark/commit/9cfb617919318b646c071cb19157ffabab1fe14b).



[GitHub] [spark] SparkQA commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



SparkQA commented on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843672761


   **[Test build #138694 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138694/testReport)**
 for PR 32469 at commit 
[`9cfb617`](https://github.com/apache/spark/commit/9cfb617919318b646c071cb19157ffabab1fe14b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



SparkQA removed a comment on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843551331


   **[Test build #138691 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138691/testReport)**
 for PR 32494 at commit 
[`3d38208`](https://github.com/apache/spark/commit/3d38208bb00ae9ffaef80d244983fa63cba14303).



[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation

2021-05-18 Thread GitBox



SparkQA commented on pull request #32494:
URL: https://github.com/apache/spark/pull/32494#issuecomment-843668909


   **[Test build #138691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138691/testReport)**
 for PR 32494 at commit 
[`3d38208`](https://github.com/apache/spark/commit/3d38208bb00ae9ffaef80d244983fa63cba14303).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



AmplabJenkins removed a comment on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843662034


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43215/
   



[GitHub] [spark] Kimahriman commented on pull request #32559: [SPARK-35410][SQL] SubExpr elimination should not include redundant children exprs in conditional expression

2021-05-18 Thread GitBox



Kimahriman commented on pull request #32559:
URL: https://github.com/apache/spark/pull/32559#issuecomment-843663478


   > > I did actually hit a bug today where the `when` value was being evaluated 
even though the condition was false. I wasn't able to find the exact root cause 
yet, but turning off subexpression elimination fixed the issue. It was basically 
`when(col.rlike(...), udf(col))`, but more complex on both sides, so somehow the 
UDF was getting subexpression eval'd early and failed because its input didn't match 
the regular expression.
   > 
   > I see. Normally, for Catalyst expressions, this is only a performance issue. 
But for UDFs, we cannot predict the logic users put in them, so a UDF may fail 
unexpectedly in such cases. So it looks like it is better to backport it 
as a bug fix.
   
   I'll try to see if I can get a real breaking example.
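
The failure mode described above can be sketched with a toy model in plain Python. This is not Spark code and not the real breaking example; `strict_udf`, `eval_lazy`, and `eval_with_eager_subexpr` are hypothetical names, and the `^\d+$` pattern is an assumed stand-in for the elided `rlike(...)` argument. It only illustrates why hoisting the UDF call out of the conditional can turn a guarded call into a failure.

```python
import re

# Assumed stand-in for the elided rlike(...) pattern.
PATTERN = re.compile(r"^\d+$")


def strict_udf(value):
    """Hypothetical UDF that fails on input the guard was meant to exclude."""
    if not PATTERN.search(value):
        raise ValueError(f"unexpected input: {value!r}")
    return int(value) * 2


def eval_lazy(value):
    """Models when(col.rlike(...), udf(col)) with the UDF evaluated only
    when the condition holds."""
    if PATTERN.search(value):
        return strict_udf(value)
    return None


def eval_with_eager_subexpr(value):
    """Models the reported behaviour: the UDF call is treated as a common
    subexpression and computed before the condition is checked."""
    hoisted = strict_udf(value)  # runs even when the condition is false
    if PATTERN.search(value):
        return hoisted
    return None
```

With a matching value both functions agree; with a non-matching value `eval_lazy` returns `None`, while `eval_with_eager_subexpr` raises from inside the UDF, which mirrors the symptom that went away when subexpression elimination was turned off.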



[GitHub] [spark] SparkQA commented on pull request #32584: Test pandas nondeterministic return values

2021-05-18 Thread GitBox



SparkQA commented on pull request #32584:
URL: https://github.com/apache/spark/pull/32584#issuecomment-843663335


   **[Test build #138696 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138696/testReport)**
 for PR 32584 at commit 
[`14e3e69`](https://github.com/apache/spark/commit/14e3e69a799d78ac844e7d3b0fdae19eded25c69).



[GitHub] [spark] SparkQA commented on pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



SparkQA commented on pull request #32585:
URL: https://github.com/apache/spark/pull/32585#issuecomment-843663224


   **[Test build #138695 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138695/testReport)**
 for PR 32585 at commit 
[`9b07938`](https://github.com/apache/spark/commit/9b079384f8474921f7932450c2e0305e64644c19).



[GitHub] [spark] AmplabJenkins commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



AmplabJenkins commented on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843662034


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43215/
   



[GitHub] [spark] c21 opened a new pull request #32585: [SPARK-35438][SQL] Minor documentation fix for window physical operator

2021-05-18 Thread GitBox



c21 opened a new pull request #32585:
URL: https://github.com/apache/spark/pull/32585


   
   
   ### What changes were proposed in this pull request?
   
   As the title says: fixed two places where the documentation for the window 
operator contained errors.
   
   ### Why are the changes needed?
   
   Helps people read the code for the window operator more easily in the future.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.



[GitHub] [spark] SparkQA commented on pull request #32469: [SPARK-35338][PYTHON] Separate arithmetic operations into data type based structures

2021-05-18 Thread GitBox



SparkQA commented on pull request #32469:
URL: https://github.com/apache/spark/pull/32469#issuecomment-843650789


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43215/
   



[GitHub] [spark] PerilousApricot commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-05-18 Thread GitBox



PerilousApricot commented on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-843650178


   Hello, this looks like very good work. I'm having some trouble reading the 
code -- is there a possibility that these UDTs could leverage 
https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extension-types
 when they're in Pandas/Python to skip a costly conversion to Object that 
currently happens?


