[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


zhouyejoe commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922419804


   Thanks for review @mridulm @Ngone51 @venkata91 @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922415249


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143440/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922415249


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143440/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922407063


   **[Test build #143440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a).





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922414280


   **[Test build #143440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922411019


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47948/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922411019


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47948/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922410836


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47948/
   





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922410241


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47948/
   





[GitHub] [spark] viirya closed pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya closed pull request #34037:
URL: https://github.com/apache/spark/pull/34037


   





[GitHub] [spark] viirya commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922408200


   Thanks! Merging to 3.1.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922407934


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143439/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922407934


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143439/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922385158


   **[Test build #143439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079).





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922407751


   **[Test build #143439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN

2021-09-18 Thread GitBox


SparkQA commented on pull request #34033:
URL: https://github.com/apache/spark/pull/34033#issuecomment-922407063


   **[Test build #143440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a).





[GitHub] [spark] viirya commented on a change in pull request #34025: [SPARK-36673][SQL] Fix incorrect schema of nested types of union

2021-09-18 Thread GitBox


viirya commented on a change in pull request #34025:
URL: https://github.com/apache/spark/pull/34025#discussion_r711668057



##
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala
##
@@ -1018,6 +1018,64 @@ class DataFrameSetOperationsSuite extends QueryTest with SharedSparkSession {
 unionDF = df1.unionByName(df2)
 checkAnswer(unionDF, expected)
   }
+
+  test("SPARK-36673: Only merge nullability for Unions of struct") {
+val df1 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS INNER")))
+val df2 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS inner")))
+
+val union1 = df1.union(df2)
+val union2 = df1.unionByName(df2)
+
+val schema = StructType(Seq(StructField("id", LongType, false),
+  StructField("nested", StructType(Seq(StructField("INNER", LongType, false))), false)))
+
+Seq(union1, union2).foreach { df =>
+  assert(df.schema == schema)
+  assert(df.queryExecution.optimizedPlan.schema == schema)
+  assert(df.queryExecution.executedPlan.schema == schema)
+
+  checkAnswer(df, Row(0, Row(0)) :: Row(1, Row(5)) :: Row(0, Row(0)) :: Row(1, Row(5)) :: Nil)
+  checkAnswer(df.select("nested.*"), Row(0) :: Row(5) :: Row(0) :: Row(5) :: Nil)
+}
+  }
+
+  test("SPARK-36673: Only merge nullability for unionByName of struct") {
+val df1 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS INNER")))
+val df2 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS inner")))
+
+val df = df1.unionByName(df2)
+
+val schema = StructType(Seq(StructField("id", LongType, false),
+  StructField("nested", StructType(Seq(StructField("INNER", LongType, false))), false)))
+
+assert(df.schema == schema)
+assert(df.queryExecution.optimizedPlan.schema == schema)
+assert(df.queryExecution.executedPlan.schema == schema)
+
+checkAnswer(df, Row(0, Row(0)) :: Row(1, Row(5)) :: Row(0, Row(0)) :: Row(1, Row(5)) :: Nil)
+checkAnswer(df.select("nested.*"), Row(0) :: Row(5) :: Row(0) :: Row(5) :: Nil)
+  }
+
+  test("SPARK-36673: Union of structs with different orders") {
+val df1 = spark.range(2).withColumn("nested",
+  struct(expr("id * 5 AS inner1"), struct(expr("id * 10 AS inner2"
+val df2 = spark.range(2).withColumn("nested",
+  struct(expr("id * 5 AS inner2"), struct(expr("id * 10 AS inner1"

Review comment:
   Opened #34038 for that.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922401714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143438/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922401714


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143438/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922372717


   **[Test build #143438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)** for PR 34038 at commit [`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add).





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922401123


   **[Test build #143438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)** for PR 34038 at commit [`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] github-actions[bot] closed pull request #32818: [SPARK-35592][SQL] An empty dataframe is saved with partitions should write a metadata only file

2021-09-18 Thread GitBox


github-actions[bot] closed pull request #32818:
URL: https://github.com/apache/spark/pull/32818


   





[GitHub] [spark] github-actions[bot] closed pull request #32308: [SPARK-35202] Make The DAG of sql-execution page can be hidden/expand.

2021-09-18 Thread GitBox


github-actions[bot] closed pull request #32308:
URL: https://github.com/apache/spark/pull/32308


   





[GitHub] [spark] github-actions[bot] closed pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

2021-09-18 Thread GitBox


github-actions[bot] closed pull request #31565:
URL: https://github.com/apache/spark/pull/31565


   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922390226


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47947/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922390226


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47947/
   





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922389632


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47947/
   





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922389063


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47947/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922386765


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143437/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922386765


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143437/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922356721


   **[Test build #143437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)** for PR 34037 at commit [`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515).





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922386568


   **[Test build #143437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)** for PR 34037 at commit [`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922385158


   **[Test build #143439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079).





[GitHub] [spark] mridulm commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


mridulm commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922384561


   Thanks for reviewing and merging it @gengliangwang !





[GitHub] [spark] sunchao commented on a change in pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


sunchao commented on a change in pull request #34037:
URL: https://github.com/apache/spark/pull/34037#discussion_r711647997



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
##
@@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase {
 }
   }
 
+  test("SPARK-35985 push filters for empty read schema") {

Review comment:
   yea make sense, we can just change the PR title 







[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922379052


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47946/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922379052


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47946/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922378826


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47946/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922377859


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47946/
   





[GitHub] [spark] viirya commented on a change in pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya commented on a change in pull request #34037:
URL: https://github.com/apache/spark/pull/34037#discussion_r711643058



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
##
@@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase {
 }
   }
 
+  test("SPARK-35985 push filters for empty read schema") {

Review comment:
   I think we don't need a new JIRA here as this is a backport case.







[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922376012


   Oh, it is SPARK-35985, not SPARK-36776. For a backport, I think we don't 
need a new JIRA.





[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922375577


   Could you point me to the original PR? I cannot find the original commit in 
master.





[GitHub] [spark] huaxingao commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


huaxingao commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922374814


   @viirya yes, it is a backport





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922372717


   **[Test build #143438 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)**
 for PR 34038 at commit 
[`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922365573


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143436/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922348986


   **[Test build #143436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)**
 for PR 34038 at commit 
[`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5).





[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


viirya commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922365581


   Why 3.1 only? Is this a backport?





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922365573


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143436/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922365522


   **[Test build #143436 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)**
 for PR 34038 at commit 
[`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922364765


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47945/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922364765


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47945/
   





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922363406


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47945/
   





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922362259


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47945/
   





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922356721


   **[Test build #143437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)**
 for PR 34037 at commit 
[`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922356065


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47944/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922356065


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47944/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922356047


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47944/
   





[GitHub] [spark] huaxingao commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


huaxingao commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922355965


   cc @viirya 





[GitHub] [spark] huaxingao commented on a change in pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


huaxingao commented on a change in pull request #34037:
URL: https://github.com/apache/spark/pull/34037#discussion_r711627214



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
##
@@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase {
 }
   }
 
+  test("SPARK-35985 push filters for empty read schema") {

Review comment:
   Added. Thanks







[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922354692


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47944/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922348986


   **[Test build #143436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)**
 for PR 34038 at commit 
[`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5).





[GitHub] [spark] sunchao commented on a change in pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


sunchao commented on a change in pull request #34037:
URL: https://github.com/apache/spark/pull/34037#discussion_r711621125



##
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala
##
@@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase {
 }
   }
 
+  test("SPARK-35985 push filters for empty read schema") {

Review comment:
   nit: perhaps we should add SPARK-36776 here too?







[GitHub] [spark] AmplabJenkins commented on pull request #34041: [SPARK-36799][SQL] Pass queryExecution name in CLI when only select query

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34041:
URL: https://github.com/apache/spark/pull/34041#issuecomment-922347374


   Can one of the admins verify this patch?





[GitHub] [spark] cxzl25 opened a new pull request #34041: [SPARK-36799][SQL] Pass queryExecution name in CLI when only select query

2021-09-18 Thread GitBox


cxzl25 opened a new pull request #34041:
URL: https://github.com/apache/spark/pull/34041


   ### What changes were proposed in this pull request?
   When the SQL statement is only a select query, call `SQLExecution.withNewExecutionId` and 
specify `collect` as `executionName` so that `QueryExecutionListener` can get 
the query.
   ### Why are the changes needed?
   Currently, in the spark-sql CLI, `QueryExecutionListener` receives commands 
but not select queries.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   manual test.
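
The behaviour described in this PR can be pictured with a toy model, written in Python purely for illustration. It assumes that listeners are notified only when an execution name is supplied, which is how the description above reads. The names `with_new_execution_id`, `listeners`, and `received` are hypothetical stand-ins and are not Spark APIs.

```python
from contextlib import contextmanager
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for registered QueryExecutionListener callbacks.
listeners: List[Callable[[str, str], None]] = []


@contextmanager
def with_new_execution_id(query: str, execution_name: Optional[str] = None):
    """Toy wrapper: run the body, then notify listeners only if a name was given."""
    yield
    # Assumption: an unnamed execution is never reported to listeners.
    if execution_name is not None:
        for on_success in listeners:
            on_success(execution_name, query)


# Record every notification a listener receives.
received: List[Tuple[str, str]] = []
listeners.append(lambda name, query: received.append((name, query)))

# Without a name (the select-only case before the change), nothing is reported.
with with_new_execution_id("SELECT 1"):
    pass

# With `collect` as the name, the listener sees the query.
with with_new_execution_id("SELECT 1", execution_name="collect"):
    pass
```

In this sketch only the second call reaches the listener, which mirrors the difference the PR describes between commands and select queries in the CLI.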





[GitHub] [spark] dgd-contributor commented on pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin when DataFrame has NaN value

2021-09-18 Thread GitBox


dgd-contributor commented on pull request #34040:
URL: https://github.com/apache/spark/pull/34040#issuecomment-922342736


   FYI @ueshin , thanks.





[GitHub] [spark] AmplabJenkins commented on pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin when DataFrame has NaN value

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34040:
URL: https://github.com/apache/spark/pull/34040#issuecomment-922341600


   Can one of the admins verify this patch?





[GitHub] [spark] dgd-contributor opened a new pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin

2021-09-18 Thread GitBox


dgd-contributor opened a new pull request #34040:
URL: https://github.com/apache/spark/pull/34040


   ### What changes were proposed in this pull request?
   Fix DataFrame.isin when DataFrame has NaN value
   
   ### Why are the changes needed?
   `DataFrame.isin` returns `None` where pandas returns `False` when NaN/None values are involved:
   
   ``` python
   >>> psdf = ps.DataFrame(
   ...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 1, None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
   ... )
   >>> psdf
        a    b  c
   0  NaN  NaN  1
   1  2.0  5.0  5
   2  3.0  NaN  1
   3  4.0  3.0  3
   4  5.0  2.0  2
   5  6.0  1.0  1
   6  7.0  NaN  1
   7  8.0  0.0  0
   8  NaN  0.0  0
   >>> other = [1, 2, None]
   
   >>> psdf.isin(other)
         a     b     c
   0  None  None  True
   1  True  None  None
   2  None  None  True
   3  None  None  None
   4  None  True  True
   5  None  True  True
   6  None  None  True
   7  None  None  None
   8  None  None  None
   
   >>> psdf.to_pandas().isin(other)
          a      b      c
   0  False  False   True
   1   True  False  False
   2  False  False   True
   3  False  False  False
   4  False   True   True
   5  False   True   True
   6  False  False   True
   7  False  False  False
   8  False  False  False
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   After this PR, the result matches pandas:
   
   
   ``` python
   >>> psdf = ps.DataFrame(
   ...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 1, None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
   ... )
   >>> psdf
        a    b  c
   0  NaN  NaN  1
   1  2.0  5.0  5
   2  3.0  NaN  1
   3  4.0  3.0  3
   4  5.0  2.0  2
   5  6.0  1.0  1
   6  7.0  NaN  1
   7  8.0  0.0  0
   8  NaN  0.0  0
   >>> other = [1, 2, None]
   
   >>> psdf.isin(other)
          a      b      c
   0  False  False   True
   1   True  False  False
   2  False  False   True
   3  False  False  False
   4  False   True   True
   5  False   True   True
   6  False  False   True
   7  False  False  False
   8  False  False  False
   ```
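
The difference between the two outputs can be reproduced with a small pure-Python sketch. It assumes the `None` results come from SQL-style three-valued logic, where a comparison against a null is unknown. That is an interpretation of the output shown above, not the PR's actual implementation, and the helpers `sql_in` and `pandas_like_isin` are hypothetical names.

```python
from typing import Iterable, Optional


def sql_in(value, candidates: Iterable) -> Optional[bool]:
    """SQL-style IN: True on a match, None when the answer is unknown, else False."""
    if value is None:
        return None  # a null value never compares as known
    saw_null = False
    for candidate in candidates:
        if candidate is None:
            saw_null = True  # a null candidate makes a non-match unknown
        elif candidate == value:
            return True
    return None if saw_null else False


def pandas_like_isin(value, candidates: Iterable) -> bool:
    """Collapse the unknown result to False, as in the pandas output above."""
    return sql_in(value, candidates) is True


# Columns "a" and "c" and the `other` list from the example above.
col_a = [None, 2, 3, 4, 5, 6, 7, 8, None]
col_c = [1, 5, 1, 3, 2, 1, 1, 0, 0]
other = [1, 2, None]

before_a = [sql_in(v, other) for v in col_a]
after_a = [pandas_like_isin(v, other) for v in col_a]
before_c = [sql_in(v, other) for v in col_c]
after_c = [pandas_like_isin(v, other) for v in col_c]
```

Under these assumptions `before_a` and `before_c` line up with the `None`-filled columns shown before the PR, and `after_a` and `after_c` with the boolean columns shown after it.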
   
   ### How was this patch tested?
   Unit tests
   





[GitHub] [spark] zzvara commented on pull request #33630: [SPARK-36408][BUILD] Upgrade json4s to 4.0.3

2021-09-18 Thread GitBox


zzvara commented on pull request #33630:
URL: https://github.com/apache/spark/pull/33630#issuecomment-922336044


   JSON4S 4 quickly gained popularity. As a result, we are currently fighting 
dependency hell caused by its binary incompatibility with Spark. Spark lags behind 
on dependency upgrades on many fronts. 95% of our `.scala-steward.conf` ignores 
and pins are due to Spark being added to the project. :-/





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922331560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47943/
   





[GitHub] [spark] dgd-contributor commented on a change in pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


dgd-contributor commented on a change in pull request #33858:
URL: https://github.com/apache/spark/pull/33858#discussion_r711607257



##
File path: python/pyspark/pandas/series.py
##
@@ -4463,6 +4466,161 @@ def replace(
 
         return self._with_new_scol(current)  # TODO: dtype?
 
+    def combine(
+        self,
+        other: "Series",
+        func: Callable,
+        fill_value: Optional[Any] = None,
+    ) -> "Series":
+        """
+        Combine the Series with a Series or scalar according to `func`.
+
+        Combine the Series and `other` using `func` to perform elementwise
+        selection for combined Series.
+        `fill_value` is assumed when value is missing at some index
+        from one of the two objects being combined.
+
+        .. versionadded:: 3.3.0
+
+        .. note:: this API executes the function once to infer the type which is
+             potentially expensive, for instance, when the dataset is created after
+             aggregations or sorting.
+
+             To avoid this, specify return type in ``func``, for instance, as below:
+
+             >>> def foo(x, y) -> np.int32:
+             ...     return x * y
+
+             pandas-on-Spark uses return type hint and does not try to infer the type.
+
+        Parameters
+        ----------
+        other : Series or scalar
+            The value(s) to be combined with the `Series`.
+        func : function
+            Function that takes two scalars as inputs and returns an element.
+            Note that type hint for return type is required.
+        fill_value : scalar, optional
+            The value to assume when an index is missing from
+            one Series or the other. The default specifies to use the
+            appropriate NaN value for the underlying dtype of the Series.
+
+        Returns
+        -------
+        Series
+            The result of combining the Series with the other object.
+
+        See Also
+        --------
+        Series.combine_first : Combine Series values, choosing the calling
+            Series' values first.
+
+        Examples
+        --------
+        Consider 2 Datasets ``s1`` and ``s2`` containing
+        highest clocked speeds of different birds.
+
+        >>> from pyspark.pandas.config import set_option, reset_option
+        >>> set_option("compute.ops_on_diff_frames", True)
+        >>> s1 = ps.Series({'falcon': 330.0, 'eagle': 160.0})
+        >>> s1
+        falcon    330.0
+        eagle     160.0
+        dtype: float64
+        >>> s2 = ps.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
+        >>> s2
+        falcon    345.0
+        eagle     200.0
+        duck       30.0
+        dtype: float64
+
+        Now, to combine the two datasets and view the highest speeds
+        of the birds across the two datasets
+
+        >>> s1.combine(s2, max)
+        duck        NaN
+        eagle     200.0
+        falcon    345.0
+        dtype: float64
+
+        In the previous example, the resulting value for duck is missing,
+        because the maximum of a NaN and a float is a NaN.
+        So, in the example, we set ``fill_value=0``,
+        so the maximum value returned will be the value from some dataset.
+
+        >>> s1.combine(s2, max, fill_value=0)
+        duck       30.0
+        eagle     200.0
+        falcon    345.0
+        dtype: float64
+        >>> reset_option("compute.ops_on_diff_frames")
+        """
+        if not isinstance(other, Series) and not np.isscalar(other):
+            raise TypeError("unsupported type: %s" % type(other))
+
+        assert callable(func), "argument func must be a callable function."
+
+        if np.isscalar(other):
+            tmp_other_col = verify_temp_column_name(self._internal.spark_frame, "__tmp_other_col__")
+            combined = self.to_frame()
+            combined[tmp_other_col] = other
+            combined = DataFrame(combined._internal.resolved_copy)
+        elif same_anchor(self, other):
+            combined = self._psdf[self._column_label, other._column_label]
+        elif fill_value is None:
+            combined = combine_frames(self.to_frame(), other.to_frame())
+        else:
+            combined = self._combine_frame_with_fill_value(other, fill_value=fill_value)
+
+        try:
+            sig_return = infer_return_type(func)
+            if isinstance(sig_return, UnknownType):
+                raise TypeError()
+            return_type = sig_return.spark_type
+        except TypeError:
+            limit = ps.get_option("compute.shortcut_limit")
+            pdf = combined.head(limit + 1)._to_internal_pandas()
+            combined_pser = pdf.iloc[:, 0].combine(pdf.iloc[:, 1], func, fill_value=fill_value)
+            return_type = as_spark_type(combined_pser.dtype)
+
+        @pandas_udf(returnType=return_type)  # type: ignore
+        def wrapped_func(x: pd.Series, y: pd.Series) -> pd.Series:
+            return x.combine(y, func)
+
+
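
The `fill_value` behaviour described in the docstring above can be illustrated without Spark. This is a minimal sketch, not the PR's implementation: plain dicts stand in for the two Series, the free function `combine` here is a hypothetical stand-in, and aligning on the sorted union of keys is an assumption chosen to mirror the row order in the docstring output.

```python
import math


def combine(left, right, func, fill_value=math.nan):
    """Apply func elementwise to two dicts aligned on the sorted union of keys.

    A key missing from one side is replaced by fill_value before func is called.
    """
    keys = sorted(set(left) | set(right))
    return {k: func(left.get(k, fill_value), right.get(k, fill_value)) for k in keys}


s1 = {"falcon": 330.0, "eagle": 160.0}
s2 = {"falcon": 345.0, "eagle": 200.0, "duck": 30.0}

# Without a fill value, "duck" is compared against NaN, and max(nan, 30.0) is NaN.
no_fill = combine(s1, s2, max)

# With fill_value=0, the missing entry becomes 0 and max picks the real value.
with_fill = combine(s1, s2, max, fill_value=0)
```

Python's built-in `max` keeps its first argument unless a later one compares greater, and every comparison with NaN is false, which is why the unfilled `duck` entry stays NaN.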
+

[GitHub] [spark] AmplabJenkins commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922331560


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47943/
   





[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


SparkQA commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922331236


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47943/
   





[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


SparkQA commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922330151


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47943/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922327826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143435/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922327826


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143435/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922324039


   **[Test build #143435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd).





[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


SparkQA commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922327735


   **[Test build #143435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

2021-09-18 Thread GitBox


SparkQA commented on pull request #33858:
URL: https://github.com/apache/spark/pull/33858#issuecomment-922324039


   **[Test build #143435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd).





[GitHub] [spark] Ngone51 commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


Ngone51 commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922282272


   Late lgtm. Thanks @zhouyejoe 





[GitHub] [spark] WeichenXu123 commented on a change in pull request #34021: [SPARK-36642][SQL] Add df.withMetadata pyspark API

2021-09-18 Thread GitBox


WeichenXu123 commented on a change in pull request #34021:
URL: https://github.com/apache/spark/pull/34021#discussion_r711587577



##
File path: python/pyspark/sql/dataframe.py
##
@@ -2536,6 +2536,28 @@ def withColumnRenamed(self, existing, new):
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sql_ctx)
 
+    def withMetadata(self, columnName, metadata):
+        """Returns a new :class:`DataFrame` by updating an existing column with metadata.
+
+        .. versionadded:: 3.3.0
+
+        Parameters
+        ----------
+        columnName : str
+            string, name of the existing column to update the metadata.
+        metadata : dict
+            dict, new metadata to be assigned to df.schema[columnName].metadata
+
+        Examples
+        --------
+        >>> df_meta = df.withMetadata('age', {'foo': 'bar'})
+        >>> df_meta.schema['age'].metadata
+        {'foo': 'bar'}
+        """
+        if not isinstance(metadata, dict):

Review comment:
   @HyukjinKwon 
   
   > https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L712 
   > The existing API that takes the metadata also specified that the metadata 
   > should be a dict in the docstring. However, I'm also fine with not checking 
   > the dict type.
   
   But the code at 
   https://github.com/apache/spark/blob/cabc36b54d7f6633d8b128e511e7049c475b919d/python/pyspark/sql/column.py#L747 
   doesn't require metadata to be a dict, so is it a doc error or a code error 
   there?







[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922245541


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143432/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922245541


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143432/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922184636


   **[Test build #143432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143432/testReport)** for PR 34037 at commit [`6ba1b96`](https://github.com/apache/spark/commit/6ba1b9647fffe90c95c6afb54ad19607a6fb7217).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922245222


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143434/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922245222


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143434/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922236467


   **[Test build #143434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7).





[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema

2021-09-18 Thread GitBox


SparkQA commented on pull request #34037:
URL: https://github.com/apache/spark/pull/34037#issuecomment-922245207


   **[Test build #143432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143432/testReport)** for PR 34037 at commit [`6ba1b96`](https://github.com/apache/spark/commit/6ba1b9647fffe90c95c6afb54ad19607a6fb7217).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922245150


   **[Test build #143434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922243518


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47942/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


AmplabJenkins removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34039: [SPARK-36798]: Wait for listeners to finish before flushing metrics

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34039:
URL: https://github.com/apache/spark/pull/34039#issuecomment-922243538


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
   





[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


AmplabJenkins commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922243518


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47942/
   





[GitHub] [spark] BOOTMGR opened a new pull request #34039: [SPARK-36798]: Wait for listeners to finish before flushing metrics

2021-09-18 Thread GitBox


BOOTMGR opened a new pull request #34039:
URL: https://github.com/apache/spark/pull/34039


   ### What changes were proposed in this pull request?
   When `SparkContext` is shutting down, wait for the listener bus to finish, 
and only then flush `MetricsSystem`.
   
   
   ### Why are the changes needed?
   In the current implementation, when `SparkContext.stop()` is called, 
`metricsSystem.report()` is called before `listenerBus.stop()`. In this case, 
if a listener is still producing metrics, those metrics never reach the sink.
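
The ordering problem described above can be reproduced with a toy model. This is only a sketch: the `MetricsSystem` and `ListenerBus` classes below are hypothetical stand-ins written for illustration, not Spark's classes, and the sketch assumes that stopping the bus delivers any events still queued.

```python
from collections import deque


class MetricsSystem:
    """Toy metrics registry with a sink that receives snapshots on report()."""

    def __init__(self):
        self.registry = {}
        self.sink = []

    def report(self):
        # Flush a snapshot of the current metrics to the sink.
        self.sink.append(dict(self.registry))


class ListenerBus:
    """Toy bus that queues events and delivers the backlog when stopped."""

    def __init__(self):
        self._pending = deque()
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def post(self, event):
        self._pending.append(event)

    def stop(self):
        # Assumption: stop() drains every queued event before returning.
        while self._pending:
            event = self._pending.popleft()
            for listener in self._listeners:
                listener(event)


def shutdown(report_first):
    """Run a shutdown in the given order and return the last snapshot sent to the sink."""
    metrics = MetricsSystem()
    bus = ListenerBus()

    def count_events(event):
        metrics.registry["events"] = metrics.registry.get("events", 0) + 1

    bus.add_listener(count_events)
    bus.post("job_end")
    bus.post("application_end")

    if report_first:
        metrics.report()  # flush first: the listener has not run yet
        bus.stop()
    else:
        bus.stop()        # let listeners finish, then flush
        metrics.report()
    return metrics.sink[-1]


lost = shutdown(report_first=True)
kept = shutdown(report_first=False)
```

In this model, reporting first sends an empty snapshot to the sink, while stopping the bus first lets the listener's counter reach the sink.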
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   NA
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922242583


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47942/
   





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922241502


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47942/
   





[GitHub] [spark] SparkQA removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


SparkQA removed a comment on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922208782


   **[Test build #143433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).





[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching

2021-09-18 Thread GitBox


SparkQA commented on pull request #34018:
URL: https://github.com/apache/spark/pull/34018#issuecomment-922240171


   **[Test build #143433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


SparkQA commented on pull request #34038:
URL: https://github.com/apache/spark/pull/34038#issuecomment-922236467


   **[Test build #143434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7).





[GitHub] [spark] viirya commented on a change in pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns

2021-09-18 Thread GitBox


viirya commented on a change in pull request #34038:
URL: https://github.com/apache/spark/pull/34038#discussion_r711537729



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -401,16 +401,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
 |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
   """.stripMargin.replace("\n", " ").trim())
   }
+  val isUnion = operator.isInstanceOf[Union]
   // Check if the data types match.
-  dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
-// SPARK-18058: we shall not care about the nullability of columns
-if (TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty) {
-  failAnalysis(
-s"""
-  |${operator.nodeName} can only be performed on tables with the compatible
-  |column types. ${dt1.catalogString} <> ${dt2.catalogString} at the
-  |${ordinalNumber(ci)} column of the ${ordinalNumber(ti + 1)} table
-""".stripMargin.replace("\n", " ").trim())
+  if (!isUnion) {

Review comment:
   I'm not sure whether we should also generalize this to all set operations.
   It looks reasonable, but judging by their API definitions, the other set
   operations don't seem to have the by-position definition that Union has.






