[GitHub] [spark] zhouyejoe commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
zhouyejoe commented on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922419804 Thanks for review @mridulm @Ngone51 @venkata91 @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
AmplabJenkins removed a comment on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922415249 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143440/
[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
AmplabJenkins commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922415249 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143440/
[GitHub] [spark] SparkQA removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
SparkQA removed a comment on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922407063 **[Test build #143440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a).
[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
SparkQA commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922414280 **[Test build #143440 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
AmplabJenkins removed a comment on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922411019 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47948/
[GitHub] [spark] AmplabJenkins commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
AmplabJenkins commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922411019 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47948/
[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
SparkQA commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922410836 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47948/
[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
SparkQA commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922410241 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47948/
[GitHub] [spark] viirya closed pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
viirya closed pull request #34037: URL: https://github.com/apache/spark/pull/34037
[GitHub] [spark] viirya commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
viirya commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922408200 Thanks! Merging to 3.1.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922407934 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143439/
[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922407934 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143439/
[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922385158 **[Test build #143439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079).
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922407751 **[Test build #143439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34033: [SPARK-36792][SQL] InSet should handle NaN
SparkQA commented on pull request #34033: URL: https://github.com/apache/spark/pull/34033#issuecomment-922407063 **[Test build #143440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143440/testReport)** for PR 34033 at commit [`141dc5f`](https://github.com/apache/spark/commit/141dc5f40a2d81aeee034dfef661db1618bb8e1a).
[GitHub] [spark] viirya commented on a change in pull request #34025: [SPARK-36673][SQL] Fix incorrect schema of nested types of union
viirya commented on a change in pull request #34025: URL: https://github.com/apache/spark/pull/34025#discussion_r711668057

## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala

@@ -1018,6 +1018,64 @@ class DataFrameSetOperationsSuite extends QueryTest with SharedSparkSession {
     unionDF = df1.unionByName(df2)
     checkAnswer(unionDF, expected)
   }
+
+  test("SPARK-36673: Only merge nullability for Unions of struct") {
+    val df1 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS INNER")))
+    val df2 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS inner")))
+
+    val union1 = df1.union(df2)
+    val union2 = df1.unionByName(df2)
+
+    val schema = StructType(Seq(StructField("id", LongType, false),
+      StructField("nested", StructType(Seq(StructField("INNER", LongType, false))), false)))
+
+    Seq(union1, union2).foreach { df =>
+      assert(df.schema == schema)
+      assert(df.queryExecution.optimizedPlan.schema == schema)
+      assert(df.queryExecution.executedPlan.schema == schema)
+
+      checkAnswer(df, Row(0, Row(0)) :: Row(1, Row(5)) :: Row(0, Row(0)) :: Row(1, Row(5)) :: Nil)
+      checkAnswer(df.select("nested.*"), Row(0) :: Row(5) :: Row(0) :: Row(5) :: Nil)
+    }
+  }
+
+  test("SPARK-36673: Only merge nullability for unionByName of struct") {
+    val df1 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS INNER")))
+    val df2 = spark.range(2).withColumn("nested", struct(expr("id * 5 AS inner")))
+
+    val df = df1.unionByName(df2)
+
+    val schema = StructType(Seq(StructField("id", LongType, false),
+      StructField("nested", StructType(Seq(StructField("INNER", LongType, false))), false)))
+
+    assert(df.schema == schema)
+    assert(df.queryExecution.optimizedPlan.schema == schema)
+    assert(df.queryExecution.executedPlan.schema == schema)
+
+    checkAnswer(df, Row(0, Row(0)) :: Row(1, Row(5)) :: Row(0, Row(0)) :: Row(1, Row(5)) :: Nil)
+    checkAnswer(df.select("nested.*"), Row(0) :: Row(5) :: Row(0) :: Row(5) :: Nil)
+  }
+
+  test("SPARK-36673: Union of structs with different orders") {
+    val df1 = spark.range(2).withColumn("nested",
+      struct(expr("id * 5 AS inner1"), struct(expr("id * 10 AS inner2"))))
+    val df2 = spark.range(2).withColumn("nested",
+      struct(expr("id * 5 AS inner2"), struct(expr("id * 10 AS inner1"))))

Review comment: Opened #34038 for that.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922401714 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143438/
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922401714 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143438/
[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922372717 **[Test build #143438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)** for PR 34038 at commit [`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add).
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922401123 **[Test build #143438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)** for PR 34038 at commit [`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] github-actions[bot] closed pull request #32818: [SPARK-35592][SQL] An empty dataframe is saved with partitions should write a metadata only file
github-actions[bot] closed pull request #32818: URL: https://github.com/apache/spark/pull/32818
[GitHub] [spark] github-actions[bot] closed pull request #32308: [SPARK-35202] Make The DAG of sql-execution page can be hidden/expand.
github-actions[bot] closed pull request #32308: URL: https://github.com/apache/spark/pull/32308
[GitHub] [spark] github-actions[bot] closed pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI
github-actions[bot] closed pull request #31565: URL: https://github.com/apache/spark/pull/31565
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922390226 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47947/
[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922390226 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47947/
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922389632 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47947/
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922389063 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47947/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922386765 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143437/
[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922386765 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143437/
[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922356721 **[Test build #143437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)** for PR 34037 at commit [`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515).
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922386568 **[Test build #143437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)** for PR 34037 at commit [`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922385158 **[Test build #143439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143439/testReport)** for PR 34037 at commit [`a69b74f`](https://github.com/apache/spark/commit/a69b74f2bb2530d0706777f17e8e2e6906a21079).
[GitHub] [spark] mridulm commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
mridulm commented on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922384561 Thanks for reviewing and merging it @gengliangwang !
[GitHub] [spark] sunchao commented on a change in pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
sunchao commented on a change in pull request #34037: URL: https://github.com/apache/spark/pull/34037#discussion_r711647997

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala

@@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase {
     }
   }
+  test("SPARK-35985 push filters for empty read schema") {

Review comment: yea make sense, we can just change the PR title
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922379052 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47946/
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922379052 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47946/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922378826 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47946/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922377859 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47946/
[GitHub] [spark] viirya commented on a change in pull request #34037: [SPARK-35985][SQL][3.1] push partitionFilters for empty readDataSchema
viirya commented on a change in pull request #34037: URL: https://github.com/apache/spark/pull/34037#discussion_r711643058 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala ## @@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase { } } + test("SPARK-35985 push filters for empty read schema") { Review comment: I think we don't need a new JIRA here as this is a backport case.
[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
viirya commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922376012 Oh, it is SPARK-35985, not SPARK-36776. For backport case, I think we don't need a new JIRA.
[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
viirya commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922375577 Could you point me to the original PR, I cannot find original commit in master.
[GitHub] [spark] huaxingao commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
huaxingao commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922374814 @viirya yes, it is a backport
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922372717 **[Test build #143438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143438/testReport)** for PR 34038 at commit [`47bec5d`](https://github.com/apache/spark/commit/47bec5d1449d2aafe27d8ed7febc7d0fbe0e9add).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922365573 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143436/
[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922348986 **[Test build #143436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)** for PR 34038 at commit [`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5).
[GitHub] [spark] viirya commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
viirya commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922365581 Why 3.1 only? Is this a backport?
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922365573 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143436/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922365522 **[Test build #143436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)** for PR 34038 at commit [`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922364765 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47945/
[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922364765 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47945/
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922363406 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47945/
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922362259 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47945/
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922356721 **[Test build #143437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143437/testReport)** for PR 34037 at commit [`f2e2adb`](https://github.com/apache/spark/commit/f2e2adb4602b7096c91f31d811b9170662a2a515).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922356065 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47944/
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922356065 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47944/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922356047 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47944/
[GitHub] [spark] huaxingao commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
huaxingao commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922355965 cc @viirya
[GitHub] [spark] huaxingao commented on a change in pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
huaxingao commented on a change in pull request #34037: URL: https://github.com/apache/spark/pull/34037#discussion_r711627214 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala ## @@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase { } } + test("SPARK-35985 push filters for empty read schema") { Review comment: Added. Thanks
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922354692 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47944/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922348986 **[Test build #143436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143436/testReport)** for PR 34038 at commit [`b91c08e`](https://github.com/apache/spark/commit/b91c08e6dda6d7f9b1e74d6b54146aa86b6b35a5).
[GitHub] [spark] sunchao commented on a change in pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
sunchao commented on a change in pull request #34037: URL: https://github.com/apache/spark/pull/34037#discussion_r711621125 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala ## @@ -109,9 +110,26 @@ class PruneFileSourcePartitionsSuite extends PrunePartitionSuiteBase { } } + test("SPARK-35985 push filters for empty read schema") { Review comment: nit: perhaps we should add SPARK-36776 here too?
[GitHub] [spark] AmplabJenkins commented on pull request #34041: [SPARK-36799][SQL] Pass queryExecution name in CLI when only select query
AmplabJenkins commented on pull request #34041: URL: https://github.com/apache/spark/pull/34041#issuecomment-922347374 Can one of the admins verify this patch?
[GitHub] [spark] cxzl25 opened a new pull request #34041: [SPARK-36799][SQL] Pass queryExecution name in CLI when only select query
cxzl25 opened a new pull request #34041: URL: https://github.com/apache/spark/pull/34041

### What changes were proposed in this pull request?
When the SQL statement is only a select query, call `SQLExecution.withNewExecutionId` and specify `collect` as the `executionName` so that `QueryExecutionListener` can receive the query.

### Why are the changes needed?
Currently, in the spark-sql CLI, `QueryExecutionListener` receives commands but not select queries.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test.
[GitHub] [spark] dgd-contributor commented on pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin when DataFrame has NaN value
dgd-contributor commented on pull request #34040: URL: https://github.com/apache/spark/pull/34040#issuecomment-922342736 FYI @ueshin , thanks.
[GitHub] [spark] AmplabJenkins commented on pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin when DataFrame has NaN value
AmplabJenkins commented on pull request #34040: URL: https://github.com/apache/spark/pull/34040#issuecomment-922341600 Can one of the admins verify this patch?
[GitHub] [spark] dgd-contributor opened a new pull request #34040: [SPARK-36785][PYTHON] fix DataFrame.isin
dgd-contributor opened a new pull request #34040: URL: https://github.com/apache/spark/pull/34040

### What changes were proposed in this pull request?
Fix DataFrame.isin when DataFrame has NaN value

### Why are the changes needed?
Fix DataFrame.isin when DataFrame has NaN value

``` python
>>> psdf = ps.DataFrame(
...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 1, None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
... )
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]
>>> psdf.isin(other)
      a     b     c
0  None  None  True
1  True  None  None
2  None  None  True
3  None  None  None
4  None  True  True
5  None  True  True
6  None  None  True
7  None  None  None
8  None  None  None
>>> psdf.to_pandas().isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
```

### Does this PR introduce _any_ user-facing change?
After this PR

``` python
>>> psdf = ps.DataFrame(
...     {"a": [None, 2, 3, 4, 5, 6, 7, 8, None], "b": [None, 5, None, 3, 2, 1, None, 0, 0], "c": [1, 5, 1, 3, 2, 1, 1, 0, 0]},
... )
>>> psdf
     a    b  c
0  NaN  NaN  1
1  2.0  5.0  5
2  3.0  NaN  1
3  4.0  3.0  3
4  5.0  2.0  2
5  6.0  1.0  1
6  7.0  NaN  1
7  8.0  0.0  0
8  NaN  0.0  0
>>> other = [1, 2, None]
>>> psdf.isin(other)
       a      b      c
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False  False
4  False   True   True
5  False   True   True
6  False  False   True
7  False  False  False
8  False  False  False
```

### How was this patch tested?
Unit tests
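The `None` cells in the "before" output of #34040 are consistent with SQL-style three-valued logic, where a null operand or a null candidate makes a non-matching membership test unknown rather than false. The sketch below is a pure-Python model of that behaviour, not the code from the PR: `sql_in` and `pandas_like_isin` are hypothetical helper names, and `None` stands in for both NULL and NaN.

```python
from typing import Any, Optional, Sequence


def sql_in(value: Any, candidates: Sequence[Any]) -> Optional[bool]:
    """Model of a SQL-style three-valued IN; None stands in for NULL/NaN."""
    if value is None:
        return None  # a null operand gives an unknown result
    if any(c is not None and c == value for c in candidates):
        return True  # a definite match wins
    if any(c is None for c in candidates):
        return None  # no match, but a null candidate leaves it unknown
    return False


def pandas_like_isin(value: Any, candidates: Sequence[Any]) -> bool:
    """Collapse an unknown (None) result to False, as in the pandas output."""
    return sql_in(value, candidates) is True


# Column "a" and `other` from the PR description above.
col_a = [None, 2, 3, 4, 5, 6, 7, 8, None]
other = [1, 2, None]

before = [sql_in(v, other) for v in col_a]
after = [pandas_like_isin(v, other) for v in col_a]
print(before)
print(after)
```

Under this model, `before` reproduces column `a` of the pre-fix output and `after` reproduces the pandas-compatible result quoted in the PR description.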
[GitHub] [spark] zzvara commented on pull request #33630: [SPARK-36408][BUILD] Upgrade json4s to 4.0.3
zzvara commented on pull request #33630: URL: https://github.com/apache/spark/pull/33630#issuecomment-922336044 JSON4S 4 quickly gained popularity. Because of this, we are currently fighting dependency hell due to its binary incompatibility with Spark. Spark lags behind on dependency upgrades on many fronts. 95% of our `.scala-steward.conf` ignores and pins are due to Spark being added to the project. :-/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
AmplabJenkins removed a comment on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922331560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47943/
[GitHub] [spark] dgd-contributor commented on a change in pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
dgd-contributor commented on a change in pull request #33858:
URL: https://github.com/apache/spark/pull/33858#discussion_r711607257

## File path: python/pyspark/pandas/series.py ##
@@ -4463,6 +4466,161 @@ def replace(
         return self._with_new_scol(current)  # TODO: dtype?
 
+    def combine(
+        self,
+        other: "Series",
+        func: Callable,
+        fill_value: Optional[Any] = None,
+    ) -> "Series":
+        """
+        Combine the Series with a Series or scalar according to `func`.
+
+        Combine the Series and `other` using `func` to perform elementwise
+        selection for combined Series.
+        `fill_value` is assumed when value is missing at some index
+        from one of the two objects being combined.
+
+        .. versionadded:: 3.3.0
+
+        .. note:: this API executes the function once to infer the type which is
+            potentially expensive, for instance, when the dataset is created after
+            aggregations or sorting.
+
+            To avoid this, specify return type in ``func``, for instance, as below:
+
+            >>> def foo(x, y) -> np.int32:
+            ...     return x * y
+
+            pandas-on-Spark uses return type hint and does not try to infer the type.
+
+        Parameters
+        ----------
+        other : Series or scalar
+            The value(s) to be combined with the `Series`.
+        func : function
+            Function that takes two scalars as inputs and returns an element.
+            Note that type hint for return type is required.
+        fill_value : scalar, optional
+            The value to assume when an index is missing from
+            one Series or the other. The default specifies to use the
+            appropriate NaN value for the underlying dtype of the Series.
+
+        Returns
+        -------
+        Series
+            The result of combining the Series with the other object.
+
+        See Also
+        --------
+        Series.combine_first : Combine Series values, choosing the calling
+            Series' values first.
+
+        Examples
+        --------
+        Consider 2 Datasets ``s1`` and ``s2`` containing
+        highest clocked speeds of different birds.
+
+        >>> from pyspark.pandas.config import set_option, reset_option
+        >>> set_option("compute.ops_on_diff_frames", True)
+        >>> s1 = ps.Series({'falcon': 330.0, 'eagle': 160.0})
+        >>> s1
+        falcon    330.0
+        eagle     160.0
+        dtype: float64
+        >>> s2 = ps.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
+        >>> s2
+        falcon    345.0
+        eagle     200.0
+        duck       30.0
+        dtype: float64
+
+        Now, to combine the two datasets and view the highest speeds
+        of the birds across the two datasets
+
+        >>> s1.combine(s2, max)
+        duck        NaN
+        eagle     200.0
+        falcon    345.0
+        dtype: float64
+
+        In the previous example, the resulting value for duck is missing,
+        because the maximum of a NaN and a float is a NaN.
+        So, in the example, we set ``fill_value=0``,
+        so the maximum value returned will be the value from some dataset.
+
+        >>> s1.combine(s2, max, fill_value=0)
+        duck       30.0
+        eagle     200.0
+        falcon    345.0
+        dtype: float64
+        >>> reset_option("compute.ops_on_diff_frames")
+        """
+        if not isinstance(other, Series) and not np.isscalar(other):
+            raise TypeError("unsupported type: %s" % type(other))
+
+        assert callable(func), "argument func must be a callable function."
+
+        if np.isscalar(other):
+            tmp_other_col = verify_temp_column_name(self._internal.spark_frame, "__tmp_other_col__")
+            combined = self.to_frame()
+            combined[tmp_other_col] = other
+            combined = DataFrame(combined._internal.resolved_copy)
+        elif same_anchor(self, other):
+            combined = self._psdf[self._column_label, other._column_label]
+        elif fill_value is None:
+            combined = combine_frames(self.to_frame(), other.to_frame())
+        else:
+            combined = self._combine_frame_with_fill_value(other, fill_value=fill_value)
+
+        try:
+            sig_return = infer_return_type(func)
+            if isinstance(sig_return, UnknownType):
+                raise TypeError()
+            return_type = sig_return.spark_type
+        except TypeError:
+            limit = ps.get_option("compute.shortcut_limit")
+            pdf = combined.head(limit + 1)._to_internal_pandas()
+            combined_pser = pdf.iloc[:, 0].combine(pdf.iloc[:, 1], func, fill_value=fill_value)
+            return_type = as_spark_type(combined_pser.dtype)
+
+        @pandas_udf(returnType=return_type)  # type: ignore
+        def wrapped_func(x: pd.Series, y: pd.Series) -> pd.Series:
+            return x.combine(y, func)
+
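For readers following the review, the element-wise semantics described in the docstring above can be sketched in plain Python. This is only an illustration under assumptions: `combine_sketch` is a hypothetical helper that works on dicts, it is not the pandas-on-Spark implementation from the diff, and it sorts the union of keys to mimic the index order shown in the docstring output.

```python
import math
from typing import Any, Callable, Dict


def combine_sketch(
    left: Dict[str, float],
    right: Dict[str, float],
    func: Callable[[Any, Any], Any],
    fill_value: Any = math.nan,
) -> Dict[str, Any]:
    """Hypothetical stand-in for Series.combine, operating on plain dicts."""
    # Use the sorted union of both key sets as the result index.
    keys = sorted(set(left) | set(right))
    # A key missing on one side is replaced by fill_value before func runs.
    return {k: func(left.get(k, fill_value), right.get(k, fill_value)) for k in keys}


s1 = {"falcon": 330.0, "eagle": 160.0}
s2 = {"falcon": 345.0, "eagle": 200.0, "duck": 30.0}

without_fill = combine_sketch(s1, s2, max)            # duck stays NaN
with_fill = combine_sketch(s1, s2, max, fill_value=0)  # duck becomes 30.0
```

In the first call `max(nan, 30.0)` returns NaN because every comparison against NaN is false, which is why `duck` stays missing until a `fill_value` is supplied.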
[GitHub] [spark] AmplabJenkins commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
AmplabJenkins commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922331560 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47943/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
SparkQA commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922331236 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47943/
[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
SparkQA commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922330151 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47943/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
AmplabJenkins removed a comment on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922327826 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143435/
[GitHub] [spark] AmplabJenkins commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
AmplabJenkins commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922327826 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143435/
[GitHub] [spark] SparkQA removed a comment on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
SparkQA removed a comment on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922324039 **[Test build #143435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd).
[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
SparkQA commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922327735 **[Test build #143435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine
SparkQA commented on pull request #33858: URL: https://github.com/apache/spark/pull/33858#issuecomment-922324039 **[Test build #143435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143435/testReport)** for PR 33858 at commit [`d5f894e`](https://github.com/apache/spark/commit/d5f894eed43d9b0b4f41c846c7c2aca25a74c2dd).
[GitHub] [spark] Ngone51 commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
Ngone51 commented on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922282272 Late lgtm. Thanks @zhouyejoe
[GitHub] [spark] WeichenXu123 commented on a change in pull request #34021: [SPARK-36642][SQL] Add df.withMetadata pyspark API
WeichenXu123 commented on a change in pull request #34021:
URL: https://github.com/apache/spark/pull/34021#discussion_r711587577

## File path: python/pyspark/sql/dataframe.py ##
@@ -2536,6 +2536,28 @@ def withColumnRenamed(self, existing, new):
         """
         return DataFrame(self._jdf.withColumnRenamed(existing, new), self.sql_ctx)
 
+    def withMetadata(self, columnName, metadata):
+        """Returns a new :class:`DataFrame` by updating an existing column with metadata.
+
+        .. versionadded:: 3.3.0
+
+        Parameters
+        ----------
+        columnName : str
+            string, name of the existing column to update the metadata.
+        metadata : dict
+            dict, new metadata to be assigned to df.schema[columnName].metadata
+
+        Examples
+        --------
+        >>> df_meta = df.withMetadata('age', {'foo': 'bar'})
+        >>> df_meta.schema['age'].metadata
+        {'foo': 'bar'}
+        """
+        if not isinstance(metadata, dict):

Review comment:
@HyukjinKwon
> https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L712 The existing API that takes the metadata also specified that the metadata should be a dict in the docstring. However, I'm also fine with not checking the dict type.

But the code at https://github.com/apache/spark/blob/cabc36b54d7f6633d8b128e511e7049c475b919d/python/pyspark/sql/column.py#L747 does not require metadata to be a dict, so is that a doc error or a code error there?
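The question in the comment above is whether a non-dict `metadata` should be rejected up front. A minimal sketch of such a check, assuming the metadata ends up serialized as JSON; `metadata_to_json` is a hypothetical helper used only for illustration and is not a PySpark function.

```python
import json
from typing import Any, Dict


def metadata_to_json(metadata: Dict[str, Any]) -> str:
    """Hypothetical helper: validate column metadata, then serialize it."""
    # Reject anything that is not a dict before it reaches the serializer,
    # so the caller gets a clear error instead of a confusing downstream one.
    if not isinstance(metadata, dict):
        raise TypeError("metadata should be a dict, got %s" % type(metadata).__name__)
    return json.dumps(metadata)


serialized = metadata_to_json({"foo": "bar"})
```

Without the `isinstance` check, a list such as `["foo"]` would serialize without complaint, and the mismatch with the documented `dict` type would only surface later.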
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922245541 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143432/
[GitHub] [spark] AmplabJenkins commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
AmplabJenkins commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922245541 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143432/
[GitHub] [spark] SparkQA removed a comment on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA removed a comment on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922184636 **[Test build #143432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143432/testReport)** for PR 34037 at commit [`6ba1b96`](https://github.com/apache/spark/commit/6ba1b9647fffe90c95c6afb54ad19607a6fb7217).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922245222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143434/
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922245222 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143434/
[GitHub] [spark] SparkQA removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922236467 **[Test build #143434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7).
[GitHub] [spark] SparkQA commented on pull request #34037: [SPARK-36776][SQL][3.1] push partitionFilters for empty readDataSchema
SparkQA commented on pull request #34037: URL: https://github.com/apache/spark/pull/34037#issuecomment-922245207 **[Test build #143432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143432/testReport)** for PR 34037 at commit [`6ba1b96`](https://github.com/apache/spark/commit/6ba1b9647fffe90c95c6afb54ad19607a6fb7217). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922245150 **[Test build #143434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins removed a comment on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922243518 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47942/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
AmplabJenkins removed a comment on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
[GitHub] [spark] AmplabJenkins commented on pull request #34039: [SPARK-36798]: Wait for listeners to finish before flushing metrics
AmplabJenkins commented on pull request #34039: URL: https://github.com/apache/spark/pull/34039#issuecomment-922243538 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
AmplabJenkins commented on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922243519 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143433/
[GitHub] [spark] AmplabJenkins commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
AmplabJenkins commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922243518 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47942/
[GitHub] [spark] BOOTMGR opened a new pull request #34039: [SPARK-36798]: Wait for listeners to finish before flushing metrics
BOOTMGR opened a new pull request #34039:
URL: https://github.com/apache/spark/pull/34039

### What changes were proposed in this pull request?
When `SparkContext` is shutting down, wait for the listener bus to finish, and only then flush `MetricsSystem`.

### Why are the changes needed?
In the current implementation, when `SparkContext.stop()` is called, `metricsSystem.report()` is called before `listenerBus.stop()`. As a result, if a listener is still producing metrics, those metrics never reach the sink.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
NA
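The ordering argument in the description above can be illustrated with a small, self-contained sketch. Everything here is a toy under stated assumptions: `ListenerBus`, `shutdown`, the `metrics` dict and the `sink` list are hypothetical stand-ins written for this example, not Spark's `LiveListenerBus` or `MetricsSystem`.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional


class ListenerBus:
    """Toy asynchronous event bus that delivers events on a worker thread."""

    def __init__(self, listener: Callable[[str], None]) -> None:
        self._events: "queue.Queue[Optional[str]]" = queue.Queue()
        self._listener = listener
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, event: str) -> None:
        self._events.put(event)

    def _run(self) -> None:
        while True:
            event = self._events.get()
            if event is None:  # sentinel: every earlier event was delivered
                return
            self._listener(event)

    def stop(self) -> None:
        # Enqueue the sentinel and block until the worker has drained the queue.
        self._events.put(None)
        self._thread.join()


def shutdown(bus: ListenerBus, metrics: Dict[str, int], sink: List[Dict[str, int]]) -> None:
    bus.stop()                  # 1. let listeners finish producing metrics
    sink.append(dict(metrics))  # 2. only then flush the final snapshot


metrics: Dict[str, int] = {"events_seen": 0}
sink: List[Dict[str, int]] = []


def counting_listener(event: str) -> None:
    # The listener itself produces a metric while handling events.
    metrics["events_seen"] += 1


bus = ListenerBus(counting_listener)
for name in ("jobStart", "taskEnd", "jobEnd"):
    bus.post(name)
shutdown(bus, metrics, sink)
```

If the two steps in `shutdown` were swapped, the snapshot could be taken while events are still queued, so the flushed count might be lower than the number of posted events.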
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922242583 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47942/
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922241502 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47942/
[GitHub] [spark] SparkQA removed a comment on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
SparkQA removed a comment on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922208782 **[Test build #143433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37).
[GitHub] [spark] SparkQA commented on pull request #34018: [SPARK-36772] FinalizeShuffleMerge fails with an exception due to attempt id not matching
SparkQA commented on pull request #34018: URL: https://github.com/apache/spark/pull/34018#issuecomment-922240171 **[Test build #143433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143433/testReport)** for PR 34018 at commit [`f6e47b8`](https://github.com/apache/spark/commit/f6e47b8108e458b0057dd20554876dbf79b93e37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
SparkQA commented on pull request #34038: URL: https://github.com/apache/spark/pull/34038#issuecomment-922236467 **[Test build #143434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143434/testReport)** for PR 34038 at commit [`f9c133c`](https://github.com/apache/spark/commit/f9c133c57a27ae9789d6c4382c11f369717b15e7).
[GitHub] [spark] viirya commented on a change in pull request #34038: [SPARK-36797][SQL] Union should resolve nested columns as top-level columns
viirya commented on a change in pull request #34038:
URL: https://github.com/apache/spark/pull/34038#discussion_r711537729

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ##
@@ -401,16 +401,30 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
                 |the ${ordinalNumber(ti + 1)} table has ${child.output.length} columns
               """.stripMargin.replace("\n", " ").trim())
           }
+          val isUnion = operator.isInstanceOf[Union]
           // Check if the data types match.
-          dataTypes(child).zip(ref).zipWithIndex.foreach { case ((dt1, dt2), ci) =>
-            // SPARK-18058: we shall not care about the nullability of columns
-            if (TypeCoercion.findWiderTypeForTwo(dt1.asNullable, dt2.asNullable).isEmpty) {
-              failAnalysis(
-                s"""
-                  |${operator.nodeName} can only be performed on tables with the compatible
-                  |column types. ${dt1.catalogString} <> ${dt2.catalogString} at the
-                  |${ordinalNumber(ci)} column of the ${ordinalNumber(ti + 1)} table
-                """.stripMargin.replace("\n", " ").trim())
+          if (!isUnion) {

Review comment:
Not sure whether we should also generalize this to all set operations. It looks reasonable, but by their API definitions they do not seem to have the by-position definition that Union has.
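The removed lines in the diff above compare column types position by position and fail when no common wider type exists. A rough sketch of that idea follows, under loud assumptions: `wider_type` and `check_by_position` are hypothetical names, types are plain strings, and the widening order is a toy that covers only a few numeric types, not Spark's actual `TypeCoercion` rules.

```python
from typing import List, Optional

# Toy widening order, narrowest first (assumption: numeric types only).
NUMERIC_ORDER = ["byte", "short", "int", "bigint", "float", "double"]


def wider_type(a: str, b: str) -> Optional[str]:
    """Return a common wider type for two type names, or None if there is none."""
    if a == b:
        return a
    if a in NUMERIC_ORDER and b in NUMERIC_ORDER:
        # The type that appears later in the order is the wider one.
        return max(a, b, key=NUMERIC_ORDER.index)
    return None


def check_by_position(ref: List[str], child: List[str], table_index: int) -> List[str]:
    """Check a child's column types against the reference, column by column."""
    if len(ref) != len(child):
        raise ValueError(
            "table %d has %d columns, expected %d" % (table_index, len(child), len(ref))
        )
    widened = []
    for ci, (dt1, dt2) in enumerate(zip(child, ref)):
        common = wider_type(dt1, dt2)
        if common is None:
            raise ValueError(
                "incompatible column types: %s <> %s at column %d of table %d"
                % (dt1, dt2, ci, table_index)
            )
        widened.append(common)
    return widened


result = check_by_position(["int", "string"], ["bigint", "string"], table_index=1)
```

The check is purely positional: column names play no role, which is the by-position behaviour the review comment attributes to Union.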