[GitHub] [spark] dongjoon-hyun commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
dongjoon-hyun commented on pull request #29755: URL: https://github.com/apache/spark/pull/29755#issuecomment-692478640 Thank you, @HyukjinKwon ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
AmplabJenkins commented on pull request #29743: URL: https://github.com/apache/spark/pull/29743#issuecomment-692478417
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
AmplabJenkins removed a comment on pull request #29743: URL: https://github.com/apache/spark/pull/29743#issuecomment-692478417
[GitHub] [spark] HyukjinKwon commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
HyukjinKwon commented on pull request #29755: URL: https://github.com/apache/spark/pull/29755#issuecomment-692476778 This is nice. We should mark more tests. Merged to master. I manually checked the logs and see it's properly excluded/included.
[GitHub] [spark] HyukjinKwon closed pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
HyukjinKwon closed pull request #29755: URL: https://github.com/apache/spark/pull/29755
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
AmplabJenkins removed a comment on pull request #29734: URL: https://github.com/apache/spark/pull/29734#issuecomment-692475975 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128675/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
AmplabJenkins removed a comment on pull request #29734: URL: https://github.com/apache/spark/pull/29734#issuecomment-692475971 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
SparkQA commented on pull request #29755: URL: https://github.com/apache/spark/pull/29755#issuecomment-692476130 **[Test build #128695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128695/testReport)** for PR 29755 at commit [`0b2483d`](https://github.com/apache/spark/commit/0b2483d37c8f74ab76d2bad2deb2ac992ee1d1a3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
AmplabJenkins removed a comment on pull request #29755: URL: https://github.com/apache/spark/pull/29755#issuecomment-692455851
[GitHub] [spark] AmplabJenkins commented on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
AmplabJenkins commented on pull request #29734: URL: https://github.com/apache/spark/pull/29734#issuecomment-692475971
[GitHub] [spark] SparkQA removed a comment on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
SparkQA removed a comment on pull request #29734: URL: https://github.com/apache/spark/pull/29734#issuecomment-692377261 **[Test build #128675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128675/testReport)** for PR 29734 at commit [`06ca9c1`](https://github.com/apache/spark/commit/06ca9c17da310c88c67204366bf15c1e80057719).
[GitHub] [spark] SparkQA commented on pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
SparkQA commented on pull request #29734: URL: https://github.com/apache/spark/pull/29734#issuecomment-692474596 **[Test build #128675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128675/testReport)** for PR 29734 at commit [`06ca9c1`](https://github.com/apache/spark/commit/06ca9c17da310c88c67204366bf15c1e80057719).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
AmplabJenkins removed a comment on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692469923 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128689/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
AmplabJenkins removed a comment on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692469917 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the ge
AmplabJenkins removed a comment on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-692451753
[GitHub] [spark] SparkQA commented on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-692470147 **[Test build #128694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128694/testReport)** for PR 29754 at commit [`4240990`](https://github.com/apache/spark/commit/42409905619a5035d962a207f5702e1e2a63c739).
[GitHub] [spark] AmplabJenkins commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
AmplabJenkins commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692469917
[GitHub] [spark] AmplabJenkins commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
AmplabJenkins commented on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692469529
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
AmplabJenkins removed a comment on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692469529
[GitHub] [spark] SparkQA removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
SparkQA removed a comment on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692418147 **[Test build #128689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).
[GitHub] [spark] SparkQA commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
SparkQA commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692469274 **[Test build #128689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission
SparkQA commented on pull request #29722: URL: https://github.com/apache/spark/pull/29722#issuecomment-692455812 **[Test build #128693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128693/testReport)** for PR 29722 at commit [`f64063e`](https://github.com/apache/spark/commit/f64063ed7f3ac2851ddfbdf92b4961fbd1a192c6).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
AmplabJenkins removed a comment on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-692455714
[GitHub] [spark] AmplabJenkins commented on pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
AmplabJenkins commented on pull request #29755: URL: https://github.com/apache/spark/pull/29755#issuecomment-692455851
[GitHub] [spark] AmplabJenkins commented on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
AmplabJenkins commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-692455714
[GitHub] [spark] SparkQA commented on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-692454729 **[Test build #128673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128673/testReport)** for PR 29695 at commit [`c45a2b6`](https://github.com/apache/spark/commit/c45a2b643f2e83af14f6b0584c5c76cf1e5af4c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29695: [SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down
SparkQA removed a comment on pull request #29695: URL: https://github.com/apache/spark/pull/29695#issuecomment-692343374 **[Test build #128673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128673/testReport)** for PR 29695 at commit [`c45a2b6`](https://github.com/apache/spark/commit/c45a2b643f2e83af14f6b0584c5c76cf1e5af4c0).
[GitHub] [spark] dongjoon-hyun opened a new pull request #29755: [SPARK-32884][TESTS] Mark TPCDSQuery*Suite as ExtendedSQLTest
dongjoon-hyun opened a new pull request #29755: URL: https://github.com/apache/spark/pull/29755
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
[GitHub] [spark] dongjoon-hyun commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
dongjoon-hyun commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692452696 Thank you, @Ngone51 and all!
[GitHub] [spark] dongjoon-hyun closed pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
dongjoon-hyun closed pull request #29750: URL: https://github.com/apache/spark/pull/29750
[GitHub] [spark] dongjoon-hyun commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
dongjoon-hyun commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692452395 Merged to master for Apache Spark 3.1.0 in December 2020.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
AmplabJenkins removed a comment on pull request #29558: URL: https://github.com/apache/spark/pull/29558#issuecomment-692451876
[GitHub] [spark] SparkQA commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
SparkQA commented on pull request #29743: URL: https://github.com/apache/spark/pull/29743#issuecomment-692452000 **[Test build #128692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128692/testReport)** for PR 29743 at commit [`499c32c`](https://github.com/apache/spark/commit/499c32c16bdbf6e08b792f027c6e8c943c68eacb).
[GitHub] [spark] AmplabJenkins commented on pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general me
AmplabJenkins commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-692451753
[GitHub] [spark] AmplabJenkins commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
AmplabJenkins commented on pull request #29558: URL: https://github.com/apache/spark/pull/29558#issuecomment-692451876
[GitHub] [spark] SparkQA commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
SparkQA commented on pull request #29558: URL: https://github.com/apache/spark/pull/29558#issuecomment-692451008 **[Test build #128684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128684/testReport)** for PR 29558 at commit [`b565c33`](https://github.com/apache/spark/commit/b565c338390b9c54404bf55572b13e6a9c656e7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
SparkQA removed a comment on pull request #29558: URL: https://github.com/apache/spark/pull/29558#issuecomment-692391473 **[Test build #128684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128684/testReport)** for PR 29558 at commit [`b565c33`](https://github.com/apache/spark/commit/b565c338390b9c54404bf55572b13e6a9c656e7d).
[GitHub] [spark] beliefer opened a new pull request #29754: [WIP][SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
beliefer opened a new pull request #29754: URL: https://github.com/apache/spark/pull/29754

### What changes were proposed in this pull request?
TaskSchedulerImplSuite repeatedly checks results using the pattern shown below:
```
val zeroCoreWorkerOffers = IndexedSeq(new WorkerOffer("executor0", "host0", 0),
  new WorkerOffer("executor1", "host1", 0))
val taskSet = FakeTask.createTaskSet(1)
taskScheduler.submitTasks(taskSet)
var taskDescriptions = taskScheduler.resourceOffers(zeroCoreWorkerOffers).flatten
assert(0 === taskDescriptions.length)
```
We can extract it as a generic method.

### Why are the changes needed?
Extract a generic method.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test
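The extraction proposed in this pull request can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: `StubScheduler`, `TaskSet`, `TaskDescription`, the simplified `WorkerOffer`, and the helper name `submitAndOffer` are all hypothetical stand-ins for Spark's `TaskSchedulerImpl`, `FakeTask.createTaskSet`, and related classes, so the shape of the helper can be shown without a Spark build.

```scala
// Hypothetical stand-ins for illustration only; the real suite uses Spark's
// WorkerOffer, FakeTask.createTaskSet and TaskSchedulerImpl.
final case class WorkerOffer(executorId: String, host: String, cores: Int)
final case class TaskSet(numTasks: Int)
final case class TaskDescription(executorId: String)

// A toy scheduler: it launches at most one pending task per offer,
// and only on offers that have at least one free core.
class StubScheduler {
  private var pending = 0

  def submitTasks(taskSet: TaskSet): Unit = pending += taskSet.numTasks

  def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] =
    offers.map { offer =>
      if (offer.cores > 0 && pending > 0) {
        pending -= 1
        Seq(TaskDescription(offer.executorId))
      } else {
        Seq.empty[TaskDescription]
      }
    }
}

object SchedulerTestHelper {
  // The repeated "submitTasks + resourceOffers + flatten" steps folded into
  // one method; each test then only states its offers and its expectation.
  def submitAndOffer(
      scheduler: StubScheduler,
      numTasks: Int,
      offers: IndexedSeq[WorkerOffer]): Seq[TaskDescription] = {
    scheduler.submitTasks(TaskSet(numTasks))
    scheduler.resourceOffers(offers).flatten
  }
}
```

With a helper of this shape, the zero-core case quoted in the description reduces to a single call followed by one assertion on the length of the returned sequence.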
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission
AmplabJenkins removed a comment on pull request #29722: URL: https://github.com/apache/spark/pull/29722#issuecomment-692446530
[GitHub] [spark] AmplabJenkins commented on pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission
AmplabJenkins commented on pull request #29722: URL: https://github.com/apache/spark/pull/29722#issuecomment-692446530
[GitHub] [spark] AmplabJenkins commented on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
AmplabJenkins commented on pull request #29743: URL: https://github.com/apache/spark/pull/29743#issuecomment-692444719
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
AmplabJenkins removed a comment on pull request #29743: URL: https://github.com/apache/spark/pull/29743#issuecomment-692444719
[GitHub] [spark] viirya commented on a change in pull request #29734: [SPARK-32861][SQL] GenerateExec should require column ordering
viirya commented on a change in pull request #29734: URL: https://github.com/apache/spark/pull/29734#discussion_r488359058

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala

@@ -115,9 +119,40 @@ class RemoveRedundantProjectsSuite extends QueryTest with SharedSparkSession wit
     assertProjectExec(query, 1, 2)
   }
 
-  test("generate") {
-    val query = "select a, key, explode(d) from testView where a > 10"
-    assertProjectExec(query, 0, 1)
+  test("generate should require column ordering") {
+    withTempView("testData") {
+      spark.range(0, 10, 1)
+        .selectExpr("id as key", "id * 2 as a", "id * 3 as b")
+        .createOrReplaceTempView("testData")
+
+      val data = sql("select key, a, b, count(*) from testData group by key, a, b limit 2")
+      val df = data.selectExpr("a", "b", "key", "explode(array(key, a, b)) as d").filter("d > 0")
+      df.collect()
+      val plan = df.queryExecution.executedPlan
+      val numProjects = collectWithSubqueries(plan) { case p: ProjectExec => p }.length
+
+      // Create a new plan that reverse the GenerateExec output and add a new ProjectExec between
+      // GenerateExec and its child. This is to test if the ProjectExec is removed, the output of
+      // the query will be incorrect.
+      val newPlan = stripAQEPlan(plan) transform {
+        case g @ GenerateExec(_, requiredChildOutput, _, _, child) =>
+          g.copy(requiredChildOutput = requiredChildOutput.reverse,
+            child = ProjectExec(requiredChildOutput.reverse, child))
+      }
+
+      // Re-apply remove redundant project rule.
+      val rule = RemoveRedundantProjects(spark.sessionState.conf)
+      val newExecutedPlan = rule.apply(newPlan)
+      // The manually added ProjectExec node shouldn't be removed.
+      assert(collectWithSubqueries(newExecutedPlan) {
+        case p: ProjectExec => p }.size == numProjects + 1)

Review comment: The style looks weird.

```scala
assert(collectWithSubqueries(newExecutedPlan) {
  case p: ProjectExec => p
}.size == numProjects + 1)
```
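The property this test guards, that dropping a reordering projection changes what a position-based consumer reads, can be illustrated without Spark. The following is a minimal plain-Scala sketch; `Row`, `project`, and `readByPosition` are hypothetical stand-ins invented for illustration, not Spark APIs.

```scala
object ColumnOrderingSketch {
  // A row is a sequence of (columnName, value) pairs, so position matters.
  type Row = Seq[(String, Long)]

  // Hypothetical stand-in for a projection: emits columns in the requested order.
  def project(columns: Seq[String])(row: Row): Row = {
    val byName = row.toMap
    columns.map(c => c -> byName(c))
  }

  // Hypothetical stand-in for an operator that binds its input by position
  // and assumes the columns arrive in `expected` order.
  def readByPosition(expected: Seq[String])(row: Row): Map[String, Long] =
    expected.zip(row.map(_._2)).toMap

  def main(args: Array[String]): Unit = {
    val child: Row = Seq("key" -> 1L, "a" -> 2L, "b" -> 3L)
    val required = Seq("b", "a", "key") // reversed order, as in the test above

    // With the projection in place, names and values line up.
    println(readByPosition(required)(project(required)(child)))
    // Without it, the values of "b" and "key" are silently swapped.
    println(readByPosition(required)(child))
  }
}
```

In this sketch the projection is not redundant even though it keeps every column, which is the situation the review discusses.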
[GitHub] [spark] tanelk commented on a change in pull request #29743: [SPARK-32870][DOCS][SQL] Make sure that all expressions have their ExpressionDescription filled
tanelk commented on a change in pull request #29743: URL: https://github.com/apache/spark/pull/29743#discussion_r488356430

## File path: sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala

@@ -105,11 +105,34 @@ class ExpressionInfoSuite extends SparkFunSuite with SharedSparkSession {
     }
   }
 
+  test("SPARK-32870: Default expressions in FunctionRegistry should have their " +
+    "usage, examples and since filled") {
+    val ignoreSet = Set(
+      "org.apache.spark.sql.catalyst.expressions.TimeWindow")
+
+    spark.sessionState.functionRegistry.listFunction().foreach { funcId =>
+      val info = spark.sessionState.catalog.lookupFunctionInfo(funcId)
+      if (!ignoreSet.contains(info.getClassName)) {
+        withClue(s"Function '${info.getName}', Expression class '${info.getClassName}'") {
+          assert(info.getUsage.nonEmpty)
+          assert(info.getExamples.startsWith("\nExamples:\n"))
+          assert(info.getExamples.endsWith("\n "))
+          assert(info.getSince.matches("[0-9]+\\.[0-9]+\\.[0-9]+"))

Review comment: These checks are a bit stricter. For example, this extra space kept the example for `typeof` from appearing in http://spark.apache.org/docs/latest/api/sql/#typeof

https://github.com/apache/spark/pull/29743/files#diff-25282ab1377a3d87999d1d0d7a8ec270R208-R214

I didn't want to add these checks to the constructor, because I'm afraid they might break some UDFs, and there I would have no way to exclude some expressions from the checks.
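The kind of format checks discussed in this comment can be sketched outside Spark as follows. `Doc` and `problems` are hypothetical names, and the sample strings are made up; the sketch covers only the usage, header, and version-pattern checks, and leaves out the trailing-whitespace check.

```scala
object ExpressionDocCheckSketch {
  // Hypothetical stand-in for the documentation fields of one expression.
  final case class Doc(name: String, usage: String, examples: String, since: String)

  // Returns one message per failed check; an empty result means the doc passes.
  def problems(doc: Doc): Seq[String] = {
    val checks = Seq(
      doc.usage.nonEmpty -> "usage is empty",
      doc.examples.startsWith("\nExamples:\n") -> "examples lack the Examples header",
      doc.since.matches("[0-9]+\\.[0-9]+\\.[0-9]+") -> "since is not of the form x.y.z"
    )
    checks.collect { case (ok, msg) if !ok => s"${doc.name}: $msg" }
  }

  def main(args: Array[String]): Unit = {
    // Made-up documentation strings, for illustration only.
    val good = Doc("my_func", "my_func(expr) - Returns something.",
      "\nExamples:\n  > SELECT my_func(1);\n", "3.0.0")
    // A stray leading space and a two-part version both fail the checks.
    val bad = good.copy(examples = " \nExamples:\n", since = "3.0")
    println(problems(good))
    println(problems(bad))
  }
}
```

Collecting the failures instead of asserting on the first one is a design choice of this sketch; it makes it easy to skip selected entries, which is the flexibility the comment says a constructor-level check would lack.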
[GitHub] [spark] tanelk commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692432165

> > @cloud-fan, now that #29647 is merged, can this be merged also?
>
> Are all the bugs that this PR found already fixed now?

I believe they were all manifestations of the `0.0 != -0.0` issue.
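As a general illustration of why signed zeros are a useful special value for a literal generator (this is not a reproduction of the specific Spark bugs), the sketch below shows that the two zeros compare equal as primitives but differ by bit pattern and by `java.lang.Double.equals`. The object and method names are hypothetical.

```scala
object SignedZeroSketch {
  // Primitive comparison follows IEEE 754: the two zeros compare equal.
  def primitiveEqual(a: Double, b: Double): Boolean = a == b

  // java.lang.Double.equals compares bit patterns, so the zeros differ.
  def boxedEqual(a: Double, b: Double): Boolean =
    java.lang.Double.valueOf(a).equals(java.lang.Double.valueOf(b))

  // Raw IEEE 754 bit pattern of a double.
  def bits(d: Double): Long = java.lang.Double.doubleToLongBits(d)

  def main(args: Array[String]): Unit = {
    println(primitiveEqual(0.0, -0.0)) // the zeros are equal as primitives
    println(boxedEqual(0.0, -0.0))     // but not under Double.equals
    println(bits(0.0) == bits(-0.0))   // because only the sign bit differs
  }
}
```

Any code path that mixes the two notions of equality, for example hashing by bits while comparing by value, can treat `0.0` and `-0.0` inconsistently.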
[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
SparkQA commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692428920 **[Test build #128691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128691/testReport)** for PR 29739 at commit [`099c232`](https://github.com/apache/spark/commit/099c232c0cf3904654589b4a7b2c51c52c5e000b).
[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692428424

> Please update the PR description about how-to-fix.

Done
[GitHub] [spark] AngersZhuuuu commented on pull request #29688: [SPARK-32827][SQL] Add spark.sql.maxMetadataStringLength config
AngersZhuuuu commented on pull request #29688: URL: https://github.com/apache/spark/pull/29688#issuecomment-692427476

> @ulysses-you thanks for the work.
> It seems that the PR only changes the file source related metadata. Is there other places we can use the new config as well?

I will use this config in my PR https://github.com/apache/spark/pull/29739. I had done similar work in that PR and reverted it while waiting for this one.
[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692427206
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins removed a comment on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692426663
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins removed a comment on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692426652 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
SparkQA removed a comment on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692420232 **[Test build #128690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).
[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692426652
[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
SparkQA commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692426576 **[Test build #128690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #29722: [SPARK-32850][CORE] Simplify the RPC message flow of decommission
Ngone51 commented on a change in pull request #29722: URL: https://github.com/apache/spark/pull/29722#discussion_r488343168

## File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala

@@ -79,12 +79,23 @@ private[spark] class CoarseGrainedExecutorBackend(
    */
   private[executor] val taskResources = new mutable.HashMap[Long, Map[String, ResourceInformation]]
 
-  @volatile private var decommissioned = false
+  private var decommissioned = false
 
   override def onStart(): Unit = {
-    logInfo("Registering PWR handler.")
-    SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
-      "disabling decommission feature.")(decommissionSelf)
+    if (env.conf.get(DECOMMISSION_ENABLED)) {
+      logInfo("Registering PWR handler to trigger decommissioning.")
+      SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
+        "disabling executor decommission feature.") {
+        self.send(DecommissionExecutor)

Review comment: I'm OK with adding a new message. WDYT? @cloud-fan
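One reading of the diff above is that the signal handler no longer changes state itself but posts a message to the endpoint, so the `decommissioned` flag is only touched by the thread that processes messages. The sketch below illustrates that pattern with a plain blocking queue. It is an assumption-laden simplification, not Spark's RPC implementation: `Endpoint`, `drain`, and `Stop` are hypothetical, and only the `DecommissionExecutor` name is borrowed from the diff.

```scala
import java.util.concurrent.LinkedBlockingQueue

object DecommissionMessageSketch {
  sealed trait Message
  case object DecommissionExecutor extends Message
  case object Stop extends Message

  final class Endpoint {
    private val inbox = new LinkedBlockingQueue[Message]()
    // Only read and written by the thread running `drain`.
    private var decommissioned = false

    // Safe to call from any thread, for example from a signal handler.
    def send(msg: Message): Unit = inbox.put(msg)

    // Processes messages one at a time until Stop; returns the final flag.
    def drain(): Boolean = {
      var running = true
      while (running) {
        inbox.take() match {
          case DecommissionExecutor => decommissioned = true
          case Stop => running = false
        }
      }
      decommissioned
    }
  }

  def main(args: Array[String]): Unit = {
    val endpoint = new Endpoint
    // Simulate the signal handler firing on another thread.
    val handler = new Thread(new Runnable {
      def run(): Unit = endpoint.send(DecommissionExecutor)
    })
    handler.start()
    handler.join()
    endpoint.send(Stop)
    println(endpoint.drain())
  }
}
```

The queue provides the cross-thread hand-off, which is why the flag in this sketch does not need `@volatile`.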
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins removed a comment on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692420551
[GitHub] [spark] AmplabJenkins commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AmplabJenkins commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692420551
[GitHub] [spark] SparkQA commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
SparkQA commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692420232 **[Test build #128690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128690/testReport)** for PR 29739 at commit [`543b853`](https://github.com/apache/spark/commit/543b85347d09bf1d254486c40eba6809df9881f6).
[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692420213

> Could you add tests, too?

Sure. Would it be OK to just check HiveTableScanExec's simpleString?
[GitHub] [spark] maropu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
maropu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692419605 Could you add tests, too?
[GitHub] [spark] maropu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
maropu commented on a change in pull request #29739: URL: https://github.com/apache/spark/pull/29739#discussion_r488340537

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment: link: https://github.com/apache/spark/pull/29688
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu commented on a change in pull request #29739: URL: https://github.com/apache/spark/pull/29739#discussion_r488340103

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment:

> Have you checked the related PR? https://github.com/apache/spark/pull/29688/files#diff-2a91a9a59953aa82fa132aaf45bd731bR58

Sorry, I didn't notice that PR. I had a similar idea. I'm going to revert this change and resolve the conflicts between the two PRs at the end.
[GitHub] [spark] SparkQA commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
SparkQA commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692418147 **[Test build #128689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128689/testReport)** for PR 29750 at commit [`718590d`](https://github.com/apache/spark/commit/718590dc2beafcafd570ff13b9a09d06b7cc3326).
[GitHub] [spark] maropu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
maropu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692417507

> As mentioned above, I use a similar way to construct HiveTableRelation's simpleString.

Please update the PR description about how-to-fix.
[GitHub] [spark] maropu commented on a change in pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
maropu commented on a change in pull request #29739: URL: https://github.com/apache/spark/pull/29739#discussion_r488338816

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala

@@ -55,7 +55,7 @@ trait DataSourceScanExec extends LeafExecNode {
   // Metadata that describes more details of this scan.
   protected def metadata: Map[String, String]
 
-  protected val maxMetadataValueLength = 100
+  protected val maxMetadataValueLength = conf.maxMetadataValueLength

Review comment: Have you checked the related PR? https://github.com/apache/spark/pull/29688/files#diff-2a91a9a59953aa82fa132aaf45bd731bR58
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
AmplabJenkins removed a comment on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692416281
[GitHub] [spark] AmplabJenkins commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
AmplabJenkins commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692416281
[GitHub] [spark] Ngone51 commented on pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
Ngone51 commented on pull request #29750: URL: https://github.com/apache/spark/pull/29750#issuecomment-692416287 Thanks all! I've addressed the comment.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins removed a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692411971
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
HyukjinKwon commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488334221

## File path: python/pyspark/sql/types.py

@@ -305,7 +305,7 @@ def jsonValue(self):
     @classmethod
     def fromJson(cls, json):

Review comment: I don't think the format is meant to be exposed to end users. It's just for internal ser/de, which is supposed to round-trip between `jsonValue` and `fromJson` to work around a Py4J limitation. BTW, this isn't the only place where you would need to handle JSON ser/de. For example, you would also have to change the Scala side at `DataType.parseDataType`.
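The round-trip contract described in this comment can be sketched as a writer and a strict reader that always agree on the set of keys. The sketch is in Scala for consistency with the rest of the thread and is a deliberately simplified, hypothetical model: `Field`, `toJsonMap`, and `fromJsonMap` are invented here and are not the PySpark or Spark implementations.

```scala
object FieldRoundTripSketch {
  // Hypothetical, simplified model of a struct field.
  final case class Field(name: String, dataType: String, nullable: Boolean)

  // Writer: always emits every key.
  def toJsonMap(f: Field): Map[String, Any] =
    Map("name" -> f.name, "type" -> f.dataType, "nullable" -> f.nullable)

  // Strict reader: expects exactly what the writer produces and
  // throws NoSuchElementException if a key is missing.
  def fromJsonMap(m: Map[String, Any]): Field =
    Field(
      m("name").asInstanceOf[String],
      m("type").asInstanceOf[String],
      m("nullable").asInstanceOf[Boolean])

  def main(args: Array[String]): Unit = {
    val field = Field("id", "long", nullable = false)
    // Reading back what was written yields the original value.
    println(fromJsonMap(toJsonMap(field)) == field)
  }
}
```

Because the reader only ever sees the writer's output in this model, defaults for missing keys are unnecessary; they matter only if the format is treated as user-facing input, which is the point under discussion.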
[GitHub] [spark] AmplabJenkins commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692411971
[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692411754

@cloud-fan As mentioned above, I use a similar way to construct HiveTableRelation's simpleString. Compared to what we had before, I drop the detailed metadata of each partition and retain only the partSpec, to show the pruned partitions. For detailed information, we usually don't look at the plan but use the `DESC EXTENDED` statement instead.
[GitHub] [spark] SparkQA commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
SparkQA commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692411732 **[Test build #128688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128688/testReport)** for PR 29515 at commit [`529aba8`](https://github.com/apache/spark/commit/529aba8e7207a22039aebbe379d9d27035a143c1).
[GitHub] [spark] maropu commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
maropu commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692410536 retest this please
[GitHub] [spark] AngersZhuuuu commented on pull request #29739: [SPARK-32867][SQL] When explain, HiveTableRelation show limited message
AngersZhuuuu commented on pull request #29739: URL: https://github.com/apache/spark/pull/29739#issuecomment-692410347

A native file source scan, such as FileSourceScanExec, collects the information it needs into `metadata`:

```
override lazy val metadata: Map[String, String] = {
  def seqToString(seq: Seq[Any]) = seq.mkString("[", ", ", "]")
  val location = relation.location
  val locationDesc =
    location.getClass.getSimpleName +
      Utils.buildLocationMetadata(location.rootPaths, maxMetadataValueLength)
  val metadata =
    Map(
      "Format" -> relation.fileFormat.toString,
      "ReadSchema" -> requiredSchema.catalogString,
      "Batched" -> supportsColumnar.toString,
      "PartitionFilters" -> seqToString(partitionFilters),
      "PushedFilters" -> seqToString(pushedDownFilters),
      "DataFilters" -> seqToString(dataFilters),
      "Location" -> locationDesc)

  val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
    val numSelectedBuckets = optionalBucketSet.map { b =>
      b.cardinality()
    } getOrElse {
      spec.numBuckets
    }
    metadata + ("SelectedBucketsCount" ->
      (s"$numSelectedBuckets out of ${spec.numBuckets}" +
        optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)"}.getOrElse("")))
  } getOrElse {
    metadata
  }

  withSelectedBucketsCount
}
```

Then its parent class `DataSourceScanExec` constructs `simpleString` from `metadata`:

```
override def simpleString(maxFields: Int): String = {
  val metadataEntries = metadata.toSeq.sorted.map {
    case (key, value) =>
      key + ": " + StringUtils.abbreviate(redact(value), maxMetadataValueLength)
  }
  val metadataStr = truncatedString(metadataEntries, " ", ", ", "", maxFields)
  redact(
    s"$nodeNamePrefix$nodeName${truncatedString(output, "[", ",", "]", maxFields)}$metadataStr")
}
```

Each metadata value is abbreviated to a length of 100, so FileSourceScan's FileIndex information gets cut off at 100 characters.
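The effect of the 100-character limit described above can be reproduced with a small stand-alone sketch. The `abbreviate` helper here is a hand-written stand-in that assumes the usual "truncate and end with `...`" behaviour rather than the library call itself, `render` is a hypothetical simplification of the `simpleString` logic, and the location string is made up.

```scala
object MetadataAbbreviationSketch {
  // Hand-written stand-in: keeps the result at most maxWidth characters,
  // replacing the tail with "..." when the value is too long.
  def abbreviate(value: String, maxWidth: Int): String = {
    require(maxWidth >= 4, "maxWidth must be at least 4")
    if (value.length <= maxWidth) value else value.take(maxWidth - 3) + "..."
  }

  // Hypothetical simplification: sort entries by key and abbreviate each value.
  def render(metadata: Map[String, String], maxLen: Int): String =
    metadata.toSeq.sorted
      .map { case (key, value) => key + ": " + abbreviate(value, maxLen) }
      .mkString(", ")

  def main(args: Array[String]): Unit = {
    // Made-up location listing that is well over 100 characters long.
    val paths = (1 to 10).map(i => s"hdfs://example/warehouse/tbl/part=$i")
    val metadata = Map(
      "Format" -> "Parquet",
      "Location" -> paths.mkString("InMemoryFileIndex[", ", ", "]"))

    println(render(metadata, 100))  // the location is cut off
    println(render(metadata, 1000)) // a larger limit shows every path
  }
}
```

Making the limit a parameter, as `render` does here, mirrors the motivation for replacing the hard-coded 100 with a configuration value.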
[GitHub] [spark] maropu commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
maropu commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-692410205

> @cloud-fan, now that #29647 is merged, can this be merged also?

Are all the bugs that this PR found already fixed now?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
AmplabJenkins removed a comment on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692409521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128674/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
AmplabJenkins removed a comment on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692409518 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
AmplabJenkins commented on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692409518
[GitHub] [spark] maropu commented on pull request #29655: [SPARK-32806][SQL] SortMergeJoin with partial hash distribution can be optimized to remove shuffle
maropu commented on pull request #29655: URL: https://github.com/apache/spark/pull/29655#issuecomment-692409151
> we added #19054 in our internal fork and don't see much OOM issues.

Even so, I think removing shuffles in the middle of stages (e.g., in many join cases) can, in theory, raise the probability of OOM when the data is skewed. Since we can control input distributions to some extent, e.g., with bucketing, it might be worth trying the restrictive approach that @imback82 suggested above.
[GitHub] [spark] SparkQA removed a comment on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
SparkQA removed a comment on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692373281 **[Test build #128674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128674/testReport)** for PR 29749 at commit [`e73ccbf`](https://github.com/apache/spark/commit/e73ccbf3be4b29714c3da1cd0ddefe2b51095b59).
[GitHub] [spark] SparkQA commented on pull request #29749: [SPARK-32877][SQL] Fix Hive UDF not support decimal type in complex type
SparkQA commented on pull request #29749: URL: https://github.com/apache/spark/pull/29749#issuecomment-692408690 **[Test build #128674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128674/testReport)** for PR 29749 at commit [`e73ccbf`](https://github.com/apache/spark/commit/e73ccbf3be4b29714c3da1cd0ddefe2b51095b59). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29750: [SPARK-32878][CORE] Avoid scheduling TaskSetManager which has no pending tasks
dongjoon-hyun commented on a change in pull request #29750: URL: https://github.com/apache/spark/pull/29750#discussion_r488330538 ## File path: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ## @@ -107,7 +109,7 @@ private[spark] class Pool( for (schedulable <- sortedSchedulableQueue) { sortedTaskSetQueue ++= schedulable.getSortedTaskSetQueue } -sortedTaskSetQueue +sortedTaskSetQueue.filter(_.isSchedulable) Review comment: +1
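The one-line change in the diff above filters the sorted queue so that only schedulable task sets are offered resources. Here is a minimal Python sketch of that idea. The `TaskSet` class, its fields, and the rule that "schedulable" means "has pending tasks" are all assumptions made for illustration, taken loosely from the PR title. This is not Spark's actual `Pool` or `TaskSetManager` code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskSet:
    """Hypothetical stand-in for a task set manager; not Spark's real class."""
    name: str
    priority: int
    pending_tasks: int

    @property
    def is_schedulable(self) -> bool:
        # Assumption: a task set is schedulable only while it has pending tasks.
        return self.pending_tasks > 0


def sorted_schedulable(task_sets: List[TaskSet]) -> List[TaskSet]:
    """Sort by priority, then drop task sets that have nothing left to launch."""
    ordered = sorted(task_sets, key=lambda ts: ts.priority)
    return [ts for ts in ordered if ts.is_schedulable]


queue = [
    TaskSet("stage-2", priority=2, pending_tasks=0),
    TaskSet("stage-1", priority=1, pending_tasks=4),
    TaskSet("stage-3", priority=3, pending_tasks=1),
]
print([ts.name for ts in sorted_schedulable(queue)])
```

Filtering after sorting keeps the relative order of the remaining task sets unchanged, so the scheduler just skips entries that could never accept a resource offer.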
[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
jroof88 commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488327546 ## File path: python/pyspark/sql/types.py ## @@ -305,7 +305,7 @@ def jsonValue(self): @classmethod def fromJson(cls, json): Review comment: Correct, we have a use case where we build up the JSON elsewhere, and we don't want to be required to include the default keys. It reduces complexity when defining schemas in external JSON files.
[GitHub] [spark] dongjoon-hyun closed pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
dongjoon-hyun closed pull request #29558: URL: https://github.com/apache/spark/pull/29558
[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
jroof88 commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488326118 ## File path: python/pyspark/sql/types.py ## @@ -305,7 +305,7 @@ def jsonValue(self): @classmethod def fromJson(cls, json): Review comment: Added!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns
AmplabJenkins removed a comment on pull request #29587: URL: https://github.com/apache/spark/pull/29587#issuecomment-692403734
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
HyukjinKwon commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488326022 ## File path: python/pyspark/sql/types.py ## @@ -305,7 +305,7 @@ def jsonValue(self): @classmethod def fromJson(cls, json): Review comment: I mean I am trying to understand why you want this change. For example, does it affect anything in the roundtrip between `jsonValue` and `fromJson`, or are you trying to build up the JSON by yourself somewhere?
[GitHub] [spark] AmplabJenkins commented on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns
AmplabJenkins commented on pull request #29587: URL: https://github.com/apache/spark/pull/29587#issuecomment-692403734
[GitHub] [spark] SparkQA commented on pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns
SparkQA commented on pull request #29587: URL: https://github.com/apache/spark/pull/29587#issuecomment-692403386 **[Test build #128687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128687/testReport)** for PR 29587 at commit [`b4270f4`](https://github.com/apache/spark/commit/b4270f4f8879f3225399c93456beb30e2a2c78e9).
[GitHub] [spark] jroof88 commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
jroof88 commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488324986 ## File path: python/pyspark/sql/types.py ## @@ -305,7 +305,7 @@ def jsonValue(self): @classmethod def fromJson(cls, json): Review comment: Right, the default value of `containsNull` for `ArrayType` is `True`, so this test shows that you get the same result whether or not you supply it in the JSON or the constructor. I will add another `assert` for the resulting JSON.
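The behavior discussed in this thread, where a missing non-required key falls back to the constructor default, can be sketched without PySpark. The `ArrayType` class below is a hypothetical miniature with snake_case method names. It is not the class in `python/pyspark/sql/types.py`, and the use of `dict.get` is an assumption about one way to implement the fallback.

```python
class ArrayType:
    """Hypothetical minimal array type; not pyspark.sql.types.ArrayType."""

    def __init__(self, element_type, contains_null=True):
        self.element_type = element_type
        self.contains_null = contains_null

    def json_value(self):
        # Always emits every key, including the ones that have defaults.
        return {
            "type": "array",
            "elementType": self.element_type,
            "containsNull": self.contains_null,
        }

    @classmethod
    def from_json(cls, json):
        # dict.get supplies the constructor default when the key is absent,
        # so hand-written JSON may omit "containsNull".
        return cls(json["elementType"], json.get("containsNull", True))


partial = {"type": "array", "elementType": "string"}
restored = ArrayType.from_json(partial)
print(restored.json_value())
```

With this shape, JSON produced by `json_value` still round-trips unchanged, while externally authored JSON that leaves out the optional key yields the same object as calling the constructor without it.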
[GitHub] [spark] dongjoon-hyun commented on pull request #29558: [SPARK-32715][CORE] Fix memory leak when failed to store pieces of broadcast
dongjoon-hyun commented on pull request #29558: URL: https://github.com/apache/spark/pull/29558#issuecomment-692402763 Thank you for the update, @LantaoJin.
[GitHub] [spark] viirya commented on a change in pull request #29587: [SPARK-32376][SQL] Make unionByName null-filling behavior work with struct columns
viirya commented on a change in pull request #29587: URL: https://github.com/apache/spark/pull/29587#discussion_r488324577
## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala ##
@@ -507,33 +507,156 @@ class DataFrameSetOperationsSuite extends QueryTest with SharedSparkSession {
}
test("SPARK-29358: Make unionByName optionally fill missing columns with nulls") {
-var df1 = Seq(1, 2, 3).toDF("a")
-var df2 = Seq(3, 1, 2).toDF("b")
-val df3 = Seq(2, 3, 1).toDF("c")
-val unionDf = df1.unionByName(df2.unionByName(df3, true), true)
-checkAnswer(unionDf,
- Row(1, null, null) :: Row(2, null, null) :: Row(3, null, null) :: // df1
-Row(null, 3, null) :: Row(null, 1, null) :: Row(null, 2, null) :: // df2
-Row(null, null, 2) :: Row(null, null, 3) :: Row(null, null, 1) :: Nil // df3
-)
+Seq("true", "false").foreach { config =>
+ withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> config) {
+var df1 = Seq(1, 2, 3).toDF("a")
+var df2 = Seq(3, 1, 2).toDF("b")
+val df3 = Seq(2, 3, 1).toDF("c")
+val unionDf = df1.unionByName(df2.unionByName(df3, true), true)
+checkAnswer(unionDf,
+ Row(1, null, null) :: Row(2, null, null) :: Row(3, null, null) :: // df1
+Row(null, 3, null) :: Row(null, 1, null) :: Row(null, 2, null) :: // df2
+Row(null, null, 2) :: Row(null, null, 3) :: Row(null, null, 1) :: Nil // df3
+)
+
+df1 = Seq((1, 2)).toDF("a", "c")
+df2 = Seq((3, 4, 5)).toDF("a", "b", "c")
+checkAnswer(df1.unionByName(df2, true),
+ Row(1, 2, null) :: Row(3, 5, 4) :: Nil)
+checkAnswer(df2.unionByName(df1, true),
+ Row(3, 4, 5) :: Row(1, null, 2) :: Nil)
+
+withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
+ df2 = Seq((3, 4, 5)).toDF("a", "B", "C")
+ val union1 = df1.unionByName(df2, true)
+ val union2 = df2.unionByName(df1, true)
+
+ checkAnswer(union1, Row(1, 2, null, null) :: Row(3, null, 4, 5) :: Nil)
+ checkAnswer(union2, Row(3, 4, 5, null) :: Row(1, null, null, 2) :: Nil)
+
+ assert(union1.schema.fieldNames === Array("a", "c", "B", "C"))
+ assert(union2.schema.fieldNames === Array("a", "B", "C", "c"))
+}
+ }
+}
+ }
-df1 = Seq((1, 2)).toDF("a", "c")
-df2 = Seq((3, 4, 5)).toDF("a", "b", "c")
-checkAnswer(df1.unionByName(df2, true),
- Row(1, 2, null) :: Row(3, 5, 4) :: Nil)
-checkAnswer(df2.unionByName(df1, true),
- Row(3, 4, 5) :: Row(1, null, 2) :: Nil)
+ test("SPARK-32376: Make unionByName null-filling behavior work with struct columns - simple") {
+withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> "true") {
+ val df1 = Seq(((1, 2, 3), 0), ((2, 3, 4), 1), ((3, 4, 5), 2)).toDF("a", "idx")
+ val df2 = Seq(((3, 4), 0), ((1, 2), 1), ((2, 3), 2)).toDF("a", "idx")
+ val df3 = Seq(((100, 101, 102, 103), 0), ((110, 111, 112, 113), 1), ((120, 121, 122, 123), 2))
+.toDF("a", "idx")
+
+ var unionDf = df1.unionByName(df2, true)
+
+ checkAnswer(unionDf,
+Row(Row(1, 2, 3), 0) :: Row(Row(2, 3, 4), 1) :: Row(Row(3, 4, 5), 2) ::
+ Row(Row(3, 4, null), 0) :: Row(Row(1, 2, null), 1) :: Row(Row(2, 3, null), 2) :: Nil
+ )
+
+ assert(unionDf.schema.toDDL == "`a` STRUCT<`_1`: INT, `_2`: INT, `_3`: INT>,`idx` INT")
+
+ unionDf = df1.unionByName(df2, true).unionByName(df3, true)
+
+ checkAnswer(unionDf,
+Row(Row(1, 2, 3, null), 0) ::
+ Row(Row(2, 3, 4, null), 1) ::
+ Row(Row(3, 4, 5, null), 2) :: // df1
+ Row(Row(3, 4, null, null), 0) ::
+ Row(Row(1, 2, null, null), 1) ::
+ Row(Row(2, 3, null, null), 2) :: // df2
+ Row(Row(100, 101, 102, 103), 0) ::
+ Row(Row(110, 111, 112, 113), 1) ::
+ Row(Row(120, 121, 122, 123), 2) :: Nil // df3
+ )
+ assert(unionDf.schema.toDDL ==
+"`a` STRUCT<`_1`: INT, `_2`: INT, `_3`: INT, `_4`: INT>,`idx` INT")
+}
+ }
-withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
- df2 = Seq((3, 4, 5)).toDF("a", "B", "C")
- val union1 = df1.unionByName(df2, true)
- val union2 = df2.unionByName(df1, true)
+ test("SPARK-32376: Make unionByName null-filling behavior work with struct columns - nested") {
+withSQLConf(SQLConf.UNION_BYNAME_STRUCT_SUPPORT_ENABLED.key -> "true") {
+ val df1 = Seq((0, UnionClass1a(0, 1L, UnionClass2(1, "2")))).toDF("id", "a")
+ val df2 = Seq((1, UnionClass1b(1, 2L, UnionClass3(2, 3L)))).toDF("id", "a")
+
+ val expectedSchema = "`id` INT,`a` STRUCT<`a`: INT, `b`: BIGINT, " +
+"`nested`: STRUCT<`a`: INT, `b`: BIGINT, `c`: STRING>>"
+
+ var unionDf = df1.unionByName(df2, true)
+ checkAnswer(unionDf,
+Row(0, Row(0, 1, Row(1, null, "2"))) ::
+ Row(1, Row(1, 2,
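The tests in the truncated diff above expect fields missing from a struct column to be filled with nulls, at any nesting depth. Below is a rough sketch of that behavior in plain Python, with dicts standing in for rows and schemas. The helper names `merge_schema`, `fill`, and `union_by_name` are hypothetical, and the field ordering rule (left side first, new fields appended) is an assumption rather than a statement of what Spark does.

```python
def merge_schema(left, right):
    """Union of field names, left order first; nested struct schemas merge recursively."""
    merged = dict(left)
    for name, typ in right.items():
        if name not in merged:
            merged[name] = typ
        elif isinstance(merged[name], dict) and isinstance(typ, dict):
            merged[name] = merge_schema(merged[name], typ)
    return merged


def fill(row, schema):
    """Project a row onto the schema, using None for any missing field."""
    out = {}
    for name, typ in schema.items():
        value = row.get(name)
        if isinstance(typ, dict) and value is not None:
            out[name] = fill(value, typ)  # recurse into struct values
        else:
            out[name] = value
    return out


def union_by_name(rows1, schema1, rows2, schema2):
    schema = merge_schema(schema1, schema2)
    return [fill(r, schema) for r in rows1 + rows2], schema


schema1 = {"a": {"_1": "int", "_2": "int", "_3": "int"}, "idx": "int"}
schema2 = {"a": {"_1": "int", "_2": "int"}, "idx": "int"}
rows1 = [{"a": {"_1": 1, "_2": 2, "_3": 3}, "idx": 0}]
rows2 = [{"a": {"_1": 3, "_2": 4}, "idx": 0}]

rows, schema = union_by_name(rows1, schema1, rows2, schema2)
for row in rows:
    print(row)
```

The second input row comes back with `_3` set to `None`, which corresponds to the `Row(Row(3, 4, null), 0)` expectation in the "simple" test case.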
[GitHub] [spark] AmplabJenkins commented on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold
AmplabJenkins commented on pull request #29753: URL: https://github.com/apache/spark/pull/29753#issuecomment-692401227
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold
AmplabJenkins removed a comment on pull request #29753: URL: https://github.com/apache/spark/pull/29753#issuecomment-692401227
[GitHub] [spark] dongjoon-hyun commented on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold
dongjoon-hyun commented on pull request #29753: URL: https://github.com/apache/spark/pull/29753#issuecomment-692401041 Merged to branch-2.4. Thanks, @ankurdave !
[GitHub] [spark] SparkQA removed a comment on pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold
SparkQA removed a comment on pull request #29753: URL: https://github.com/apache/spark/pull/29753#issuecomment-692338389 **[Test build #128672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128672/testReport)** for PR 29753 at commit [`56b7bca`](https://github.com/apache/spark/commit/56b7bca38d9952484ef1030a2a2058d97169c223).
[GitHub] [spark] dongjoon-hyun closed pull request #29753: [SPARK-32872][CORE][2.4] Prevent BytesToBytesMap at MAX_CAPACITY from exceeding growth threshold
dongjoon-hyun closed pull request #29753: URL: https://github.com/apache/spark/pull/29753
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29720: [SPARK-32849][PYSPARK] Add default values for non-required keys when creating StructType
HyukjinKwon commented on a change in pull request #29720: URL: https://github.com/apache/spark/pull/29720#discussion_r488323152 ## File path: python/pyspark/sql/types.py ## @@ -305,7 +305,7 @@ def jsonValue(self): @classmethod def fromJson(cls, json): Review comment: It has:
```python
>>> ArrayType(StringType()).jsonValue()
{'type': 'array', 'elementType': 'string', 'containsNull': True}
```