[GitHub] [spark] AmplabJenkins removed a comment on pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins removed a comment on pull request #29260: URL: https://github.com/apache/spark/pull/29260#issuecomment-664347826 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins commented on pull request #29260: URL: https://github.com/apache/spark/pull/29260#issuecomment-664347826
[GitHub] [spark] WinkerDu opened a new pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
WinkerDu opened a new pull request #29260: URL: https://github.com/apache/spark/pull/29260

### What changes were proposed in this pull request?

When using dynamic partition overwrite, each task has its working dir under the staging dir, `stagingDir/.spark-staging-{jobId}`, and each task commits to `stagingDir/.spark-staging-{jobId}/{partitionId}/part-{taskId}-{jobId}{ext}`.

When speculation is enabled, multiple task attempts are set up for one task. **They have the same task id and commit to the same file concurrently.** If a host goes down or a node is preempted, the partly committed files are not cleaned up, and a `FileAlreadyExistsException` is raised in this situation, resulting in job failure.

I don't try to change the task commit process for dynamic partition overwrite (for example, by adding the attempt id to the task working dir for each attempt and committing to the final output dir via a new outputCommitCoordinator), for these reasons:

1. `FileOutputCommitter` already has a commit coordinator for task attempts; we can leverage it rather than build a new one.
2. Even if we implemented a coordinator to resolve commit conflicts between task attempts, in a severe case such as application master failover, tasks with the same attempt id and the same task id would still commit to the same files, so the `FileAlreadyExistsException` risk would remain.

In this PR, I leverage `FileOutputCommitter` to solve the problem:

1. When initializing a write job description, set `stagingDir/.spark-staging-{jobId}` as the output dir.
2. Each task attempt writes its output to `stagingDir/.spark-staging-{jobId}/_temporary/${appAttemptId}/_temporary/${taskAttemptId}/{partitionId}/part-{taskId}-{jobId}{ext}`.
3. Leveraging the `FileOutputCommitter` coordinator, the write job first commits output to `stagingDir/.spark-staging-{jobId}/{partitionId}`.
4. For dynamic partition overwrite, the write job finally moves `stagingDir/.spark-staging-{jobId}/{partitionId}` to `finalPath/{partitionId}`.

### Why are the changes needed?

Without this PR, dynamic partition overwrite can fail with a `FileAlreadyExistsException` when several attempts of the same task commit to the same file.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT.
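The path layouts described in this pull request can be sketched as follows. This is a minimal illustration only, not Spark code: the helper names `old_task_path` and `new_task_path`, and the sample ids, are hypothetical; only the path templates come from the description above. It shows why two speculative attempts of the same task collide under the old layout but not under the `FileOutputCommitter`-style layout.

```python
import posixpath


def old_task_path(staging_dir, job_id, partition_id, task_id, ext):
    """Old layout: the path does not depend on the task attempt (hypothetical helper)."""
    return posixpath.join(
        staging_dir,
        f".spark-staging-{job_id}",
        partition_id,
        f"part-{task_id}-{job_id}{ext}",
    )


def new_task_path(staging_dir, job_id, app_attempt_id, task_attempt_id,
                  partition_id, task_id, ext):
    """Proposed layout: each task attempt writes under its own _temporary dir (hypothetical helper)."""
    return posixpath.join(
        staging_dir,
        f".spark-staging-{job_id}",
        "_temporary", str(app_attempt_id),
        "_temporary", str(task_attempt_id),
        partition_id,
        f"part-{task_id}-{job_id}{ext}",
    )


# Two speculative attempts of the same task (sample values are made up).
common = dict(staging_dir="stagingDir", job_id="job1", partition_id="p=1",
              task_id="00000", ext=".parquet")
old_a = old_task_path(**common)
old_b = old_task_path(**common)
new_a = new_task_path(app_attempt_id=0, task_attempt_id="attempt_0", **common)
new_b = new_task_path(app_attempt_id=0, task_attempt_id="attempt_1", **common)

print("old layout collides:", old_a == old_b)
print("new layout collides:", new_a == new_b)
```

Under the old layout both attempts resolve to the same file, which is the collision the description refers to; under the new layout the attempt id keeps the in-progress files apart until the committer promotes one of them.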
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP
AmplabJenkins removed a comment on pull request #29243: URL: https://github.com/apache/spark/pull/29243#issuecomment-664341924
[GitHub] [spark] AmplabJenkins commented on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP
AmplabJenkins commented on pull request #29243: URL: https://github.com/apache/spark/pull/29243#issuecomment-664341924
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-664341339 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126634/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-664341327 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
AmplabJenkins commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-664341327
[GitHub] [spark] SparkQA commented on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP
SparkQA commented on pull request #29243: URL: https://github.com/apache/spark/pull/29243#issuecomment-664341321 **[Test build #126641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126641/testReport)** for PR 29243 at commit [`bcc81be`](https://github.com/apache/spark/commit/bcc81be47ea0d8f04a3d508162d883bc57ecd68e).
[GitHub] [spark] SparkQA removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA removed a comment on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-664212994 **[Test build #126634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126634/testReport)** for PR 29014 at commit [`c5edd23`](https://github.com/apache/spark/commit/c5edd2322e0da6941de6439c0e9321b188cfc2fc).
[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
SparkQA commented on pull request #29014: URL: https://github.com/apache/spark/pull/29014#issuecomment-664340538 **[Test build #126634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126634/testReport)** for PR 29014 at commit [`c5edd23`](https://github.com/apache/spark/commit/c5edd2322e0da6941de6439c0e9321b188cfc2fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExecutorProcessLost(`
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins removed a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664338595
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins removed a comment on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-664338651
[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins commented on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664338595
[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
AmplabJenkins commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-664338651
[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
SparkQA commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-664338026 **[Test build #126640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126640/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).
[GitHub] [spark] SparkQA commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
SparkQA commented on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664337984 **[Test build #126639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126639/testReport)** for PR 29000 at commit [`e3dc26b`](https://github.com/apache/spark/commit/e3dc26ba0bf71371155fcaa227ac512ab987be98).
[GitHub] [spark] HyukjinKwon commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode
HyukjinKwon commented on pull request #28968: URL: https://github.com/apache/spark/pull/28968#issuecomment-664337380 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664334637 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126626/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664334632 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664175993 **[Test build #126626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126626/testReport)** for PR 28841 at commit [`11e1109`](https://github.com/apache/spark/commit/11e1109350fe5be67ff549fe94de1efd77735356).
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664334632
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664334191 **[Test build #126626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126626/testReport)** for PR 28841 at commit [`11e1109`](https://github.com/apache/spark/commit/11e1109350fe5be67ff549fe94de1efd77735356). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
AmplabJenkins removed a comment on pull request #29188: URL: https://github.com/apache/spark/pull/29188#issuecomment-664332877
[GitHub] [spark] HyukjinKwon closed pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
HyukjinKwon closed pull request #29229: URL: https://github.com/apache/spark/pull/29229
[GitHub] [spark] AmplabJenkins commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
AmplabJenkins commented on pull request #29188: URL: https://github.com/apache/spark/pull/29188#issuecomment-664332877
[GitHub] [spark] HyukjinKwon commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
HyukjinKwon commented on pull request #29229: URL: https://github.com/apache/spark/pull/29229#issuecomment-664332698 Merged to master. Thanks @dongjoon-hyun and @viirya
[GitHub] [spark] SparkQA removed a comment on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
SparkQA removed a comment on pull request #29188: URL: https://github.com/apache/spark/pull/29188#issuecomment-664179480 **[Test build #126627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126627/testReport)** for PR 29188 at commit [`d6d0117`](https://github.com/apache/spark/commit/d6d011737aba77e7eca41c1b65d6588764856797).
[GitHub] [spark] SparkQA commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
SparkQA commented on pull request #29188: URL: https://github.com/apache/spark/pull/29188#issuecomment-664331962 **[Test build #126627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126627/testReport)** for PR 29188 at commit [`d6d0117`](https://github.com/apache/spark/commit/d6d011737aba77e7eca41c1b65d6588764856797). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
AmplabJenkins removed a comment on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664329566
[GitHub] [spark] SparkQA removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
SparkQA removed a comment on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664247382 **[Test build #126638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)** for PR 29257 at commit [`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09).
[GitHub] [spark] AmplabJenkins commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
AmplabJenkins commented on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664329566
[GitHub] [spark] SparkQA commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
SparkQA commented on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664329191 **[Test build #126638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)** for PR 29257 at commit [`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite
AmplabJenkins commented on pull request #29259: URL: https://github.com/apache/spark/pull/29259#issuecomment-664314873 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite
AmplabJenkins removed a comment on pull request #29259: URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite
AmplabJenkins commented on pull request #29259: URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746 Can one of the admins verify this patch?
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460805760

## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       })
     }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+    withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+      SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {

Review comment: The broadcast threshold config is needed here; otherwise the following case will be planned as a sort-merge join (SMJ):

```
// negative hand-written left anti join
// testData.key nullable false
// testData2.a nullable false
joinExec = assertJoin((
  "select * from testData left anti join testData2 ON key = a or isnull(key = a)",
  classOf[BroadcastHashJoinExec]))
assert(!joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)
```
[GitHub] [spark] mundaym opened a new pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite
mundaym opened a new pull request #29259: URL: https://github.com/apache/spark/pull/29259

### What changes were proposed in this pull request?

PR #26548 means that RecordBinaryComparator now uses big-endian byte order for long comparisons. However, this means that some of the constants in the regression tests no longer map to the same values in the comparison that they used to.

For example, one of the tests does a comparison between Long.MIN_VALUE and 1 in order to trigger an overflow condition that existed in the past (i.e. Long.MIN_VALUE - 1). These constants correspond to the values 0x80..00 and 0x00..01. However, on a little-endian machine the bytes in these values are now swapped before they are compared. This means that we will now be comparing 0x00..80 with 0x01..00. The subtraction 0x00..80 - 0x01..00 does not overflow, so the test no longer serves its original purpose.

To fix this, the constants are now explicitly written out in big-endian byte order to match the byte order used in the comparison. This also fixes the tests on big-endian machines (which would otherwise get a different comparison result from the little-endian machines).

### Why are the changes needed?

The regression tests no longer serve their initial purposes and also fail on big-endian systems.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tests run on a big-endian system (s390x).
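The arithmetic in this description can be checked with a short sketch. It is an illustration in Python, not the Spark test code: `reverse_bytes` stands in for a 64-bit byte swap, `sub_overflows` emulates signed 64-bit subtraction, and both helper names are hypothetical. It confirms that Long.MIN_VALUE - 1 overflows, while the byte-swapped operands 0x00..80 and 0x01..00 do not.

```python
import struct

MASK64 = (1 << 64) - 1
LONG_MIN = -(1 << 63)  # same value as Java's Long.MIN_VALUE


def to_signed64(x):
    """Wrap an arbitrary Python int to a signed 64-bit value (two's complement)."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def reverse_bytes(x):
    """Swap the byte order of a signed 64-bit value (hypothetical helper)."""
    return struct.unpack(">q", struct.pack("<q", x))[0]


def sub_overflows(a, b):
    """True if a - b does not fit in a signed 64-bit long."""
    return to_signed64(a - b) != a - b


# Original intent of the test: Long.MIN_VALUE - 1 overflows.
print(sub_overflows(LONG_MIN, 1))

# After the byte swap on a little-endian machine, the operands become
# 0x00..80 and 0x01..00, and their difference no longer overflows.
swapped_min = reverse_bytes(LONG_MIN)
swapped_one = reverse_bytes(1)
print(hex(swapped_min), hex(swapped_one), sub_overflows(swapped_min, swapped_one))
```

Writing the test constants directly in big-endian byte order, as the pull request does, keeps the compared values at 0x80..00 and 0x00..01 regardless of the machine's native byte order.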
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization
AmplabJenkins removed a comment on pull request #29255: URL: https://github.com/apache/spark/pull/29255#issuecomment-664307630 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization
AmplabJenkins commented on pull request #29255: URL: https://github.com/apache/spark/pull/29255#issuecomment-664307630 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization
SparkQA removed a comment on pull request #29255: URL: https://github.com/apache/spark/pull/29255#issuecomment-664217047 **[Test build #126635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126635/testReport)** for PR 29255 at commit [`5860f81`](https://github.com/apache/spark/commit/5860f81c701888db06de76315ea10c01209507f8). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization
SparkQA commented on pull request #29255: URL: https://github.com/apache/spark/pull/29255#issuecomment-664306088 **[Test build #126635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126635/testReport)** for PR 29255 at commit [`5860f81`](https://github.com/apache/spark/commit/5860f81c701888db06de76315ea10c01209507f8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.
AmplabJenkins removed a comment on pull request #29258: URL: https://github.com/apache/spark/pull/29258#issuecomment-664302394 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.
AmplabJenkins commented on pull request #29258: URL: https://github.com/apache/spark/pull/29258#issuecomment-664304397 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.
AmplabJenkins commented on pull request #29258: URL: https://github.com/apache/spark/pull/29258#issuecomment-664302394 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mundaym opened a new pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.
mundaym opened a new pull request #29258: URL: https://github.com/apache/spark/pull/29258 ### What changes were proposed in this pull request? Updates to tests to use correctly sized `getInt` or `getLong` calls. ### Why are the changes needed? The reads were incorrectly sized (i.e. `putLong` paired with `getInt` and `putInt` paired with `getLong`). This causes test failures on big-endian systems. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Tests were run on a big-endian system (s390x). This change is unlikely to have any practical effect on little-endian systems. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
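Why a mismatched read only fails on big-endian systems can be seen with a plain `java.nio.ByteBuffer`. This is a minimal sketch, assuming that the buffer's byte order is a fair stand-in for the machine's native order; `SizedReadDemo` and `putLongGetInt` are hypothetical names, and the Spark tests use their own memory accessors rather than `ByteBuffer`.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object SizedReadDemo {
  // Write an 8-byte long at offset 0, then read back only the first 4 bytes as an int.
  def putLongGetInt(value: Long, order: ByteOrder): Int = {
    val buf = ByteBuffer.allocate(8).order(order)
    buf.putLong(0, value)
    buf.getInt(0)
  }

  def main(args: Array[String]): Unit = {
    // Little-endian: the low-order bytes come first, so small values survive the short read.
    println(putLongGetInt(1L, ByteOrder.LITTLE_ENDIAN))
    // Big-endian: the first 4 bytes are the high-order half, which is zero for small values.
    println(putLongGetInt(1L, ByteOrder.BIG_ENDIAN))
  }
}
```

Pairing each write with a read of the same width removes the dependence on byte order altogether.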
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
AmplabJenkins removed a comment on pull request #29229: URL: https://github.com/apache/spark/pull/29229#issuecomment-664290163 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
SparkQA removed a comment on pull request #29229: URL: https://github.com/apache/spark/pull/29229#issuecomment-664175950 **[Test build #126625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126625/testReport)** for PR 29229 at commit [`d6ac35d`](https://github.com/apache/spark/commit/d6ac35dde690a60388c596b7ae52692b15b6ff76). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
AmplabJenkins commented on pull request #29229: URL: https://github.com/apache/spark/pull/29229#issuecomment-664290163 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3
SparkQA commented on pull request #29229: URL: https://github.com/apache/spark/pull/29229#issuecomment-664286909 **[Test build #126625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126625/testReport)** for PR 29229 at commit [`d6ac35d`](https://github.com/apache/spark/commit/d6ac35dde690a60388c596b7ae52692b15b6ff76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460795351 ## File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.internal.SQLConf + +/** + * End-to-end test cases for subquery SQL queries coverage with + * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true. + * + * Each case is loaded from a file in + * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery". + * Each case has a golden result file in + * "spark/sql/core/src/test/resources/sql-tests/results/subquery". + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite" + * }}} + * + */ +class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite { Review comment: I will add config dimensions to the files in sql-tests that contain SQL with "NOT IN".
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins removed a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664275587 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126629/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins removed a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664275569 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
AmplabJenkins commented on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664275569 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
SparkQA removed a comment on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664186752 **[Test build #126629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126629/testReport)** for PR 29000 at commit [`5865f51`](https://github.com/apache/spark/commit/5865f51181b235647afafa22c734db9987668f0b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode
SparkQA commented on pull request #29000: URL: https://github.com/apache/spark/pull/29000#issuecomment-664273902 **[Test build #126629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126629/testReport)** for PR 29000 at commit [`5865f51`](https://github.com/apache/spark/commit/5865f51181b235647afafa22c734db9987668f0b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r46016 ## File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.internal.SQLConf + +/** + * End-to-end test cases for subquery SQL queries coverage with + * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true. + * + * Each case is loaded from a file in + * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery". + * Each case has a golden result file in + * "spark/sql/core/src/test/resources/sql-tests/results/subquery". + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite" + * }}} + * + */ +class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite { Review comment: Since spark.sql.nullAwareAntiJoin.optimize.enabled defaults to true for now, is it OK to just delete this case?
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460776658 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ## @@ -1646,4 +1647,96 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark checkAnswer(df, df2) checkAnswer(df, Nil) } + + test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") { +Seq((true, true, true), (true, true, false), (true, false, true), Review comment: ``` Seq(true, false).foreach { enableNAAJ => Seq(true, false).foreach { enableAQE => ... } } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
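The nested `foreach` suggested here visits every combination of the two flags without spelling the tuples out by hand. The following is a rough, Spark-free sketch of that pattern: `ConfigMatrixDemo` and `combinations` are hypothetical names, and the body only records each pair where the real test would presumably wrap its assertions in `withSQLConf`.

```scala
import scala.collection.mutable.ArrayBuffer

object ConfigMatrixDemo {
  // Collects every (enableNAAJ, enableAQE) pair visited by the nested loops.
  def combinations(): Seq[(Boolean, Boolean)] = {
    val seen = ArrayBuffer.empty[(Boolean, Boolean)]
    Seq(true, false).foreach { enableNAAJ =>
      Seq(true, false).foreach { enableAQE =>
        // Placeholder for the per-combination test body (assumption).
        seen += ((enableNAAJ, enableAQE))
      }
    }
    seen.toSeq
  }

  def main(args: Array[String]): Unit =
    combinations().foreach(println)
}
```

Adding a third flag is then one more nested `Seq(true, false).foreach`, rather than doubling a hand-written list of tuples.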
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460776654 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala ## @@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui val output = captured.toString() assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains( - """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#x] == -|Tuples output: 0 -| id LongType: {} -|== WholeStageCodegen (1) == -|Tuples output: 10 -| id LongType: {java.lang.Long} -|== Range (0, 10, step=1, splits=2) == -|Tuples output: 0 -| id LongType: {}""".stripMargin)) +"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#x] == Review comment: If I keep the original indentation, the line will exceed the 100-column limit.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460776317 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala ## @@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui val output = captured.toString() assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains( - """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#x] == -|Tuples output: 0 -| id LongType: {} -|== WholeStageCodegen (1) == -|Tuples output: 10 -| id LongType: {java.lang.Long} -|== Range (0, 10, step=1, splits=2) == -|Tuples output: 0 -| id LongType: {}""".stripMargin)) +"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#x] == Review comment: How can I skip a "\n" in a triple-quoted string?
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460775483 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala ## @@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui val output = captured.toString() assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains( - """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false])), [id=#x] == -|Tuples output: 0 -| id LongType: {} -|== WholeStageCodegen (1) == -|Tuples output: 10 -| id LongType: {java.lang.Long} -|== Range (0, 10, step=1, splits=2) == -|Tuples output: 0 -| id LongType: {}""".stripMargin)) +"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#x] == Review comment: can you keep the indentation? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460775092 ## File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.internal.SQLConf + +/** + * End-to-end test cases for subquery SQL queries coverage with + * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true. + * + * Each case is loaded from a file in + * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery". + * Each case has a golden result file in + * "spark/sql/core/src/test/resources/sql-tests/results/subquery". + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite" + * }}} + * + */ +class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite { Review comment: Actually, very few queries can benefit from NAAJ. Can you find them out and test them via config dimensions? This is an automated message from the Apache Git Service. 
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
AmplabJenkins removed a comment on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664244178 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460774691 ## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ## @@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan }) } } + + test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") { +withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true", + SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) { + // positive not in subquery case + assertJoin(( +"select * from testData where key not in (select a from testData2)", +classOf[BroadcastHashJoinExec])) + + // negative not in subquery case since multi-column is not supported + assertJoin(( +"select * from testData where (key, key + 1) not in (select * from testData2)", +classOf[BroadcastNestedLoopJoinExec])) + + // positive hand-written left anti join + // testData.key nullable false + // testData3.b nullable true + assertJoin(( +"select * from testData left anti join testData3 ON key = b or isnull(key = b)", +classOf[BroadcastHashJoinExec])) + + // negative hand-written left anti join Review comment: Both keys have nullable = false, so the IsNull condition will be removed and the join becomes a normal anti join; hence it is not a NAAJ. Let me add some more asserts on the isNullAware property.
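The point about nullability can be illustrated with SQL's three-valued equality modelled in plain Scala. This is a sketch under assumptions, not Spark's implementation: `NullAwareAntiJoinDemo`, `sqlEquals`, `notIn` and `plainAntiJoin` are hypothetical names, and `None` stands for SQL NULL.

```scala
object NullAwareAntiJoinDemo {
  // SQL-style equality: the result is unknown (None) if either side is NULL.
  def sqlEquals(l: Option[Int], r: Option[Int]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Condition `l = r OR isnull(l = r)`: a left row is dropped when any
  // right row matches it or compares to NULL.
  def notIn(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] =
    left.filterNot(l => right.exists(r => sqlEquals(l, r).getOrElse(true)))

  // Plain anti join: only a definite match drops the left row.
  def plainAntiJoin(left: Seq[Option[Int]], right: Seq[Option[Int]]): Seq[Option[Int]] =
    left.filterNot(l => right.exists(r => sqlEquals(l, r).contains(true)))

  def main(args: Array[String]): Unit = {
    val left = Seq[Option[Int]](Some(1), Some(2))

    // With a NULL on the right side, the two semantics differ.
    val nullableRight = Seq[Option[Int]](Some(2), None)
    println(notIn(left, nullableRight))
    println(plainAntiJoin(left, nullableRight))

    // With no NULLs on either side, the isnull branch never fires and both agree.
    val nonNullRight = Seq[Option[Int]](Some(2))
    println(notIn(left, nonNullRight) == plainAntiJoin(left, nonNullRight))
  }
}
```

When neither key can be NULL, the two functions always return the same rows, which is consistent with the comment that the IsNull condition can be dropped in that case.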
[GitHub] [spark] SparkQA commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
SparkQA commented on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664247382 **[Test build #126638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)** for PR 29257 at commit [`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460774615 ## File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.internal.SQLConf + +/** + * End-to-end test cases for subquery SQL queries coverage with + * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true. + * + * Each case is loaded from a file in + * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery". + * Each case has a golden result file in + * "spark/sql/core/src/test/resources/sql-tests/results/subquery". + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite" + * }}} + * + */ +class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite { Review comment: It's overkill to rerun the entire `SQLQueryTestSuite` with NAAJ enabled. Let's find the join test files and use a config dimension to test NAAJ, e.g. `join.sql`.
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460773562 ## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ## @@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan }) } } + + test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") { +withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true", + SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) { + // positive not in subquery case + assertJoin(( +"select * from testData where key not in (select a from testData2)", +classOf[BroadcastHashJoinExec])) + + // negative not in subquery case since multi-column is not supported + assertJoin(( +"select * from testData where (key, key + 1) not in (select * from testData2)", +classOf[BroadcastNestedLoopJoinExec])) + + // positive hand-written left anti join + // testData.key nullable false + // testData3.b nullable true + assertJoin(( +"select * from testData left anti join testData3 ON key = b or isnull(key = b)", +classOf[BroadcastHashJoinExec])) + + // negative hand-written left anti join Review comment: is it negative? We do produce `BroadcastHashJoinExec` here
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460773428 ## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ## @@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan }) } } + + test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") { +withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true", + SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) { Review comment: will remove it.
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460772792 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -903,15 +926,65 @@ private[joins] object LongHashedRelation { if (!rowKey.isNullAt(0)) { val key = rowKey.getLong(0) map.append(key, unsafeRow) + } else if (isNullAware) { +return new EmptyHashedRelationWithAllNullKeys } } map.optimize() new LongHashedRelation(numFields, map) } } +/** + * A special HashedRelation indicates it built from a empty input:Iterator[InternalRow]. + */ +class EmptyHashedRelation extends HashedRelation with Externalizable { Review comment: Ok.
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460772732 ## File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ## @@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan }) } } + + test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") { +withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true", + SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) { Review comment: do we need to set this config?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
HyukjinKwon commented on a change in pull request #29241: URL: https://github.com/apache/spark/pull/29241#discussion_r460682607 ## File path: core/src/main/scala/org/apache/spark/TestUtils.scala ## @@ -236,7 +236,11 @@ private[spark] object TestUtils { * Test if a command is available. */ def testCommandAvailable(command: String): Boolean = { -val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue()) +val attempt = if (Utils.isWindows) { + Try(Process(command).run(ProcessLogger(_ => ())).exitValue()) Review comment:
```suggestion
Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue())
```
## File path: core/src/main/scala/org/apache/spark/TestUtils.scala ## @@ -236,7 +236,11 @@ private[spark] object TestUtils { * Test if a command is available. */ def testCommandAvailable(command: String): Boolean = { -val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue()) +val attempt = if (Utils.isWindows) { + Try(Process(command).run(ProcessLogger(_ => ())).exitValue()) Review comment: @dongjoon-hyun, I manually tested. `command` -> `s"WHERE $command"` seems to work fine on Windows with Scala 2.13.
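For illustration, the platform split discussed in this comment can be sketched as a standalone helper. This is a minimal sketch, not the code merged in the PR: the object name `CommandCheck` and the `os.name` check are assumptions standing in for Spark's `TestUtils` and `Utils.isWindows`, and the non-Windows branch assumes a POSIX `sh` that supports `command -v`, as the PR title suggests.

```scala
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object CommandCheck {
  // Assumption: a simple stand-in for Spark's Utils.isWindows.
  private val isWindows: Boolean =
    System.getProperty("os.name", "").toLowerCase.startsWith("windows")

  /** Returns true if `command` can be resolved on the current platform. */
  def commandAvailable(command: String): Boolean = {
    val attempt = if (isWindows) {
      // WHERE exits with 0 when the executable is found on the PATH.
      Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue())
    } else {
      // `command -v` looks a command up without running it.
      Try(Process(Seq("sh", "-c", s"command -v $command"))
        .run(ProcessLogger(_ => ())).exitValue())
    }
    // A failed launch or a non-zero exit code both mean "not available".
    attempt.isSuccess && attempt.get == 0
  }
}
```

Either branch only probes for the command instead of executing it, which is the behavioral difference from the original `Process(command).run(...)` line.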
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460772228 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -311,6 +314,15 @@ private[joins] object UnsafeHashedRelation { key: Seq[Expression], sizeEstimate: Int, taskMemoryManager: TaskMemoryManager): HashedRelation = { +apply(input, key, sizeEstimate, taskMemoryManager, isNullAware = false) + } + + def apply( + input: Iterator[InternalRow], + key: Seq[Expression], + sizeEstimate: Int, + taskMemoryManager: TaskMemoryManager, + isNullAware: Boolean = false): HashedRelation = { Review comment: Let me remove them and see if there are build issues. If not, I will remove them for good.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations
AmplabJenkins removed a comment on pull request #29204: URL: https://github.com/apache/spark/pull/29204#issuecomment-664244632
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460771904 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -903,15 +926,65 @@ private[joins] object LongHashedRelation { if (!rowKey.isNullAt(0)) { val key = rowKey.getLong(0) map.append(key, unsafeRow) + } else if (isNullAware) { +return new EmptyHashedRelationWithAllNullKeys } } map.optimize() new LongHashedRelation(numFields, map) } } +/** + * A special HashedRelation indicates it built from a empty input:Iterator[InternalRow]. + */ +class EmptyHashedRelation extends HashedRelation with Externalizable { Review comment: shall we have a common trait for these 2 to contain the fake implementation?
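One way to read this suggestion is sketched below. The types are simplified, hypothetical stand-ins: a toy `Relation` trait instead of Spark's `HashedRelation`, and `Seq[Int]` rows instead of `InternalRow`. Only the two class names come from the diff; the placeholder method bodies are assumptions, since the actual stub implementations are not shown in this thread.

```scala
// Toy stand-in for HashedRelation; the real interface has more members.
trait Relation {
  def get(key: Long): Iterator[Seq[Int]]
  def keyIsUnique: Boolean
  def estimatedSize: Long
}

// Hypothetical shared trait that holds the placeholder implementations once.
trait EmptyRelationLike extends Relation {
  override def get(key: Long): Iterator[Seq[Int]] = Iterator.empty
  override def keyIsUnique: Boolean = true
  override def estimatedSize: Long = 0L
}

// Marker for a relation built from an empty input.
class EmptyHashedRelation extends EmptyRelationLike

// Marker returned when a null key is seen in null-aware mode (per the diff).
class EmptyHashedRelationWithAllNullKeys extends EmptyRelationLike
```

With this shape the two marker classes carry no duplicated code, and callers can still distinguish them with `isInstanceOf` or pattern matching.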
[GitHub] [spark] AmplabJenkins commented on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations
AmplabJenkins commented on pull request #29204: URL: https://github.com/apache/spark/pull/29204#issuecomment-664244632
[GitHub] [spark] SparkQA removed a comment on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations
SparkQA removed a comment on pull request #29204: URL: https://github.com/apache/spark/pull/29204#issuecomment-664212962 **[Test build #126633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126633/testReport)** for PR 29204 at commit [`5011314`](https://github.com/apache/spark/commit/50113145de4d9c7247d2a8af6e1e4f1087d19548).
[GitHub] [spark] SparkQA commented on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations
SparkQA commented on pull request #29204: URL: https://github.com/apache/spark/pull/29204#issuecomment-664244271 **[Test build #126633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126633/testReport)** for PR 29204 at commit [`5011314`](https://github.com/apache/spark/commit/50113145de4d9c7247d2a8af6e1e4f1087d19548).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
HyukjinKwon commented on a change in pull request #29241: URL: https://github.com/apache/spark/pull/29241#discussion_r460689868 ## File path: core/src/main/scala/org/apache/spark/TestUtils.scala ## @@ -236,7 +236,11 @@ private[spark] object TestUtils { * Test if a command is available. */ def testCommandAvailable(command: String): Boolean = { -val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue()) +val attempt = if (Utils.isWindows) { + Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue()) Review comment: I ran some tests via AppVeyor manually to show it works on Windows: Build started: [CORE] `org.apache.spark.rdd.PipedRDDSuite` [![PR-29241](https://ci.appveyor.com/api/projects/status/github/HyukjinKwon/spark?branch=5D8964EE-6046-4DBA-91AD-53B2E5EE4500=true)](https://ci.appveyor.com/project/HyukjinKwon/spark/branch/5D8964EE-6046-4DBA-91AD-53B2E5EE4500) Build started: [SQL] `org.apache.spark.sql.hive.execution.SQLQuerySuite` [![PR-29241](https://ci.appveyor.com/api/projects/status/github/HyukjinKwon/spark?branch=C22F6129-6D78-40AF-BE21-1A625BC656A1=true)](https://ci.appveyor.com/project/HyukjinKwon/spark/branch/C22F6129-6D78-40AF-BE21-1A625BC656A1) Some of the tests might already be failing. We only need to see whether `testCommandAvailable` works or not.
[GitHub] [spark] AmplabJenkins commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
AmplabJenkins commented on pull request #29257: URL: https://github.com/apache/spark/pull/29257#issuecomment-664244178
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460768470 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -454,6 +490,43 @@ case class BroadcastHashJoinExec( val (matched, checkCondition, _) = getJoinCondition(ctx, input) val numOutput = metricTerm(ctx, "numOutputRows") +if (isNullAwareAntiJoin) { + if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) { +return s""" + |// NAAJ Join EmptyHashedRelation accept all + |$numOutput.add(1); + |${consume(ctx, input)} +""".stripMargin + } else if (broadcastRelation.value.isInstanceOf[EmptyHashedRelationWithAllNullKeys]) { +return s""" + |// NAAJ + |// EmptyHashedRelationWithAllNullKeys + |// reject all Review comment: `// If the right side contains a key that all columns are null, NAAJ simply returns Nil.`
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460770609 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -311,6 +314,15 @@ private[joins] object UnsafeHashedRelation { key: Seq[Expression], sizeEstimate: Int, taskMemoryManager: TaskMemoryManager): HashedRelation = { +apply(input, key, sizeEstimate, taskMemoryManager, isNullAware = false) + } + + def apply( + input: Iterator[InternalRow], + key: Seq[Expression], + sizeEstimate: Int, + taskMemoryManager: TaskMemoryManager, + isNullAware: Boolean = false): HashedRelation = { Review comment: do we really need 2 `apply` methods as we have default parameter value here?
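The point about default parameter values can be illustrated with a toy example. `RelationBuilder.build` and its parameters are hypothetical; this is a sketch of the Scala language feature, not of `UnsafeHashedRelation.apply`.

```scala
object RelationBuilder {
  // A single method with a default value covers both call shapes, so a second
  // overload that only forwards `isNullAware = false` adds nothing.
  def build(keys: Seq[Option[Long]], isNullAware: Boolean = false): String =
    if (isNullAware && keys.contains(None)) "all-null-keys marker"
    else s"relation with ${keys.flatten.size} keys"
}
```

Calling `RelationBuilder.build(keys)` and `RelationBuilder.build(keys, isNullAware = true)` both resolve to the same definition, which is why the forwarding overload in the diff looks redundant to the reviewer.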
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460770684 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ## @@ -889,7 +903,16 @@ private[joins] object LongHashedRelation { input: Iterator[InternalRow], key: Seq[Expression], sizeEstimate: Int, - taskMemoryManager: TaskMemoryManager): LongHashedRelation = { + taskMemoryManager: TaskMemoryManager): HashedRelation = { +apply(input, key, sizeEstimate, taskMemoryManager, false) + } + + def apply( + input: Iterator[InternalRow], + key: Seq[Expression], + sizeEstimate: Int, + taskMemoryManager: TaskMemoryManager, + isNullAware: Boolean = false): HashedRelation = { Review comment: ditto
[GitHub] [spark] zhengruifeng opened a new pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
zhengruifeng opened a new pull request #29257: URL: https://github.com/apache/spark/pull/29257

### What changes were proposed in this pull request?
logParam `thresholds` in DT/GBT/FM/LR/MLP

### Why are the changes needed?
The param `thresholds` is logged in NB/RF, but not in the other ProbabilisticClassifier implementations.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460768470 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -454,6 +490,43 @@ case class BroadcastHashJoinExec( val (matched, checkCondition, _) = getJoinCondition(ctx, input) val numOutput = metricTerm(ctx, "numOutputRows") +if (isNullAwareAntiJoin) { + if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) { +return s""" + |// NAAJ Join EmptyHashedRelation accept all + |$numOutput.add(1); + |${consume(ctx, input)} +""".stripMargin + } else if (broadcastRelation.value.isInstanceOf[EmptyHashedRelationWithAllNullKeys]) { +return s""" + |// NAAJ + |// EmptyHashedRelationWithAllNullKeys + |// reject all Review comment: `// If the right side contains any all-null key, NAAJ simply returns Nil.`
[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
cloud-fan commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460767934 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -454,6 +490,43 @@ case class BroadcastHashJoinExec( val (matched, checkCondition, _) = getJoinCondition(ctx, input) val numOutput = metricTerm(ctx, "numOutputRows") +if (isNullAwareAntiJoin) { + if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) { +return s""" + |// NAAJ Join EmptyHashedRelation accept all Review comment: `NAAJ` already means Join. How about `If the right side is empty, NAAJ simply returns the left side.`
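The two short-circuits named in these review comments follow from the SQL semantics of a single-column `NOT IN`. The following is a minimal sketch over plain Scala collections, assuming keys are modelled as `Option[Int]` with `None` standing for NULL. The object `Naaj` and the function `nullAwareAntiJoin` are hypothetical names, and this is not the generated code from `BroadcastHashJoinExec`.

```scala
object Naaj {
  /** Keeps the left keys for which `key NOT IN (right)` evaluates to true. */
  def nullAwareAntiJoin(
      left: Seq[Option[Int]],
      right: Seq[Option[Int]]): Seq[Option[Int]] = {
    if (right.isEmpty) {
      // Empty right side: NOT IN is true for every left row, NULL keys included.
      left
    } else if (right.contains(None)) {
      // A NULL on the right makes NOT IN false or unknown for every left row.
      Nil
    } else {
      val rightKeys = right.flatten.toSet
      // A NULL left key compares as unknown, so that row is dropped.
      left.filter {
        case Some(k) => !rightKeys.contains(k)
        case None => false
      }
    }
  }
}
```

The first two branches correspond to the `EmptyHashedRelation` and `EmptyHashedRelationWithAllNullKeys` cases in the diff; only the last branch needs an actual hash lookup.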
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark
AmplabJenkins removed a comment on pull request #29256: URL: https://github.com/apache/spark/pull/29256#issuecomment-664240371
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
AmplabJenkins removed a comment on pull request #29241: URL: https://github.com/apache/spark/pull/29241#issuecomment-664239449 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126623/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark
AmplabJenkins commented on pull request #29256: URL: https://github.com/apache/spark/pull/29256#issuecomment-664240371
[GitHub] [spark] SparkQA commented on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark
SparkQA commented on pull request #29256: URL: https://github.com/apache/spark/pull/29256#issuecomment-664239733 **[Test build #126637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126637/testReport)** for PR 29256 at commit [`66e1f52`](https://github.com/apache/spark/commit/66e1f52197f0157b91017327155b498b9ce05688).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
AmplabJenkins removed a comment on pull request #29241: URL: https://github.com/apache/spark/pull/29241#issuecomment-664239439 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
AmplabJenkins commented on pull request #29241: URL: https://github.com/apache/spark/pull/29241#issuecomment-664239439
[GitHub] [spark] xuanyuanking opened a new pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark
xuanyuanking opened a new pull request #29256: URL: https://github.com/apache/spark/pull/29256

### What changes were proposed in this pull request?
Check `Distinct` nodes by treating them as `Aggregate` in `UnsupportedOperationChecker` for streaming.

### Why are the changes needed?
Since the union clause in SQL requires deduplication, the parser generates `Distinct(Union)` and the optimizer rule `ReplaceDistinctWithAggregate` changes it to `Aggregate(Union)`. This logic applies to both batch and streaming queries. In streaming, however, the aggregation is wrapped by state store operations. Before this change, SS union queries in Append mode get the following confusing error when the watermark is missing.
```
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:529)
  at scala.None$.get(Option.scala:527)
  at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:346)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:561)
  at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:112)
  ...
```

### Does this PR introduce _any_ user-facing change?
Yes, return a better error message.

### How was this patch tested?
New UT added.
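The rewrite chain in this description can be sketched with a toy plan tree. The case classes below are simplified, hypothetical stand-ins for Catalyst's logical plan nodes, and `hasStreamingAggregation` is an illustrative check under those assumptions, not the logic of `UnsupportedOperationChecker`.

```scala
sealed trait Plan
case class Source(name: String, streaming: Boolean) extends Plan
case class Union(children: Seq[Plan]) extends Plan
case class Distinct(child: Plan) extends Plan
case class Aggregate(child: Plan) extends Plan

object ToyChecker {
  // Mirrors the idea of ReplaceDistinctWithAggregate on the toy tree.
  def replaceDistinct(plan: Plan): Plan = plan match {
    case Distinct(child) => Aggregate(replaceDistinct(child))
    case Aggregate(child) => Aggregate(replaceDistinct(child))
    case Union(children) => Union(children.map(replaceDistinct))
    case s: Source => s
  }

  def isStreaming(plan: Plan): Boolean = plan match {
    case Source(_, streaming) => streaming
    case Union(children) => children.exists(isStreaming)
    case Distinct(child) => isStreaming(child)
    case Aggregate(child) => isStreaming(child)
  }

  // Treat Distinct like Aggregate, as the PR description proposes, so a
  // streaming Distinct(Union) is flagged the same way as Aggregate(Union).
  def hasStreamingAggregation(plan: Plan): Boolean = plan match {
    case Distinct(child) => isStreaming(child) || hasStreamingAggregation(child)
    case Aggregate(child) => isStreaming(child) || hasStreamingAggregation(child)
    case Union(children) => children.exists(hasStreamingAggregation)
    case _: Source => false
  }
}
```

A checker that only matched `Aggregate` would miss the `Distinct(Union)` shape in this toy model, which is the gap the description attributes to the confusing `None.get` failure.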
[GitHub] [spark] SparkQA commented on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
SparkQA commented on pull request #29241: URL: https://github.com/apache/spark/pull/29241#issuecomment-664238519 **[Test build #126623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126623/testReport)** for PR 29241 at commit [`d239ede`](https://github.com/apache/spark/commit/d239eded8f854a1d0fc30bac12f1f5c080af02eb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable
SparkQA removed a comment on pull request #29241: URL: https://github.com/apache/spark/pull/29241#issuecomment-664166354 **[Test build #126623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126623/testReport)** for PR 29241 at commit [`d239ede`](https://github.com/apache/spark/commit/d239eded8f854a1d0fc30bac12f1f5c080af02eb).
[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-664237007 @cloud-fan Updated. With your suggestion, the `HashedRelation` code diff is smaller and makes more sense.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins removed a comment on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-664236317
[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
AmplabJenkins commented on pull request #29104: URL: https://github.com/apache/spark/pull/29104#issuecomment-664236317
[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken commented on a change in pull request #29104: URL: https://github.com/apache/spark/pull/29104#discussion_r460762568

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ##
@@ -896,22 +967,29 @@ private[joins] object LongHashedRelation {
     // Create a mapping of key -> rows
     var numFields = 0
+    val isOriginalInputEmpty: Boolean = !input.hasNext
+    var allNullColumnKeyExistsInOriginalInput: Boolean = false
     while (input.hasNext) {
       val unsafeRow = input.next().asInstanceOf[UnsafeRow]
       numFields = unsafeRow.numFields()
       val rowKey = keyGenerator(unsafeRow)
       if (!rowKey.isNullAt(0)) {
         val key = rowKey.getLong(0)
         map.append(key, unsafeRow)
+      } else if (!allNullColumnKeyExistsInOriginalInput) {
+        allNullColumnKeyExistsInOriginalInput = true
       }
     }
     map.optimize()
     new LongHashedRelation(numFields, map)
+      .setOriginalInputEmtpy(isOriginalInputEmpty)
+      .setAllNullColumnKeyExistsInOriginalInput(allNullColumnKeyExistsInOriginalInput)
+      .asInstanceOf[LongHashedRelation]
   }
 }

 /** The HashedRelationBroadcastMode requires that rows are broadcasted as a HashedRelation.
  */
-case class HashedRelationBroadcastMode(key: Seq[Expression])
+case class HashedRelationBroadcastMode(key: Seq[Expression], isNullAware: Boolean = false)

Review comment: done

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ##
@@ -323,11 +373,19 @@ private[joins] object UnsafeHashedRelation {
     // Create a mapping of buildKeys -> rows
     val keyGenerator = UnsafeProjection.create(key)
     var numFields = 0
+    val isOriginalInputEmpty = !input.hasNext
+    var allNullColumnKeyExistsInOriginalInput: Boolean = false
     while (input.hasNext) {

Review comment: done

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ##
@@ -896,22 +967,29 @@ private[joins] object LongHashedRelation {
     // Create a mapping of key -> rows
     var numFields = 0
+    val isOriginalInputEmpty: Boolean = !input.hasNext
+    var allNullColumnKeyExistsInOriginalInput: Boolean = false
     while (input.hasNext) {

Review comment: done
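To show why the build loop in the diff above records both "input was empty" and "a null key was seen", here is a minimal sketch of a single-column null-aware anti join (SQL `NOT IN`) over plain Scala collections. It is a toy model under stated assumptions, not the PR's implementation: `BuildSide`, `ToyNullAwareAntiJoin`, and the use of `Option[Long]` (with `None` standing for a SQL NULL key) are hypothetical names chosen for illustration.

```scala
// Toy model: keys are Option[Long], where None stands for a SQL NULL key.
final case class BuildSide(keys: Set[Long], inputEmpty: Boolean, hasNullKey: Boolean)

object ToyNullAwareAntiJoin {
  // Single pass over the build side, recording the two flags next to the key set.
  def build(input: Iterator[Option[Long]]): BuildSide = {
    val inputEmpty = !input.hasNext // must be read before the iterator is consumed
    var hasNullKey = false
    val keys = Set.newBuilder[Long]
    while (input.hasNext) {
      input.next() match {
        case Some(k) => keys += k
        case None    => hasNullKey = true
      }
    }
    BuildSide(keys.result(), inputEmpty, hasNullKey)
  }

  // Rows of `probe` that satisfy `key NOT IN (build side)` under SQL semantics.
  def antiJoin(probe: Seq[Option[Long]], b: BuildSide): Seq[Option[Long]] =
    if (b.inputEmpty) probe        // NOT IN over an empty set is true for every row
    else if (b.hasNullKey) Nil     // a NULL on the build side means no row can qualify
    else probe.filter(_.exists(k => !b.keys.contains(k))) // NULL probe keys never qualify
}
```

The two flags are needed because null keys are not stored in the map, so the map alone cannot distinguish an empty build side from one that contained only null keys, and those two cases produce opposite results.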