date:20200727

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29260:
URL: https://github.com/apache/spark/pull/29260#issuecomment-664347826


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29260:
URL: https://github.com/apache/spark/pull/29260#issuecomment-664347826








[GitHub] [spark] WinkerDu opened a new pull request #29260: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox

WinkerDu opened a new pull request #29260:
URL: https://github.com/apache/spark/pull/29260

### What changes were proposed in this pull request?

When using dynamic partition overwrite, each task has its working dir under
the staging dir, `stagingDir/.spark-staging-{jobId}`, and each task commits to
`stagingDir/.spark-staging-{jobId}/{partitionId}/part-{taskId}-{jobId}{ext}`.
When speculation is enabled, multiple task attempts can be set up for one task;
**they have the same task id and would commit to the same file concurrently**.
If a host goes down or a node is preempted, the partly-committed files are not
cleaned up, and a `FileAlreadyExistsException` is then raised, resulting in
job failure.
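
To make the collision concrete, here is a minimal sketch in Java. It is not
Spark's code: the class `StagingPathCollision`, the helper `taskFile`, and the
sample ids are hypothetical, and it uses `java.nio.file` on a local temp
directory as a stand-in for the Hadoop file system (so the exception is
`java.nio.file.FileAlreadyExistsException`, not the Hadoop class of the same
name). Because the file name carries no attempt id, two attempts of the same
task resolve to the same path, and the second exclusive create fails.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class StagingPathCollision {

    // Hypothetical helper mirroring the layout
    // stagingDir/.spark-staging-{jobId}/{partitionId}/part-{taskId}-{jobId}{ext}.
    // Note that no attempt id appears anywhere in the path.
    static Path taskFile(Path stagingDir, String jobId, String partitionId,
                         int taskId, String ext) {
        return stagingDir
            .resolve(".spark-staging-" + jobId)
            .resolve(partitionId)
            .resolve("part-" + taskId + "-" + jobId + ext);
    }

    // Simulates two attempts of task 3 creating their output file exclusively.
    // Returns true when the second attempt hits an already existing file.
    static boolean secondAttemptCollides() {
        try {
            Path stagingDir = Files.createTempDirectory("staging-demo");
            Path attempt0 = taskFile(stagingDir, "job1", "dt=2020-07-27", 3, ".parquet");
            Path attempt1 = taskFile(stagingDir, "job1", "dt=2020-07-27", 3, ".parquet");
            Files.createDirectories(attempt0.getParent());
            Files.createFile(attempt0);      // first attempt, e.g. left behind by a lost host
            try {
                Files.createFile(attempt1);  // speculative attempt, same path
                return false;
            } catch (FileAlreadyExistsException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second attempt collides: " + secondAttemptCollides());
    }
}
```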

I don't try to change the task commit process for dynamic partition overwrite,
for example by adding the attempt id to the task working dir of each attempt
and committing to the final output dir via a new outputCommitCoordinator. Here
are the reasons:

1. `FileOutputCommitter` already has a commit coordinator for task
attempts; we can leverage it rather than build a new one.
2. Even if we implemented a coordinator to resolve commit conflicts between
task attempts, consider a severe case: after an application master failover,
tasks with the same attempt id and the same task id would commit to the same
files, so the `FileAlreadyExistsException` risk would still exist.

In this PR, I leverage `FileOutputCommitter` to solve the problem:

1. When initializing a write job description, set
`stagingDir/.spark-staging-{jobId}` as the output dir.
2. Each task attempt writes its output to
`stagingDir/.spark-staging-{jobId}/_temporary/${appAttemptId}/_temporary/${taskAttemptId}/{partitionId}/part-{taskId}-{jobId}{ext}`.
3. Leveraging the `FileOutputCommitter` coordinator, the write job first commits
output to `stagingDir/.spark-staging-{jobId}/{partitionId}`.
4. For dynamic partition overwrite, the write job finally moves
`stagingDir/.spark-staging-{jobId}/{partitionId}` to `finalPath/{partitionId}`.
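
The path layout in the steps above can be sketched as follows. This is
illustrative Java, not the code in this PR: `AttemptScopedPaths` and its
methods are hypothetical names, the ids are made up, and local `java.nio.file`
paths stand in for Hadoop paths. It only shows that two attempts of one task
get distinct working files while sharing the same committed destination.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class AttemptScopedPaths {

    // Step 1: the job output dir handed to the committer.
    static Path jobOutputDir(String stagingDir, String jobId) {
        return Paths.get(stagingDir, ".spark-staging-" + jobId);
    }

    // Step 2: where one task attempt writes; the task attempt id is part of the path.
    static Path attemptFile(String stagingDir, String jobId, int appAttemptId,
                            String taskAttemptId, String partitionId, String fileName) {
        return jobOutputDir(stagingDir, jobId)
            .resolve("_temporary").resolve(Integer.toString(appAttemptId))
            .resolve("_temporary").resolve(taskAttemptId)
            .resolve(partitionId).resolve(fileName);
    }

    // Step 3: where the file lands after the commit into the staging dir.
    static Path committedFile(String stagingDir, String jobId,
                              String partitionId, String fileName) {
        return jobOutputDir(stagingDir, jobId).resolve(partitionId).resolve(fileName);
    }

    // Step 4: destination after the partition dir is moved to the final path.
    static Path finalFile(String finalPath, String partitionId, String fileName) {
        return Paths.get(finalPath, partitionId, fileName);
    }

    public static void main(String[] args) {
        String file = "part-3-job1.parquet";  // made-up file name
        System.out.println(attemptFile("/warehouse/t", "job1", 0, "attempt_3_0", "dt=2020-07-27", file));
        System.out.println(attemptFile("/warehouse/t", "job1", 0, "attempt_3_1", "dt=2020-07-27", file));
        System.out.println(committedFile("/warehouse/t", "job1", "dt=2020-07-27", file));
        System.out.println(finalFile("/warehouse/t", "dt=2020-07-27", file));
    }
}
```

The two attempt paths differ only in the task attempt directory, so a
speculative attempt never writes to a file another attempt has already created.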

### Why are the changes needed?

Without this PR, dynamic partition overwrite can fail with
`FileAlreadyExistsException` when speculation is enabled.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29243:
URL: https://github.com/apache/spark/pull/29243#issuecomment-664341924








[GitHub] [spark] AmplabJenkins commented on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29243:
URL: https://github.com/apache/spark/pull/29243#issuecomment-664341924








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664341339


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126634/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664341327


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664341327








[GitHub] [spark] SparkQA commented on pull request #29243: [SPARK-32444][SQL] Infer filters from DPP

2020-07-27 Thread GitBox



SparkQA commented on pull request #29243:
URL: https://github.com/apache/spark/pull/29243#issuecomment-664341321


   **[Test build #126641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126641/testReport)** for PR 29243 at commit [`bcc81be`](https://github.com/apache/spark/commit/bcc81be47ea0d8f04a3d508162d883bc57ecd68e).




[GitHub] [spark] SparkQA removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664212994


   **[Test build #126634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126634/testReport)** for PR 29014 at commit [`c5edd23`](https://github.com/apache/spark/commit/c5edd2322e0da6941de6439c0e9321b188cfc2fc).




[GitHub] [spark] SparkQA commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-27 Thread GitBox



SparkQA commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-664340538


   **[Test build #126634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126634/testReport)** for PR 29014 at commit [`c5edd23`](https://github.com/apache/spark/commit/c5edd2322e0da6941de6439c0e9321b188cfc2fc).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class ExecutorProcessLost(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664338595








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #28968:
URL: https://github.com/apache/spark/pull/28968#issuecomment-664338651








[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664338595








[GitHub] [spark] AmplabJenkins commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #28968:
URL: https://github.com/apache/spark/pull/28968#issuecomment-664338651








[GitHub] [spark] SparkQA commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

2020-07-27 Thread GitBox



SparkQA commented on pull request #28968:
URL: https://github.com/apache/spark/pull/28968#issuecomment-664338026


   **[Test build #126640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126640/testReport)** for PR 28968 at commit [`a78fd43`](https://github.com/apache/spark/commit/a78fd4314ba39d1feb63ba1539ac9a2acf40de77).




[GitHub] [spark] SparkQA commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



SparkQA commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664337984


   **[Test build #126639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126639/testReport)** for PR 29000 at commit [`e3dc26b`](https://github.com/apache/spark/commit/e3dc26ba0bf71371155fcaa227ac512ab987be98).




[GitHub] [spark] HyukjinKwon commented on pull request #28968: [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode

2020-07-27 Thread GitBox



HyukjinKwon commented on pull request #28968:
URL: https://github.com/apache/spark/pull/28968#issuecomment-664337380


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664334637


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126626/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664334632


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664175993


   **[Test build #126626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126626/testReport)** for PR 28841 at commit [`11e1109`](https://github.com/apache/spark/commit/11e1109350fe5be67ff549fe94de1efd77735356).




[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664334632








[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox



SparkQA commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-664334191


   **[Test build #126626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126626/testReport)** for PR 28841 at commit [`11e1109`](https://github.com/apache/spark/commit/11e1109350fe5be67ff549fe94de1efd77735356).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29188:
URL: https://github.com/apache/spark/pull/29188#issuecomment-664332877








[GitHub] [spark] HyukjinKwon closed pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



HyukjinKwon closed pull request #29229:
URL: https://github.com/apache/spark/pull/29229


   




[GitHub] [spark] AmplabJenkins commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29188:
URL: https://github.com/apache/spark/pull/29188#issuecomment-664332877








[GitHub] [spark] HyukjinKwon commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



HyukjinKwon commented on pull request #29229:
URL: https://github.com/apache/spark/pull/29229#issuecomment-664332698


   Merged to master.
   
   Thanks @dongjoon-hyun and @viirya 




[GitHub] [spark] SparkQA removed a comment on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29188:
URL: https://github.com/apache/spark/pull/29188#issuecomment-664179480


   **[Test build #126627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126627/testReport)** for PR 29188 at commit [`d6d0117`](https://github.com/apache/spark/commit/d6d011737aba77e7eca41c1b65d6588764856797).




[GitHub] [spark] SparkQA commented on pull request #29188: [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base

2020-07-27 Thread GitBox



SparkQA commented on pull request #29188:
URL: https://github.com/apache/spark/pull/29188#issuecomment-664331962


   **[Test build #126627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126627/testReport)** for PR 29188 at commit [`d6d0117`](https://github.com/apache/spark/commit/d6d011737aba77e7eca41c1b65d6588764856797).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664329566








[GitHub] [spark] SparkQA removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664247382


   **[Test build #126638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)** for PR 29257 at commit [`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09).




[GitHub] [spark] AmplabJenkins commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664329566








[GitHub] [spark] SparkQA commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



SparkQA commented on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664329191


   **[Test build #126638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)** for PR 29257 at commit [`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664314873


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29259:
URL: https://github.com/apache/spark/pull/29259#issuecomment-664313746


   Can one of the admins verify this patch?




[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460805760



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {

Review comment:
   The broadcast threshold config is needed here; otherwise the following case
   would be planned as a sort-merge join (SMJ):
   
   ```
   // negative hand-written left anti join
   // testData.key nullable false
   // testData2.a nullable false
   joinExec = assertJoin((
     "select * from testData left anti join testData2 ON key = a or isnull(key = a)",
     classOf[BroadcastHashJoinExec]))
   assert(!joinExec.asInstanceOf[BroadcastHashJoinExec].isNullAwareAntiJoin)
   ```






[GitHub] [spark] mundaym opened a new pull request #29259: [SPARK-29918][SQL][FOLLOWUP][TEST] Fix endianness issues in tests in RecordBinaryComparatorSuite

2020-07-27 Thread GitBox



mundaym opened a new pull request #29259:
URL: https://github.com/apache/spark/pull/29259


   ### What changes were proposed in this pull request?
   Since PR #26548, RecordBinaryComparator uses big-endian byte order
   for long comparisons. As a result, some of the constants in the
   regression tests no longer map to the same values in the comparison
   that they used to.
   
   For example, one of the tests does a comparison between
   Long.MIN_VALUE and 1 in order to trigger an overflow condition that
   existed in the past (i.e. Long.MIN_VALUE - 1). These constants
   correspond to the values 0x80..00 and 0x00..01. However on a
   little-endian machine the bytes in these values are now swapped
   before they are compared. This means that we will now be comparing
   0x00..80 with 0x01..00. The subtraction 0x00..80 - 0x01..00 does not
   overflow, so the test no longer exercises the condition it was written for.
   
   To fix this the constants are now explicitly written out in big
   endian byte order to match the byte order used in the comparison.
   This also fixes the tests on big endian machines (which would
   otherwise get a different comparison result to the little-endian
   machines).
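
A minimal sketch of the arithmetic described above, assuming the byte swap on a little-endian machine can be modelled with `java.lang.Long.reverseBytes`. The names `EndianOverflowSketch` and `subtractionOverflows` are made up for illustration; they are not part of the PR or of `RecordBinaryComparator`.

```scala
object EndianOverflowSketch {
  // Returns true when a - b overflows a signed 64-bit long.
  // Overflow happens only if a and b have different signs and the
  // wrapped result has a different sign from a.
  def subtractionOverflows(a: Long, b: Long): Boolean = {
    val diff = a - b
    ((a ^ b) & (a ^ diff)) < 0
  }

  def main(args: Array[String]): Unit = {
    val a = Long.MinValue // 0x80..00
    val b = 1L            // 0x00..01

    // The constants as the test originally intended them.
    println(subtractionOverflows(a, b))

    // The same constants after their bytes are swapped (assumed model of
    // what a little-endian machine compares): 0x00..80 and 0x01..00.
    val swappedA = java.lang.Long.reverseBytes(a)
    val swappedB = java.lang.Long.reverseBytes(b)
    println(subtractionOverflows(swappedA, swappedB))
  }
}
```

The first call reports an overflow and the second does not, which is the loss of coverage the description refers to.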
   
   ### Why are the changes needed?
   The regression tests no longer serve their initial purposes and also fail on
   big-endian systems.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests run on big-endian system (s390x).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29255:
URL: https://github.com/apache/spark/pull/29255#issuecomment-664307630








[GitHub] [spark] AmplabJenkins commented on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29255:
URL: https://github.com/apache/spark/pull/29255#issuecomment-664307630








[GitHub] [spark] SparkQA removed a comment on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29255:
URL: https://github.com/apache/spark/pull/29255#issuecomment-664217047


   **[Test build #126635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126635/testReport)** for PR 29255 at commit [`5860f81`](https://github.com/apache/spark/commit/5860f81c701888db06de76315ea10c01209507f8).




[GitHub] [spark] SparkQA commented on pull request #29255: [SPARK-32455][ML] LogisticRegressionModel prediction optimization

2020-07-27 Thread GitBox



SparkQA commented on pull request #29255:
URL: https://github.com/apache/spark/pull/29255#issuecomment-664306088


   **[Test build #126635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126635/testReport)** for PR 29255 at commit [`5860f81`](https://github.com/apache/spark/commit/5860f81c701888db06de76315ea10c01209507f8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29258:
URL: https://github.com/apache/spark/pull/29258#issuecomment-664302394


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29258:
URL: https://github.com/apache/spark/pull/29258#issuecomment-664304397


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29258:
URL: https://github.com/apache/spark/pull/29258#issuecomment-664302394


   Can one of the admins verify this patch?




[GitHub] [spark] mundaym opened a new pull request #29258: [SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads.

2020-07-27 Thread GitBox



mundaym opened a new pull request #29258:
URL: https://github.com/apache/spark/pull/29258


   ### What changes were proposed in this pull request?
   Updates to tests to use correctly sized `getInt` or `getLong` calls.
   
   ### Why are the changes needed?
   The reads were incorrectly sized (i.e. `putLong` paired with `getInt` and
   `putInt` paired with `getLong`). This causes test failures on big-endian
   systems.
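
A minimal sketch of why a mismatched read only shows up on big-endian systems. It uses `java.nio.ByteBuffer` with an explicit byte order as a stand-in for the memory accesses in the Spark tests, which is an assumption made for illustration; `MismatchedReadSketch` and `putLongGetInt` are hypothetical names.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object MismatchedReadSketch {
  // Write a 64-bit value at offset 0, then read back only 32 bits
  // from the same offset, using the given byte order for both accesses.
  def putLongGetInt(value: Long, order: ByteOrder): Int = {
    val buf = ByteBuffer.allocate(8).order(order)
    buf.putLong(0, value)
    buf.getInt(0)
  }

  def main(args: Array[String]): Unit = {
    // Little-endian: the low-order half of the long sits at offset 0,
    // so a small value happens to survive the undersized read.
    println(putLongGetInt(11L, ByteOrder.LITTLE_ENDIAN))

    // Big-endian: the high-order half sits at offset 0, so the
    // undersized read sees the upper 32 bits instead.
    println(putLongGetInt(11L, ByteOrder.BIG_ENDIAN))
  }
}
```

With small test values the little-endian read returns the value that was written, so the mismatch goes unnoticed there.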
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Tests were run on a big-endian system (s390x). This change is unlikely to
   have any practical effect on little-endian systems.
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29229:
URL: https://github.com/apache/spark/pull/29229#issuecomment-664290163








[GitHub] [spark] SparkQA removed a comment on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29229:
URL: https://github.com/apache/spark/pull/29229#issuecomment-664175950


   **[Test build #126625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126625/testReport)** for PR 29229 at commit [`d6ac35d`](https://github.com/apache/spark/commit/d6ac35dde690a60388c596b7ae52692b15b6ff76).




[GitHub] [spark] AmplabJenkins commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29229:
URL: https://github.com/apache/spark/pull/29229#issuecomment-664290163








[GitHub] [spark] SparkQA commented on pull request #29229: [SPARK-32435][PYTHON] Remove heapq3 port from Python 3

2020-07-27 Thread GitBox



SparkQA commented on pull request #29229:
URL: https://github.com/apache/spark/pull/29229#issuecomment-664286909


   **[Test build #126625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126625/testReport)** for PR 29229 at commit [`d6ac35d`](https://github.com/apache/spark/commit/d6ac35dde690a60388c596b7ae52692b15b6ff76).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460795351



##
File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * End-to-end test cases for subquery SQL queries coverage with
+ * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true.
+ *
+ * Each case is loaded from a file in
+ * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery".
+ * Each case has a golden result file in
+ * "spark/sql/core/src/test/resources/sql-tests/results/subquery".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite"
+ * }}}
+ *
+ */
+class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite {

Review comment:
   I will add a config dimension to the files in sql-tests whose SQL
   includes "NOT IN".






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664275587


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126629/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664275569


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664275569








[GitHub] [spark] SparkQA removed a comment on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664186752


   **[Test build #126629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126629/testReport)** for PR 29000 at commit [`5865f51`](https://github.com/apache/spark/commit/5865f51181b235647afafa22c734db9987668f0b).




[GitHub] [spark] SparkQA commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-07-27 Thread GitBox



SparkQA commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-664273902


   **[Test build #126629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126629/testReport)** for PR 29000 at commit [`5865f51`](https://github.com/apache/spark/commit/5865f51181b235647afafa22c734db9987668f0b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r46016



##
File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * End-to-end test cases for subquery SQL queries coverage with
+ * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true.
+ *
+ * Each case is loaded from a file in
+ * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery".
+ * Each case has a golden result file in
+ * "spark/sql/core/src/test/resources/sql-tests/results/subquery".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite"
+ * }}}
+ *
+ */
+class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite {

Review comment:
   Since spark.sql.nullAwareAntiJoin.optimize.enabled defaults to true for
   now, is it OK to just delete this case?






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460776658



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##
@@ -1646,4 +1647,96 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
 checkAnswer(df, df2)
 checkAnswer(df, Nil)
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+Seq((true, true, true), (true, true, false), (true, false, true),

Review comment:
   ```
   Seq(true, false).foreach { enableNAAJ =>
 Seq(true, false).foreach { enableAQE =>
   ...
 }
   }
   ```
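
A runnable sketch of the nested `foreach` shape suggested above, assuming the elided `...` is some test body that takes the two flags. The names `ConfigDimensionsSketch` and `visitAll` are hypothetical; the real test would presumably wrap its body in `withSQLConf` rather than call a callback.

```scala
object ConfigDimensionsSketch {
  // Run `body` once for every (enableNAAJ, enableAQE) combination,
  // instead of spelling each tuple out by hand.
  def visitAll(body: (Boolean, Boolean) => Unit): Unit = {
    Seq(true, false).foreach { enableNAAJ =>
      Seq(true, false).foreach { enableAQE =>
        body(enableNAAJ, enableAQE)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    visitAll { (enableNAAJ, enableAQE) =>
      println(s"enableNAAJ=$enableNAAJ enableAQE=$enableAQE")
    }
  }
}
```

Nesting one `Seq(true, false)` per flag covers all four combinations, and a new flag only adds one more level rather than doubling a hand-written tuple list.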






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460776654



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
##
@@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui
 
 val output = captured.toString()
 assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains(
-  """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, 
bigint, false])), [id=#x] ==
-|Tuples output: 0
-| id LongType: {}
-|== WholeStageCodegen (1) ==
-|Tuples output: 10
-| id LongType: {java.lang.Long}
-|== Range (0, 10, step=1, splits=2) ==
-|Tuples output: 0
-| id LongType: {}""".stripMargin))
+"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
false]),false), [id=#x] ==

Review comment:
   If I keep the original indentation, the line will exceed the 100-column limit.






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460776317



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
##
@@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui
 
 val output = captured.toString()
 assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains(
-  """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, 
bigint, false])), [id=#x] ==
-|Tuples output: 0
-| id LongType: {}
-|== WholeStageCodegen (1) ==
-|Tuples output: 10
-| id LongType: {java.lang.Long}
-|== Range (0, 10, step=1, splits=2) ==
-|Tuples output: 0
-| id LongType: {}""".stripMargin))
+"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
false]),false), [id=#x] ==

Review comment:
   How can I skip a "\n" in a triple-quoted string?






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460775483



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
##
@@ -70,15 +70,15 @@ class DebuggingSuite extends SharedSparkSession with DisableAdaptiveExecutionSui
 
 val output = captured.toString()
 assert(output.replaceAll("\\[id=#\\d+\\]", "[id=#x]").contains(
-  """== BroadcastExchange HashedRelationBroadcastMode(List(input[0, 
bigint, false])), [id=#x] ==
-|Tuples output: 0
-| id LongType: {}
-|== WholeStageCodegen (1) ==
-|Tuples output: 10
-| id LongType: {java.lang.Long}
-|== Range (0, 10, step=1, splits=2) ==
-|Tuples output: 0
-| id LongType: {}""".stripMargin))
+"""== BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, 
false]),false), [id=#x] ==

Review comment:
   can you keep the indentation?






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460775092



##
File path: sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * End-to-end test cases for subquery SQL queries coverage with
+ * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true.
+ *
+ * Each case is loaded from a file in
+ * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery".
+ * Each case has a golden result file in
+ * "spark/sql/core/src/test/resources/sql-tests/results/subquery".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite"
+ * }}}
+ *
+ */
+class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite {

Review comment:
   Actually, very few queries can benefit from NAAJ. Can you identify them
   and test them via config dimensions?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664244178








[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460774691



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {
+  // positive not in subquery case
+  assertJoin((
+"select * from testData where key not in (select a from testData2)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative not in subquery case since multi-column is not supported
+  assertJoin((
+"select * from testData where (key, key + 1) not in (select * from 
testData2)",
+classOf[BroadcastNestedLoopJoinExec]))
+
+  // positive hand-written left anti join
+  // testData.key nullable false
+  // testData3.b nullable true
+  assertJoin((
+"select * from testData left anti join testData3 ON key = b or 
isnull(key = b)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative hand-written left anti join

Review comment:
   Both keys have nullable = false, so the IsNull condition is removed and the 
join becomes a normal anti join. Let me add some more asserts on the 
isNullAware prop.






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460774691



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with 
SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {
+  // positive not in subquery case
+  assertJoin((
+"select * from testData where key not in (select a from testData2)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative not in subquery case since multi-column is not supported
+  assertJoin((
+"select * from testData where (key, key + 1) not in (select * from 
testData2)",
+classOf[BroadcastNestedLoopJoinExec]))
+
+  // positive hand-written left anti join
+  // testData.key nullable false
+  // testData3.b nullable true
+  assertJoin((
+"select * from testData left anti join testData3 ON key = b or 
isnull(key = b)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative hand-written left anti join

Review comment:
   Both keys have nullable = false, so the IsNull condition is removed and the 
join becomes a normal anti join; hence it's not a NAAJ. Let me add some more 
asserts on the isNullAware prop.






[GitHub] [spark] SparkQA commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



SparkQA commented on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664247382


   **[Test build #126638 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126638/testReport)**
 for PR 29257 at commit 
[`b2dad7c`](https://github.com/apache/spark/commit/b2dad7c7cbda1198cce1719d8e2d457f7f741b09).




[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460774615



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/NullAwareAntiJoinSQLQueryTestSuite.scala
##
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * End-to-end test cases for subquery SQL queries coverage with
+ * NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED = true.
+ *
+ * Each case is loaded from a file in
+ * "spark/sql/core/src/test/resources/sql-tests/inputs/subquery".
+ * Each case has a golden result file in
+ * "spark/sql/core/src/test/resources/sql-tests/results/subquery".
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *NullAwareAntiJoinSQLQueryTestSuite"
+ * }}}
+ *
+ */
+class NullAwareAntiJoinSQLQueryTestSuite extends SQLQueryTestSuite {

Review comment:
   It's overkill to rerun the entire `SQLQueryTestSuite` with NAAJ enabled.
   
   Let's pick out the join test files, e.g. `join.sql`, and use a config 
dimension to test NAAJ.






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460773562



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with 
SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {
+  // positive not in subquery case
+  assertJoin((
+"select * from testData where key not in (select a from testData2)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative not in subquery case since multi-column is not supported
+  assertJoin((
+"select * from testData where (key, key + 1) not in (select * from 
testData2)",
+classOf[BroadcastNestedLoopJoinExec]))
+
+  // positive hand-written left anti join
+  // testData.key nullable false
+  // testData3.b nullable true
+  assertJoin((
+"select * from testData left anti join testData3 ON key = b or 
isnull(key = b)",
+classOf[BroadcastHashJoinExec]))
+
+  // negative hand-written left anti join

Review comment:
   is it negative? We do produce `BroadcastHashJoinExec` here






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460773428



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with 
SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {

Review comment:
   will remove it.






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460772792



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -903,15 +926,65 @@ private[joins] object LongHashedRelation {
   if (!rowKey.isNullAt(0)) {
 val key = rowKey.getLong(0)
 map.append(key, unsafeRow)
+  } else if (isNullAware) {
+return new EmptyHashedRelationWithAllNullKeys
   }
 }
 map.optimize()
 new LongHashedRelation(numFields, map)
   }
 }
 
+/**
+ * A special HashedRelation indicates it built from a empty 
input:Iterator[InternalRow].
+ */
+class EmptyHashedRelation extends HashedRelation with Externalizable {

Review comment:
   Ok.






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460772732



##
File path: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
##
@@ -1147,4 +1147,40 @@ class JoinSuite extends QueryTest with 
SharedSparkSession with AdaptiveSparkPlan
   })
 }
   }
+
+  test("SPARK-32290: SingleColumn Null Aware Anti Join Optimize") {
+withSQLConf(SQLConf.NULL_AWARE_ANTI_JOIN_OPTIMIZE_ENABLED.key -> "true",
+  SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) {

Review comment:
   do we need to set this config?






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



HyukjinKwon commented on a change in pull request #29241:
URL: https://github.com/apache/spark/pull/29241#discussion_r460682607



##
File path: core/src/main/scala/org/apache/spark/TestUtils.scala
##
@@ -236,7 +236,11 @@ private[spark] object TestUtils {
* Test if a command is available.
*/
   def testCommandAvailable(command: String): Boolean = {
-val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
+val attempt = if (Utils.isWindows) {
+  Try(Process(command).run(ProcessLogger(_ => ())).exitValue())

Review comment:
   ```suggestion
 Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue())
   ```

##
File path: core/src/main/scala/org/apache/spark/TestUtils.scala
##
@@ -236,7 +236,11 @@ private[spark] object TestUtils {
* Test if a command is available.
*/
   def testCommandAvailable(command: String): Boolean = {
-val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
+val attempt = if (Utils.isWindows) {
+  Try(Process(command).run(ProcessLogger(_ => ())).exitValue())

Review comment:
   @dongjoon-hyun, I manually tested. `command` -> `s"WHERE $command"` 
seems to work fine on Windows with Scala 2.13.
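
For readers following the thread, the platform split under discussion can be sketched as below. This is a minimal sketch, not the code in the PR: the `isWindows` check via the `os.name` system property stands in for Spark's `Utils.isWindows`, the object name is hypothetical, and wrapping `command -v` in `sh -c` is an assumption about how the POSIX branch could be invoked.

```scala
import scala.sys.process.{Process, ProcessLogger}
import scala.util.Try

object CommandAvailableSketch {
  // Assumption: a simple OS check standing in for Spark's Utils.isWindows.
  val isWindows: Boolean =
    System.getProperty("os.name").toLowerCase.contains("windows")

  /** Returns true if `command` can be resolved on this machine. */
  def testCommandAvailable(command: String): Boolean = {
    val attempt = if (isWindows) {
      // WHERE exits with 0 only when the command is found.
      Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue())
    } else {
      // `command -v` is a shell builtin, so it is run through `sh -c` here.
      Try(Process(Seq("sh", "-c", s"command -v $command"))
        .run(ProcessLogger(_ => ())).exitValue())
    }
    // A failed launch (for example a missing executable) counts as "not available".
    attempt.isSuccess && attempt.get == 0
  }
}
```

Both branches only look the command up and do not execute it, so the check has no side effects from the probed program. The command name is interpolated into a shell string, so this sketch is only suitable for trusted input such as test code.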






[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460772228



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -311,6 +314,15 @@ private[joins] object UnsafeHashedRelation {
   key: Seq[Expression],
   sizeEstimate: Int,
   taskMemoryManager: TaskMemoryManager): HashedRelation = {
+apply(input, key, sizeEstimate, taskMemoryManager, isNullAware = false)
+  }
+
+  def apply(
+  input: Iterator[InternalRow],
+  key: Seq[Expression],
+  sizeEstimate: Int,
+  taskMemoryManager: TaskMemoryManager,
+  isNullAware: Boolean = false): HashedRelation = {

Review comment:
   Let me remove them and see if there are build issues. If not, I will 
remove them for good.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29204:
URL: https://github.com/apache/spark/pull/29204#issuecomment-664244632








[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460771904



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -903,15 +926,65 @@ private[joins] object LongHashedRelation {
   if (!rowKey.isNullAt(0)) {
 val key = rowKey.getLong(0)
 map.append(key, unsafeRow)
+  } else if (isNullAware) {
+return new EmptyHashedRelationWithAllNullKeys
   }
 }
 map.optimize()
 new LongHashedRelation(numFields, map)
   }
 }
 
+/**
+ * A special HashedRelation indicates it built from a empty 
input:Iterator[InternalRow].
+ */
+class EmptyHashedRelation extends HashedRelation with Externalizable {

Review comment:
   shall we have a common trait for these 2 to contain the fake 
implementation?






[GitHub] [spark] AmplabJenkins commented on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29204:
URL: https://github.com/apache/spark/pull/29204#issuecomment-664244632








[GitHub] [spark] SparkQA removed a comment on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29204:
URL: https://github.com/apache/spark/pull/29204#issuecomment-664212962


   **[Test build #126633 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126633/testReport)**
 for PR 29204 at commit 
[`5011314`](https://github.com/apache/spark/commit/50113145de4d9c7247d2a8af6e1e4f1087d19548).




[GitHub] [spark] SparkQA commented on pull request #29204: [SPARK-32412][SQL] Unify error handling for spark thrift server operations

2020-07-27 Thread GitBox



SparkQA commented on pull request #29204:
URL: https://github.com/apache/spark/pull/29204#issuecomment-664244271


   **[Test build #126633 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126633/testReport)**
 for PR 29204 at commit 
[`5011314`](https://github.com/apache/spark/commit/50113145de4d9c7247d2a8af6e1e4f1087d19548).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on a change in pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



HyukjinKwon commented on a change in pull request #29241:
URL: https://github.com/apache/spark/pull/29241#discussion_r460689868



##
File path: core/src/main/scala/org/apache/spark/TestUtils.scala
##
@@ -236,7 +236,11 @@ private[spark] object TestUtils {
* Test if a command is available.
*/
   def testCommandAvailable(command: String): Boolean = {
-val attempt = Try(Process(command).run(ProcessLogger(_ => ())).exitValue())
+val attempt = if (Utils.isWindows) {
+  Try(Process(s"WHERE $command").run(ProcessLogger(_ => ())).exitValue())

Review comment:
   I ran some tests via AppVeyor manually to show it works on Windows:
   
   Build started: [CORE] `org.apache.spark.rdd.PipedRDDSuite` 
[![PR-29241](https://ci.appveyor.com/api/projects/status/github/HyukjinKwon/spark?branch=5D8964EE-6046-4DBA-91AD-53B2E5EE4500=true)](https://ci.appveyor.com/project/HyukjinKwon/spark/branch/5D8964EE-6046-4DBA-91AD-53B2E5EE4500)
   
   Build started: [SQL] `org.apache.spark.sql.hive.execution.SQLQuerySuite` 
[![PR-29241](https://ci.appveyor.com/api/projects/status/github/HyukjinKwon/spark?branch=C22F6129-6D78-40AF-BE21-1A625BC656A1=true)](https://ci.appveyor.com/project/HyukjinKwon/spark/branch/C22F6129-6D78-40AF-BE21-1A625BC656A1)
   
   Some of the tests might already fail. We only need to check whether 
`testCommandAvailable` works or not.






[GitHub] [spark] AmplabJenkins commented on pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29257:
URL: https://github.com/apache/spark/pull/29257#issuecomment-664244178








[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460768470



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -454,6 +490,43 @@ case class BroadcastHashJoinExec(
 val (matched, checkCondition, _) = getJoinCondition(ctx, input)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
+if (isNullAwareAntiJoin) {
+  if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) {
+return s"""
+  |// NAAJ Join EmptyHashedRelation accept all
+  |$numOutput.add(1);
+  |${consume(ctx, input)}
+""".stripMargin
+  } else if 
(broadcastRelation.value.isInstanceOf[EmptyHashedRelationWithAllNullKeys]) {
+return s"""
+  |// NAAJ
+  |// EmptyHashedRelationWithAllNullKeys
+  |// reject all

Review comment:
   `// If the right side contains a key whose columns are all null, NAAJ 
simply returns Nil.`






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460770609



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -311,6 +314,15 @@ private[joins] object UnsafeHashedRelation {
   key: Seq[Expression],
   sizeEstimate: Int,
   taskMemoryManager: TaskMemoryManager): HashedRelation = {
+apply(input, key, sizeEstimate, taskMemoryManager, isNullAware = false)
+  }
+
+  def apply(
+  input: Iterator[InternalRow],
+  key: Seq[Expression],
+  sizeEstimate: Int,
+  taskMemoryManager: TaskMemoryManager,
+  isNullAware: Boolean = false): HashedRelation = {

Review comment:
   Do we really need 2 `apply` methods, given that we have a default parameter 
value here?
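
To illustrate the point about default parameter values, here is a minimal sketch. The names `DefaultParamSketch` and `build` are hypothetical, and the string return value only labels which path ran; it is not Spark's `HashedRelation` API.

```scala
object DefaultParamSketch {
  // Hypothetical builder: a single method whose last parameter has a default,
  // so no second overload is needed for callers that omit the flag.
  def build(keys: Seq[Option[Long]], isNullAware: Boolean = false): String =
    if (isNullAware && keys.exists(_.isEmpty)) {
      // Mirrors the early return on a null key in the null-aware case.
      "all-null-key relation"
    } else {
      // Null keys are skipped, as in the non-null-aware path.
      s"relation with ${keys.flatten.size} keys"
    }
}
```

Existing call sites such as `DefaultParamSketch.build(keys)` keep compiling unchanged, while null-aware callers pass `isNullAware = true` explicitly.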






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460770684



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -889,7 +903,16 @@ private[joins] object LongHashedRelation {
   input: Iterator[InternalRow],
   key: Seq[Expression],
   sizeEstimate: Int,
-  taskMemoryManager: TaskMemoryManager): LongHashedRelation = {
+  taskMemoryManager: TaskMemoryManager): HashedRelation = {
+apply(input, key, sizeEstimate, taskMemoryManager, false)
+  }
+
+  def apply(
+  input: Iterator[InternalRow],
+  key: Seq[Expression],
+  sizeEstimate: Int,
+  taskMemoryManager: TaskMemoryManager,
+  isNullAware: Boolean = false): HashedRelation = {

Review comment:
   ditto






[GitHub] [spark] zhengruifeng opened a new pull request #29257: [SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP

2020-07-27 Thread GitBox



zhengruifeng opened a new pull request #29257:
URL: https://github.com/apache/spark/pull/29257


   ### What changes were proposed in this pull request?
   logParam `thresholds` in DT/GBT/FM/LR/MLP
   
   
   ### Why are the changes needed?
   Param `thresholds` is logged in NB/RF, but not in the other 
ProbabilisticClassifiers.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   existing testsuites




[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460768470



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -454,6 +490,43 @@ case class BroadcastHashJoinExec(
 val (matched, checkCondition, _) = getJoinCondition(ctx, input)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
+if (isNullAwareAntiJoin) {
+  if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) {
+return s"""
+  |// NAAJ Join EmptyHashedRelation accept all
+  |$numOutput.add(1);
+  |${consume(ctx, input)}
+""".stripMargin
+  } else if 
(broadcastRelation.value.isInstanceOf[EmptyHashedRelationWithAllNullKeys]) {
+return s"""
+  |// NAAJ
+  |// EmptyHashedRelationWithAllNullKeys
+  |// reject all

Review comment:
   `// If the right side contains any all-null key, NAAJ simply returns 
Nil.`






[GitHub] [spark] cloud-fan commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460767934



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -454,6 +490,43 @@ case class BroadcastHashJoinExec(
 val (matched, checkCondition, _) = getJoinCondition(ctx, input)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
+if (isNullAwareAntiJoin) {
+  if (broadcastRelation.value.isInstanceOf[EmptyHashedRelation]) {
+return s"""
+  |// NAAJ Join EmptyHashedRelation accept all

Review comment:
   `NAAJ` already means Join. How about `If the right side is empty, NAAJ 
simply returns the left side.`
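
The short-circuit cases named in these review comments follow from the semantics of a single-column `NOT IN`, which can be sketched over plain collections. This is an illustration only: `NaajSketch` and `nullAwareAntiJoin` are hypothetical names, keys are modelled as `Option[Long]` with `None` standing for SQL NULL, and nothing here reflects how `BroadcastHashJoinExec` generates code.

```scala
object NaajSketch {
  /** Rows of `left` that survive `left.key NOT IN (select key from right)`. */
  def nullAwareAntiJoin(
      left: Seq[Option[Long]],
      right: Seq[Option[Long]]): Seq[Option[Long]] = {
    if (right.isEmpty) {
      // Empty right side: NOT IN is true for every row, so return the left side.
      left
    } else if (right.exists(_.isEmpty)) {
      // A null key on the right makes NOT IN unknown or false for every row.
      Nil
    } else {
      val rightKeys = right.flatten.toSet
      left.filter {
        case Some(k) => !rightKeys.contains(k) // keep only unmatched keys
        case None => false                     // NULL NOT IN (non-empty set) is unknown
      }
    }
  }
}
```

For example, with `left = Seq(Some(1L), Some(2L), None)` and `right = Seq(Some(2L))`, only `Some(1L)` survives: `2` is matched and the null key evaluates to unknown.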






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29256:
URL: https://github.com/apache/spark/pull/29256#issuecomment-664240371








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29241:
URL: https://github.com/apache/spark/pull/29241#issuecomment-664239449


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126623/
   Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29256:
URL: https://github.com/apache/spark/pull/29256#issuecomment-664240371








[GitHub] [spark] SparkQA commented on pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark

2020-07-27 Thread GitBox



SparkQA commented on pull request #29256:
URL: https://github.com/apache/spark/pull/29256#issuecomment-664239733


   **[Test build #126637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126637/testReport)** for PR 29256 at commit [`66e1f52`](https://github.com/apache/spark/commit/66e1f52197f0157b91017327155b498b9ce05688).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29241:
URL: https://github.com/apache/spark/pull/29241#issuecomment-664239439


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29241:
URL: https://github.com/apache/spark/pull/29241#issuecomment-664239439








[GitHub] [spark] xuanyuanking opened a new pull request #29256: [SPARK-32456][SS] Give better error message for union streams in append mode that don't have a watermark

2020-07-27 Thread GitBox

xuanyuanking opened a new pull request #29256:
URL: https://github.com/apache/spark/pull/29256

### What changes were proposed in this pull request?
Treat `Distinct` nodes as `Aggregate` nodes when checking streaming plans in
`UnsupportedOperationChecker`.

### Why are the changes needed?
Because the SQL `UNION` clause requires deduplication, the parser generates
`Distinct(Union)`, and the optimizer rule `ReplaceDistinctWithAggregate`
rewrites it to `Aggregate(Union)`. This rewrite applies to both batch and
streaming queries. In streaming, however, the aggregation is wrapped by state
store operations.
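
As a rough illustration of the rewrite described above, here is a minimal,
Spark-free sketch. The `Plan` node classes and `ToyOptimizer` are hypothetical
stand-ins written for this example, not Catalyst's actual classes; only the
shape of the transformation (`Distinct(Union)` becoming `Aggregate(Union)`) is
taken from the description.

```scala
// Toy stand-ins for logical plan nodes (hypothetical, not Spark classes).
sealed trait Plan
case class Source(name: String, columns: Seq[String]) extends Plan
case class Union(children: Seq[Plan]) extends Plan
case class Distinct(child: Plan) extends Plan
case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan

object ToyOptimizer {
  // Output columns of a plan; a union exposes the columns of its first child.
  def output(plan: Plan): Seq[String] = plan match {
    case Source(_, columns)    => columns
    case Union(children)       => output(children.head)
    case Distinct(child)       => output(child)
    case Aggregate(groupBy, _) => groupBy
  }

  // Rewrite every Distinct(child) into an Aggregate that groups on all output columns.
  def replaceDistinctWithAggregate(plan: Plan): Plan = plan match {
    case Distinct(child) =>
      val rewritten = replaceDistinctWithAggregate(child)
      Aggregate(output(rewritten), rewritten)
    case Union(children)        => Union(children.map(replaceDistinctWithAggregate))
    case Aggregate(keys, child) => Aggregate(keys, replaceDistinctWithAggregate(child))
    case source: Source         => source
  }
}
```

In this toy model, a plan of the form `Distinct(Union(Seq(a, b)))` comes out as
`Aggregate(columns, Union(Seq(a, b)))`, which is why a streaming `UNION` ends
up subject to the same restrictions as an explicit aggregation.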

Before this change, Structured Streaming union queries in Append mode failed
with the following confusing error when no watermark was defined.
```
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:529)
at scala.None$.get(Option.scala:527)
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:346)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:561)
at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:112)
...
```
```
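
The idea of rejecting such a plan up front, instead of letting it reach a
`None.get` at execution time, might be sketched as follows. This is a toy
model under stated assumptions: the `Node` classes, `ToyStreamingChecker`, the
watermark rule for unions, and the error message wording are all hypothetical
and are not Spark's actual checker or message.

```scala
// Toy model (hypothetical names): detect the unsupported plan before execution.
sealed trait Node
case class StreamSource(name: String, watermarkMs: Option[Long]) extends Node
case class UnionNode(children: Seq[Node]) extends Node
case class DistinctNode(child: Node) extends Node
case class AggregateNode(child: Node) extends Node

object ToyStreamingChecker {
  // Assumption for this sketch: a union has a watermark only if all its inputs do.
  def hasWatermark(node: Node): Boolean = node match {
    case StreamSource(_, watermark) => watermark.isDefined
    case UnionNode(children)        => children.nonEmpty && children.forall(hasWatermark)
    case DistinctNode(child)        => hasWatermark(child)
    case AggregateNode(child)       => hasWatermark(child)
  }

  // Collect aggregations, counting Distinct as one because it is later rewritten to Aggregate.
  def aggregations(node: Node): Seq[Node] = node match {
    case d @ DistinctNode(child)  => d +: aggregations(child)
    case a @ AggregateNode(child) => a +: aggregations(child)
    case UnionNode(children)      => children.flatMap(aggregations)
    case _: StreamSource          => Seq.empty
  }

  // Left(message) for an unsupported append-mode plan, Right(()) otherwise.
  def checkAppendMode(plan: Node): Either[String, Unit] =
    if (aggregations(plan).exists(agg => !hasWatermark(agg)))
      Left("Append output mode is not supported for streaming aggregations without a watermark")
    else
      Right(())
}
```

With a check of this kind, a `DistinctNode` over sources without a watermark
is reported with a descriptive message, rather than surfacing later as an
`Option.get` on an empty value.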

### Does this PR introduce _any_ user-facing change?
Yes, the query now fails with a clearer error message.

### How was this patch tested?
Added a new unit test.


[GitHub] [spark] SparkQA commented on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



SparkQA commented on pull request #29241:
URL: https://github.com/apache/spark/pull/29241#issuecomment-664238519


   **[Test build #126623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126623/testReport)** for PR 29241 at commit [`d239ede`](https://github.com/apache/spark/commit/d239eded8f854a1d0fc30bac12f1f5c080af02eb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #29241: [SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandAvailable

2020-07-27 Thread GitBox



SparkQA removed a comment on pull request #29241:
URL: https://github.com/apache/spark/pull/29241#issuecomment-664166354


   **[Test build #126623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126623/testReport)** for PR 29241 at commit [`d239ede`](https://github.com/apache/spark/commit/d239eded8f854a1d0fc30bac12f1f5c080af02eb).




[GitHub] [spark] leanken commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-664237007


   @cloud-fan Updated. With your suggestion, the `HashedRelation` code diff is
   smaller and makes more sense.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



AmplabJenkins removed a comment on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-664236317








[GitHub] [spark] AmplabJenkins commented on pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



AmplabJenkins commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-664236317








[GitHub] [spark] leanken commented on a change in pull request #29104: [SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize

2020-07-27 Thread GitBox



leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r460762568



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -896,22 +967,29 @@ private[joins] object LongHashedRelation {
 
 // Create a mapping of key -> rows
 var numFields = 0
+val isOriginalInputEmpty: Boolean = !input.hasNext
+var allNullColumnKeyExistsInOriginalInput: Boolean = false
 while (input.hasNext) {
   val unsafeRow = input.next().asInstanceOf[UnsafeRow]
   numFields = unsafeRow.numFields()
   val rowKey = keyGenerator(unsafeRow)
   if (!rowKey.isNullAt(0)) {
 val key = rowKey.getLong(0)
 map.append(key, unsafeRow)
+  } else if (!allNullColumnKeyExistsInOriginalInput) {
+allNullColumnKeyExistsInOriginalInput = true
   }
 }
 map.optimize()
 new LongHashedRelation(numFields, map)
+  .setOriginalInputEmtpy(isOriginalInputEmpty)
+  .setAllNullColumnKeyExistsInOriginalInput(allNullColumnKeyExistsInOriginalInput)
+  .asInstanceOf[LongHashedRelation]
   }
 }
 
 /** The HashedRelationBroadcastMode requires that rows are broadcasted as a HashedRelation. */
-case class HashedRelationBroadcastMode(key: Seq[Expression])
+case class HashedRelationBroadcastMode(key: Seq[Expression], isNullAware: Boolean = false)

Review comment:
   done

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -323,11 +373,19 @@ private[joins] object UnsafeHashedRelation {
 // Create a mapping of buildKeys -> rows
 val keyGenerator = UnsafeProjection.create(key)
 var numFields = 0
+val isOriginalInputEmpty = !input.hasNext
+var allNullColumnKeyExistsInOriginalInput: Boolean = false
 while (input.hasNext) {

Review comment:
   done

##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
##
@@ -896,22 +967,29 @@ private[joins] object LongHashedRelation {
 
 // Create a mapping of key -> rows
 var numFields = 0
+val isOriginalInputEmpty: Boolean = !input.hasNext
+var allNullColumnKeyExistsInOriginalInput: Boolean = false
 while (input.hasNext) {

Review comment:
   done
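
For readers following the hunks above, the bookkeeping they add (whether the
build-side input was empty, and whether it contained a null key) can be
sketched without Spark as below. This is an assumption-laden illustration:
`BuildSide`, `NullAwareAntiJoinSketch`, and the use of `Option[Long]` to model
a nullable key are hypothetical, and the `antiJoin` function reflects standard
SQL `NOT IN` semantics rather than the PR's actual join operator.

```scala
// Hypothetical, Spark-free sketch of the two flags tracked in the diff.
final case class BuildSide(
    keys: Set[Long],
    isOriginalInputEmpty: Boolean,
    allNullColumnKeyExistsInOriginalInput: Boolean)

object NullAwareAntiJoinSketch {
  // Single pass over the build-side keys; None models a NULL key.
  def build(input: Iterator[Option[Long]]): BuildSide = {
    // Read emptiness before the loop consumes the iterator.
    val isOriginalInputEmpty = !input.hasNext
    var sawNullKey = false
    val keys = Set.newBuilder[Long]
    while (input.hasNext) {
      input.next() match {
        case Some(key) => keys += key
        case None      => sawNullKey = true
      }
    }
    BuildSide(keys.result(), isOriginalInputEmpty, sawNullKey)
  }

  // Rows kept by `key NOT IN (build side)` under SQL three-valued logic.
  def antiJoin(streamed: Seq[Option[Long]], buildSide: BuildSide): Seq[Option[Long]] =
    if (buildSide.isOriginalInputEmpty) {
      streamed // NOT IN over an empty set is TRUE for every row
    } else if (buildSide.allNullColumnKeyExistsInOriginalInput) {
      Seq.empty // a NULL on the build side means no comparison can be TRUE
    } else {
      streamed.filter {
        case Some(key) => !buildSide.keys.contains(key)
        case None      => false // NULL NOT IN (non-empty set) is UNKNOWN
      }
    }
}
```

The sketch shows why both flags have to be captured while the relation is
built: once the iterator is consumed and null keys are skipped, neither fact
can be recovered from the stored keys alone.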





