[GitHub] [spark] MaxGekk closed pull request #33551: [SPARK-36323][SQL] Support ANSI interval literals for TimeWindow

2021-07-28 Thread GitBox


MaxGekk closed pull request #33551:
URL: https://github.com/apache/spark/pull/33551


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


SparkQA commented on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-21901


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46318/
   


[GitHub] [spark] SparkQA removed a comment on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-888773347


   **[Test build #141803 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141803/testReport)**
 for PR 33535 at commit 
[`b1f97b0`](https://github.com/apache/spark/commit/b1f97b060e5b800e00501a1fe675ed3189406828).


[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-07-28 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-20990


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46319/
   


[GitHub] [spark] SparkQA commented on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


SparkQA commented on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-20806


   **[Test build #141803 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141803/testReport)**
 for PR 33535 at commit 
[`b1f97b0`](https://github.com/apache/spark/commit/b1f97b060e5b800e00501a1fe675ed3189406828).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] SparkQA commented on pull request #33569: [SPARK-35806][PYTHON][FOLLOW-UP] Mapping the mode argument to pandas in DataFrame.to_csv

2021-07-28 Thread GitBox


SparkQA commented on pull request #33569:
URL: https://github.com/apache/spark/pull/33569#issuecomment-17603


   **[Test build #141809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141809/testReport)**
 for PR 33569 at commit 
[`42a4251`](https://github.com/apache/spark/commit/42a42510d1e3a2b68a148fa8787868d70caca2b7).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-17046


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141804/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-17046


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141804/
   


[GitHub] [spark] SparkQA commented on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


SparkQA commented on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-16903


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46320/
   


[GitHub] [spark] itholic opened a new pull request #33569: [SPARK-35806][PYTHON][FOLLOW-UP] Mapping the mode argument to pandas in DataFrame.to_csv

2021-07-28 Thread GitBox


itholic opened a new pull request #33569:
URL: https://github.com/apache/spark/pull/33569


   ### What changes were proposed in this pull request?
   
   This PR is a follow-up to https://github.com/apache/spark/pull/33414 that maps the `mode` argument to pandas for all APIs that have a `mode` argument, not only `DataFrame.to_csv`.
   
   ### Why are the changes needed?
   
   To keep usage consistent across arguments that share the same name.
   
   ### Does this PR introduce _any_ user-facing change?
   
   More options are available for all APIs that have a `mode` argument, as with `DataFrame.to_csv`.
   
   ### How was this patch tested?
   
   Manually tested locally.
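To illustrate the kind of mapping this PR extends, here is a minimal Python sketch that translates pandas-style file modes into Spark save modes. The helper name `to_spark_save_mode` and the exact mapping (`"w"`/`"w+"` to `"overwrite"`, `"a"`/`"a+"` to `"append"`) are assumptions for illustration, not the pandas-on-Spark implementation; only the Spark save mode names (`append`, `overwrite`, `ignore`, `error`, `errorifexists`) come from Spark's `DataFrameWriter`.

```python
# Hypothetical helper: the name and the exact mapping are illustrative
# assumptions, not the code merged in pandas-on-Spark.

# Save modes accepted by Spark's DataFrameWriter.
SPARK_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

# Assumed mapping from pandas/Python file modes to the closest Spark save mode.
PANDAS_TO_SPARK = {
    "w": "overwrite",
    "w+": "overwrite",
    "a": "append",
    "a+": "append",
}


def to_spark_save_mode(mode: str) -> str:
    """Translate a pandas-style `mode` string into a Spark save mode."""
    # Spark-native modes pass through unchanged; pandas modes are translated.
    normalized = PANDAS_TO_SPARK.get(mode, mode)
    if normalized not in SPARK_SAVE_MODES:
        raise ValueError(f"Unsupported mode: {mode!r}")
    return normalized
```

With this sketch, `to_spark_save_mode("w")` returns `"overwrite"`, Spark-native modes such as `"ignore"` pass through unchanged, and any other string raises `ValueError`. Sharing one helper across every API with a `mode` argument is what keeps their behavior consistent.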


[GitHub] [spark] SparkQA removed a comment on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-888773387


   **[Test build #141804 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141804/testReport)**
 for PR 33451 at commit 
[`bf3643d`](https://github.com/apache/spark/commit/bf3643d47da70b06b7f70d8af69e41e26cfd0fe0).


[GitHub] [spark] SparkQA commented on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


SparkQA commented on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-16444


   **[Test build #141804 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141804/testReport)**
 for PR 33451 at commit 
[`bf3643d`](https://github.com/apache/spark/commit/bf3643d47da70b06b7f70d8af69e41e26cfd0fe0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-15656


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46316/
   


[GitHub] [spark] SparkQA commented on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


SparkQA commented on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-15641


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46316/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-15656


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46316/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33568: [SPARK-36335][DOCS] Remove Local-cluster mode reference (and add a missing period)

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33568:
URL: https://github.com/apache/spark/pull/33568#issuecomment-15320


   Can one of the admins verify this patch?


[GitHub] [spark] yutoacts opened a new pull request #33568: [SPARK-36335][DOCS] Remove Local-cluster mode reference (and add a missing period)

2021-07-28 Thread GitBox


yutoacts opened a new pull request #33568:
URL: https://github.com/apache/spark/pull/33568


   
   
   ### What changes were proposed in this pull request?
   
   Remove the local-cluster mode reference from configuration.md and add a missing period to submitting-application.md.
   
   ### Why are the changes needed?
   
   It's a test-only mode for developers, so we should remove it to avoid confusing users.
   I'll add local-cluster mode documentation to the developer tools page (https://spark.apache.org/developer-tools.html) soon.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, docs changed.
   
   ### How was this patch tested?
   
   `SKIP_API=1 bundle exec jekyll build`
   https://user-images.githubusercontent.com/87687356/127436725-ed2486b9-b619-4a74-80ee-0f11335aa12d.png
   https://user-images.githubusercontent.com/87687356/127436737-c1eecf39-c2b0-45b3-9837-f9482ec798cd.png
   


[GitHub] [spark] SparkQA commented on pull request #33404: [SPARK-36194][SQL] Remove the aggregation from left semi/anti join if the same aggregation has already been done on left side

2021-07-28 Thread GitBox


SparkQA commented on pull request #33404:
URL: https://github.com/apache/spark/pull/33404#issuecomment-14928


   **[Test build #141808 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141808/testReport)**
 for PR 33404 at commit 
[`eb71b8a`](https://github.com/apache/spark/commit/eb71b8ae8de1fb737eea170e920c24127bcc2b95).


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-14384


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141807/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-14385


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46315/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33566:
URL: https://github.com/apache/spark/pull/33566#issuecomment-14387


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46313/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-14383


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141805/
   


[GitHub] [spark] AmplabJenkins removed a comment on pull request #33556: [SPARK-36324][CORE] Replace revertPartialWritesAndClose with close in ExternalSorter.spill and ExternalAppendOnlyMap.spill

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-14386


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46314/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33556: [SPARK-36324][CORE] Replace revertPartialWritesAndClose with close in ExternalSorter.spill and ExternalAppendOnlyMap.spill

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-14386


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46314/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-14385


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46315/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33566:
URL: https://github.com/apache/spark/pull/33566#issuecomment-14387


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46313/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-14384


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141807/
   


[GitHub] [spark] AmplabJenkins commented on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-14383


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141805/
   


[GitHub] [spark] venkata91 commented on pull request #33034: [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

2021-07-28 Thread GitBox


venkata91 commented on pull request #33034:
URL: https://github.com/apache/spark/pull/33034#issuecomment-13035


   @Ngone51 @mridulm I've addressed all the review comments; please take another look. Thanks!


[GitHub] [spark] SparkQA removed a comment on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-888796409


   **[Test build #141807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141807/testReport)**
 for PR 33567 at commit 
[`485dd90`](https://github.com/apache/spark/commit/485dd905d4ff11e98182881dc10795f03d62a1a9).


[GitHub] [spark] SparkQA commented on pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


SparkQA commented on pull request #33566:
URL: https://github.com/apache/spark/pull/33566#issuecomment-09469


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46313/
   


[GitHub] [spark] SparkQA commented on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


SparkQA commented on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-08470


   **[Test build #141807 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141807/testReport)**
 for PR 33567 at commit 
[`485dd90`](https://github.com/apache/spark/commit/485dd905d4ff11e98182881dc10795f03d62a1a9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `public class TimerWithCustomTimeUnit extends Timer `
 * `public class RetryingBlockTransferor `
 * `public final class Aggregation implements Serializable `
 * `public final class Count implements AggregateFunc `
 * `public final class CountStar implements AggregateFunc `
 * `public final class Max implements AggregateFunc `
 * `public final class Min implements AggregateFunc `
 * `public final class Sum implements AggregateFunc `
 * `trait AlterTableColumnCommand extends UnaryCommand `
 * `case class AlterTableAddColumns(`
 * `case class AlterTableReplaceColumns(`
 * `case class CoalescedMapperPartitionSpec(`
 * `trait AQEShuffleReadRule extends Rule[SparkPlan] `
 * `case class CoalesceShufflePartitions(session: SparkSession) extends AQEShuffleReadRule `
 * `class BasicWriteTaskStatsTracker(`
 * `case class ScanBuilderHolder(`


[GitHub] [spark] SparkQA commented on pull request #33556: [SPARK-36324][CORE] Replace revertPartialWritesAndClose with close in ExternalSorter.spill and ExternalAppendOnlyMap.spill

2021-07-28 Thread GitBox


SparkQA commented on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-06299


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46314/
   


[GitHub] [spark] SparkQA commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


SparkQA commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-00754


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46315/
   





[GitHub] [spark] itholic commented on a change in pull request #33562: [SPARK-36333][PYTHON] Reuse isnull where the null check is needed

2021-07-28 Thread GitBox


itholic commented on a change in pull request #33562:
URL: https://github.com/apache/spark/pull/33562#discussion_r678749415



##
File path: python/pyspark/pandas/generic.py
##
@@ -3180,6 +3180,7 @@ def __bool__(self) -> NoReturn:
 def _count_expr(spark_column: Column, spark_type: DataType) -> Column:
 # Special handle floating point types because Spark's count treats nan as a valid value,
 # whereas pandas count doesn't include nan.
+# TODO: Make this work with DataTypeOps.

Review comment:
   Maybe we should create a related ticket for the TODO and reference the ticket number in the comment?
   
   e.g. `TODO(SPARK-x): Make this work with DataTypeOps.`
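The comment in the diff contrasts two counting rules. A minimal Java sketch of the difference follows; it is not Spark or pandas code, and the class and method names are hypothetical. Spark's `count(col)` skips only nulls, so a NaN double is counted, while pandas' `count()` treats NaN as missing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NanAwareCount {
    // Spark-style count(col): every non-null value is counted, including NaN.
    static long sparkStyleCount(List<Double> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    // pandas-style count(): NaN is treated as missing, just like null.
    static long pandasStyleCount(List<Double> values) {
        return values.stream().filter(v -> v != null && !v.isNaN()).count();
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.0, Double.NaN, null, 3.5);
        System.out.println(sparkStyleCount(column));  // 3
        System.out.println(pandasStyleCount(column)); // 2
    }
}
```

This is why floating point columns need the special handling mentioned in the diff: for other types the two rules coincide.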







[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in 
AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
   Map<String, String> getEffectiveConfig() throws IOException {
     if (effectiveConfig == null) {
       effectiveConfig = new HashMap<>(conf);
       Properties p = loadPropertiesFile();
       for (String key : p.stringPropertyNames()) {
         if (!effectiveConfig.containsKey(key)) {
           effectiveConfig.put(key, p.getProperty(key));
         }
       }
     }
     return effectiveConfig;
   }
   ```
   
   @srowen do you mean refactor the 
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   to 
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or refactor to
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both can pass the test. 
   
   But the first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   And the second has the problem of ignoring the return value of the `computeIfAbsent` method.
   
   Which way do you prefer, or should we keep the original logic? I prefer `putIfAbsent`.
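The trade-off above can be checked with a small, self-contained Java sketch. It mirrors the three variants on a plain `HashMap` and `Properties`; the class name, the `lookup` counter helper, and the sample keys are hypothetical and are not part of Spark's launcher code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

public class EffectiveConfigMerge {
    // Hypothetical helper: reads one property and records that a lookup happened.
    static String lookup(Properties p, String key, AtomicInteger lookups) {
        lookups.incrementAndGet();
        return p.getProperty(key);
    }

    // Original style: containsKey check, then put; getProperty runs only for missing keys.
    static Map<String, String> mergeLoop(Map<String, String> conf, Properties p) {
        Map<String, String> merged = new HashMap<>(conf);
        for (String key : p.stringPropertyNames()) {
            if (!merged.containsKey(key)) {
                merged.put(key, p.getProperty(key));
            }
        }
        return merged;
    }

    // putIfAbsent: the value argument is evaluated before the call, so every key is looked up.
    static Map<String, String> mergePutIfAbsent(Map<String, String> conf, Properties p,
                                                AtomicInteger lookups) {
        Map<String, String> merged = new HashMap<>(conf);
        p.stringPropertyNames().forEach(key -> merged.putIfAbsent(key, lookup(p, key, lookups)));
        return merged;
    }

    // computeIfAbsent: the mapping function runs only for missing keys;
    // its return value is simply discarded inside forEach.
    static Map<String, String> mergeComputeIfAbsent(Map<String, String> conf, Properties p,
                                                    AtomicInteger lookups) {
        Map<String, String> merged = new HashMap<>(conf);
        p.stringPropertyNames().forEach(key -> merged.computeIfAbsent(key, k -> lookup(p, k, lookups)));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.master", "local");
        Properties p = new Properties();
        p.setProperty("spark.master", "yarn");
        p.setProperty("spark.app.name", "demo");

        AtomicInteger eager = new AtomicInteger();
        AtomicInteger lazy = new AtomicInteger();
        System.out.println(mergeLoop(conf, p).equals(mergePutIfAbsent(conf, p, eager)));    // true
        System.out.println(mergeLoop(conf, p).equals(mergeComputeIfAbsent(conf, p, lazy))); // true
        System.out.println(eager.get() + " vs " + lazy.get());                              // 2 vs 1
    }
}
```

All three produce the same map here; the only observable difference is how many lookups run. One further subtlety: both `putIfAbsent` and `computeIfAbsent` treat a key mapped to `null` as absent, whereas the `containsKey` loop keeps such a mapping, which only matters if `conf` can hold null values.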
   
   
   





[GitHub] [spark] SparkQA removed a comment on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-888792663


   **[Test build #141805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141805/testReport)**
 for PR 33563 at commit 
[`b9898f9`](https://github.com/apache/spark/commit/b9898f9a2e5644bbde128c944d1be3c521943b93).





[GitHub] [spark] SparkQA commented on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


SparkQA commented on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-888798434


   **[Test build #141805 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141805/testReport)**
 for PR 33563 at commit 
[`b9898f9`](https://github.com/apache/spark/commit/b9898f9a2e5644bbde128c944d1be3c521943b93).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-888796757


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46317/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-888796757


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46317/
   





[GitHub] [spark] SparkQA commented on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


SparkQA commented on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-888796737


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46317/
   





[GitHub] [spark] SparkQA commented on pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


SparkQA commented on pull request #33567:
URL: https://github.com/apache/spark/pull/33567#issuecomment-888796409


   **[Test build #141807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141807/testReport)**
 for PR 33567 at commit 
[`485dd90`](https://github.com/apache/spark/commit/485dd905d4ff11e98182881dc10795f03d62a1a9).





[GitHub] [spark] itholic opened a new pull request #33567: [SPARK-36254][INFRA][PYTHON] Install mlflow in Github Actions CI

2021-07-28 Thread GitBox


itholic opened a new pull request #33567:
URL: https://github.com/apache/spark/pull/33567


   ### What changes were proposed in this pull request?
   
   This PR proposes adding the Python packages `mlflow` and `sklearn` to enable the MLflow test in pandas API on Spark.
   
   
   ### Why are the changes needed?
   
   To enable the MLflow test in pandas API on Spark.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, it's test-only.
   
   
   ### How was this patch tested?
   
   Manually tested locally with `python/run-tests --testnames pyspark.pandas.mlflow`.





[GitHub] [spark] cloud-fan commented on a change in pull request #33413: [SPARK-36175][SQL] Support TimestampNTZ in Avro data source

2021-07-28 Thread GitBox


cloud-fan commented on a change in pull request #33413:
URL: https://github.com/apache/spark/pull/33413#discussion_r678816260



##
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
##
@@ -147,6 +147,21 @@ private[sql] class AvroDeserializer(
   s"Avro logical type $other cannot be converted to SQL type ${TimestampType.sql}.")
   }
 
+  case (LONG, TimestampNTZType) => avroType.getLogicalType match {
+// For backward compatibility, if the Avro type is Long and it is not logical type
+// (the `null` case), the value is processed as timestamp without time zone type
+// with millisecond precision.
+case null | _: LocalTimestampMillis => (updater, ordinal, value) =>
+  val millis = value.asInstanceOf[Long]
+  val micros = DateTimeUtils.millisToMicros(millis)
+  updater.setLong(ordinal, timestampRebaseFunc(micros))

Review comment:
   It goes without saying that a modern system always uses the Proleptic Gregorian calendar. The rebase is only used to keep backward compatibility with legacy Spark versions. This is a new data type, so there are no backward compatibility issues.
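The review point can be illustrated with a small Java sketch, under stated assumptions: `millisToMicros` below is a hypothetical stand-in rather than Spark's `DateTimeUtils` method, and the `hybridMillis` helper only approximates the legacy behavior by using `java.util.GregorianCalendar` (Julian calendar before the 1582-10-15 cutover), which is the kind of hybrid calendar Spark versions before 3.0 relied on. It shows that a new NTZ path needs only a scaled, overflow-checked conversion, and that the two calendars agree for modern dates but diverge for ancient ones.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class NtzMillisToMicros {
    // Hypothetical stand-in for a millis-to-micros conversion: multiply by 1000 and
    // throw ArithmeticException instead of silently overflowing.
    static long millisToMicros(long millis) {
        return Math.multiplyExact(millis, 1000L);
    }

    // Epoch millis for midnight UTC of a wall-clock date in the proleptic Gregorian calendar.
    static long prolepticMillis(int year, int month, int day) {
        return LocalDateTime.of(year, month, day, 0, 0).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // The same wall-clock date in java.util's hybrid calendar (Julian before 1582-10-15).
    static long hybridMillis(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(year, month - 1, day); // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    public static void main(String[] args) {
        // No rebase: the micros value is just the millis value scaled by 1000.
        System.out.println(millisToMicros(prolepticMillis(2021, 7, 28)));             // 1627430400000000
        // Modern dates agree in both calendars...
        System.out.println(prolepticMillis(2021, 7, 28) == hybridMillis(2021, 7, 28)); // true
        // ...but ancient dates do not, roughly the mismatch legacy rebasing compensates for.
        System.out.println(prolepticMillis(1000, 1, 1) == hybridMillis(1000, 1, 1));   // false
    }
}
```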







[GitHub] [spark] SparkQA commented on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


SparkQA commented on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-888793444


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46316/
   





[GitHub] [spark] SparkQA commented on pull request #33556: [WIP][SPARK-36324][CORE] Replace revertPartialWritesAndClose with close in ExternalSorter.spill and ExternalAppendOnlyMap.spill

2021-07-28 Thread GitBox


SparkQA commented on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-888793226


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46314/
   





[GitHub] [spark] SparkQA commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-07-28 Thread GitBox


SparkQA commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-888793249


   **[Test build #141806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141806/testReport)**
 for PR 31517 at commit 
[`6c74fc6`](https://github.com/apache/spark/commit/6c74fc6007f1ef71918c79f4eee9196ada65604e).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888792850


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46311/
   





[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


SparkQA commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888792829


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46311/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888792850


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46311/
   





[GitHub] [spark] SparkQA commented on pull request #33563: [SPARK-36334][K8S] Add a new conf to allow K8s API server-side caching for pod listing

2021-07-28 Thread GitBox


SparkQA commented on pull request #33563:
URL: https://github.com/apache/spark/pull/33563#issuecomment-888792663


   **[Test build #141805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141805/testReport)**
 for PR 33563 at commit 
[`b9898f9`](https://github.com/apache/spark/commit/b9898f9a2e5644bbde128c944d1be3c521943b93).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33494:
URL: https://github.com/apache/spark/pull/33494#issuecomment-888791833


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141795/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33541:
URL: https://github.com/apache/spark/pull/33541#issuecomment-888791837


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46312/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888791835


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141802/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33544: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33544:
URL: https://github.com/apache/spark/pull/33544#issuecomment-888791842


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33494:
URL: https://github.com/apache/spark/pull/33494#issuecomment-888791833


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141795/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33541:
URL: https://github.com/apache/spark/pull/33541#issuecomment-888791837


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/46312/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888791835


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141802/
   





[GitHub] [spark] SparkQA commented on pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


SparkQA commented on pull request #33566:
URL: https://github.com/apache/spark/pull/33566#issuecomment-888789675


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46313/
   





[GitHub] [spark] SparkQA removed a comment on pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33494:
URL: https://github.com/apache/spark/pull/33494#issuecomment-888690742


   **[Test build #141795 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141795/testReport)**
 for PR 33494 at commit 
[`cb9230f`](https://github.com/apache/spark/commit/cb9230f7d54b173aa5403d3559dc74c9721b50bd).





[GitHub] [spark] SparkQA commented on pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


SparkQA commented on pull request #33494:
URL: https://github.com/apache/spark/pull/33494#issuecomment-888789018


   **[Test build #141795 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141795/testReport)**
 for PR 33494 at commit 
[`cb9230f`](https://github.com/apache/spark/commit/cb9230f7d54b173aa5403d3559dc74c9721b50bd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


SparkQA commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888787998


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46315/
   





[GitHub] [spark] SparkQA commented on pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


SparkQA commented on pull request #33541:
URL: https://github.com/apache/spark/pull/33541#issuecomment-888782025


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46312/
   





[GitHub] [spark] LuciferYang commented on pull request #31517: [SPARK-34309][BUILD][CORE][SQL][K8S]Use Caffeine instead of Guava Cache

2021-07-28 Thread GitBox


LuciferYang commented on pull request #31517:
URL: https://github.com/apache/spark/pull/31517#issuecomment-888779189


   @holdenk 6c74fc6 merge with master 





[GitHub] [spark] SparkQA removed a comment on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888773302


   **[Test build #141802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141802/testReport)** for PR 33550 at commit [`e47cd50`](https://github.com/apache/spark/commit/e47cd501da71d55c6bd6274057593121fa1782e6).





[GitHub] [spark] SparkQA commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


SparkQA commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888778872


   **[Test build #141802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141802/testReport)** for PR 33550 at commit [`e47cd50`](https://github.com/apache/spark/commit/e47cd501da71d55c6bd6274057593121fa1782e6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
   
   @srowen, do you mean refactoring this loop:
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both pass the existing tests.
   
   However, the first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   The second has the drawback of ignoring the return value of the `computeIfAbsent` method.
   
   Which do you prefer, or should we keep the original logic?
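To make the trade-off above concrete, here is a minimal, self-contained sketch that counts how often a property value is looked up in each variant. The class `EffectiveConfigSketch` and its counting `lookup` helper are hypothetical, not Spark code; the sketch assumes a plain `HashMap` and `java.util.Properties`, as in the quoted method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

class EffectiveConfigSketch {
  // Counts property lookups so the three variants can be compared.
  static final AtomicInteger lookups = new AtomicInteger();

  static String lookup(Properties p, String key) {
    lookups.incrementAndGet();
    return p.getProperty(key);
  }

  // Original style: looks up only the keys missing from the map.
  static Map<String, String> withLoop(Map<String, String> conf, Properties p) {
    Map<String, String> effective = new HashMap<>(conf);
    for (String key : p.stringPropertyNames()) {
      if (!effective.containsKey(key)) {
        effective.put(key, lookup(p, key));
      }
    }
    return effective;
  }

  // putIfAbsent: the value argument is evaluated eagerly, once per key.
  static Map<String, String> withPutIfAbsent(Map<String, String> conf, Properties p) {
    Map<String, String> effective = new HashMap<>(conf);
    p.stringPropertyNames().forEach(key -> effective.putIfAbsent(key, lookup(p, key)));
    return effective;
  }

  // computeIfAbsent: the mapping function runs only for absent keys.
  // Its return value (the existing or newly computed value) is discarded here.
  static Map<String, String> withComputeIfAbsent(Map<String, String> conf, Properties p) {
    Map<String, String> effective = new HashMap<>(conf);
    p.stringPropertyNames().forEach(key -> effective.computeIfAbsent(key, k -> lookup(p, k)));
    return effective;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.master", "local[2]");
    Properties p = new Properties();
    p.setProperty("spark.master", "yarn");   // already in conf, so conf wins
    p.setProperty("spark.app.name", "demo"); // only in the properties file

    lookups.set(0);
    Map<String, String> viaLoop = withLoop(conf, p);
    System.out.println("loop:            lookups=" + lookups.get());
    lookups.set(0);
    Map<String, String> viaPut = withPutIfAbsent(conf, p);
    System.out.println("putIfAbsent:     lookups=" + lookups.get());
    lookups.set(0);
    Map<String, String> viaCompute = withComputeIfAbsent(conf, p);
    System.out.println("computeIfAbsent: lookups=" + lookups.get());
    System.out.println("same result: " + (viaLoop.equals(viaPut) && viaPut.equals(viaCompute)));
  }
}
```

Running `main` reports one lookup for the loop and for `computeIfAbsent` but two for `putIfAbsent`, while all three produce the same map.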
   
   
   
   
   





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
   
   @srowen, do you mean refactoring this loop:
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both pass the existing tests.
   
   However, the first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   The second has the drawback of ignoring the return value of the `computeIfAbsent` method.
   
   Which do you prefer, or should we keep the original logic?
   
   
   
   
   





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
   
   @srowen, do you mean refactoring this loop:
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both pass the existing tests.
   
   However, the first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   The second has the drawback of ignoring the return value of the `computeIfAbsent` method.
   
   Which do you prefer, or should we keep the original logic?
   
   
   
   
   





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
   
   @srowen, do you mean refactoring this loop:
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both pass the existing tests.
   
   However, the first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   The second has the drawback of ignoring the return value of the `computeIfAbsent` method.
   
   Which do you prefer, or should we keep the original logic?
   
   
   
   
   





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
   
   @srowen, do you mean refactoring this loop:
   
   ```java
   for (String key : p.stringPropertyNames()) {
     if (!effectiveConfig.containsKey(key)) {
       effectiveConfig.put(key, p.getProperty(key));
     }
   }
   ```
   
   into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.putIfAbsent(key, p.getProperty(key)));
   ```
   
   or into
   
   ```java
   p.stringPropertyNames().forEach(key -> effectiveConfig.computeIfAbsent(key, p::getProperty));
   ```
   
   Both pass the existing tests.
   
   The first calls `p.getProperty(key)` for every key, even keys already present in `effectiveConfig`, which differs from the original logic.
   
   The second has the drawback of ignoring the return value of the `computeIfAbsent` method.
   
   Which do you prefer, or should we keep the original logic?
   
   
   
   
   





[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


SparkQA commented on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888774369


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46311/
   





[GitHub] [spark] SparkQA commented on pull request #33535: [SPARK-36108][SQL] Refactor first set of 20 query parsing errors to use error classes

2021-07-28 Thread GitBox


SparkQA commented on pull request #33535:
URL: https://github.com/apache/spark/pull/33535#issuecomment-888773347


   **[Test build #141803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141803/testReport)** for PR 33535 at commit [`b1f97b0`](https://github.com/apache/spark/commit/b1f97b060e5b800e00501a1fe675ed3189406828).





[GitHub] [spark] SparkQA commented on pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


SparkQA commented on pull request #33451:
URL: https://github.com/apache/spark/pull/33451#issuecomment-888773387


   **[Test build #141804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141804/testReport)** for PR 33451 at commit [`bf3643d`](https://github.com/apache/spark/commit/bf3643d47da70b06b7f70d8af69e41e26cfd0fe0).





[GitHub] [spark] SparkQA commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


SparkQA commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888773302


   **[Test build #141802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141802/testReport)** for PR 33550 at commit [`e47cd50`](https://github.com/apache/spark/commit/e47cd501da71d55c6bd6274057593121fa1782e6).





[GitHub] [spark] SparkQA commented on pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


SparkQA commented on pull request #33566:
URL: https://github.com/apache/spark/pull/33566#issuecomment-888773286


   **[Test build #141800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141800/testReport)** for PR 33566 at commit [`33e9031`](https://github.com/apache/spark/commit/33e90312744f79e239e0618b8f6d4c2fe4e142e7).





[GitHub] [spark] SparkQA commented on pull request #33556: [WIP][SPARK-36324][CORE] Replace revertPartialWritesAndClose with close in ExternalSorter.spill and ExternalAppendOnlyMap.spill

2021-07-28 Thread GitBox


SparkQA commented on pull request #33556:
URL: https://github.com/apache/spark/pull/33556#issuecomment-888773263


   **[Test build #141801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141801/testReport)** for PR 33556 at commit [`ce66fc6`](https://github.com/apache/spark/commit/ce66fc6622be807585673612a34f0df706d33e47).





[GitHub] [spark] Ngone51 commented on a change in pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


Ngone51 commented on a change in pull request #33451:
URL: https://github.com/apache/spark/pull/33451#discussion_r678795780



##
File path: common/network-common/src/main/java/org/apache/spark/network/corruption/Cause.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.corruption;
+
+/**
+ * The cause of shuffle data corruption.
+ *
+ * @since 3.2.0

Review comment:
   I have moved it to the `spark-network-shuffle` module and removed the `@since` tag. cc @mridulm
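As background for the `Cause` class discussed here, the following is a rough sketch of how checksum-based diagnosis could distinguish disk corruption from network corruption. It assumes that a checksum was recorded when the block was written and that the reader has already detected corrupted data. The enum, class, and method names are hypothetical, and the use of `java.util.zip.CRC32` is an assumption; this is not the API or algorithm of the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CorruptionDiagnosisSketch {
  // Hypothetical outcomes; not necessarily the constants of Spark's Cause enum.
  enum SketchCause { DISK_ISSUE, NETWORK_ISSUE }

  static long checksumOf(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Called after the reader has already seen corrupted data for a block.
  // storedChecksum: recorded when the block was written.
  // onDiskBytes: the same block re-read from the writer's local disk.
  static SketchCause diagnose(long storedChecksum, byte[] onDiskBytes) {
    // A mismatch means the on-disk copy itself changed after it was written.
    // A match means the on-disk copy is intact, so the damage most likely
    // happened while the data was being transferred.
    return checksumOf(onDiskBytes) == storedChecksum
        ? SketchCause.NETWORK_ISSUE
        : SketchCause.DISK_ISSUE;
  }

  public static void main(String[] args) {
    byte[] written = "partition-0 data".getBytes(StandardCharsets.UTF_8);
    long stored = checksumOf(written);
    byte[] corruptedOnDisk = written.clone();
    corruptedOnDisk[0] ^= 0x01; // simulate a single flipped bit on disk
    System.out.println(diagnose(stored, written));         // NETWORK_ISSUE
    System.out.println(diagnose(stored, corruptedOnDisk)); // DISK_ISSUE
  }
}
```

CRC32 detects every single-bit error, so the flipped byte in the example is always classified as a disk issue.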







[GitHub] [spark] AmplabJenkins removed a comment on pull request #33565: [SPARK-33298][CORE][FOLLOWUP] Move `FileNameSpec` into `FileCommitProtocol` object

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33565:
URL: https://github.com/apache/spark/pull/33565#issuecomment-888772193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141796/
   





[GitHub] [spark] AmplabJenkins commented on pull request #33545: [SPARK-36319][SQL][PySpark] Have Observation return Map instead of Row

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33545:
URL: https://github.com/apache/spark/pull/33545#issuecomment-888772199


   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on pull request #33565: [SPARK-33298][CORE][FOLLOWUP] Move `FileNameSpec` into `FileCommitProtocol` object

2021-07-28 Thread GitBox


AmplabJenkins commented on pull request #33565:
URL: https://github.com/apache/spark/pull/33565#issuecomment-888772193


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141796/
   





[GitHub] [spark] LuciferYang commented on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang commented on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888771778


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   The original method is as follows:
   
   ```java
     Map<String, String> getEffectiveConfig() throws IOException {
       if (effectiveConfig == null) {
         effectiveConfig = new HashMap<>(conf);
         Properties p = loadPropertiesFile();
         for (String key : p.stringPropertyNames()) {
           if (!effectiveConfig.containsKey(key)) {
             effectiveConfig.put(key, p.getProperty(key));
           }
         }
       }
       return effectiveConfig;
     }
   ```
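As the PR title describes, the change replaces a get / null-check / put sequence for a per-size pool entry with `Map.computeIfAbsent`. A minimal sketch of that pattern on a hypothetical pool keyed by buffer length follows; the class and method names are illustrative and not taken from `HeapMemoryAllocator`, whose pools may differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SizedBufferPoolSketch {
  private final Map<Integer, Deque<long[]>> poolsBySize = new HashMap<>();

  // Before: explicit get, null check, and put of a new deque.
  void releaseVerbose(long[] buffer) {
    Deque<long[]> pool = poolsBySize.get(buffer.length);
    if (pool == null) {
      pool = new ArrayDeque<>();
      poolsBySize.put(buffer.length, pool);
    }
    pool.push(buffer);
  }

  // After: computeIfAbsent creates the deque only on first use and returns it.
  void release(long[] buffer) {
    poolsBySize.computeIfAbsent(buffer.length, size -> new ArrayDeque<>()).push(buffer);
  }

  // Reuses a pooled buffer of the requested size, or allocates a new one.
  long[] allocate(int size) {
    Deque<long[]> pool = poolsBySize.get(size);
    if (pool != null && !pool.isEmpty()) {
      return pool.pop();
    }
    return new long[size];
  }

  int pooledCount(int size) {
    Deque<long[]> pool = poolsBySize.get(size);
    return pool == null ? 0 : pool.size();
  }

  public static void main(String[] args) {
    SizedBufferPoolSketch pool = new SizedBufferPoolSketch();
    long[] first = pool.allocate(8);     // pool empty: freshly allocated
    pool.release(first);                 // creates the size-8 deque on first use
    long[] second = pool.allocate(8);    // reuses the pooled buffer
    System.out.println(first == second); // true
  }
}
```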
   
   





[GitHub] [spark] SparkQA commented on pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


SparkQA commented on pull request #33541:
URL: https://github.com/apache/spark/pull/33541#issuecomment-888770887


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46312/
   





[GitHub] [spark] maryannxue commented on a change in pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


maryannxue commented on a change in pull request #33541:
URL: https://github.com/apache/spark/pull/33541#discussion_r678793485



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##
@@ -32,8 +32,14 @@ import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, SortMergeJoin
  * [[org.apache.spark.sql.catalyst.plans.physical.Distribution Distribution]] requirements for
  * each operator by inserting [[ShuffleExchangeExec]] Operators where required.  Also ensure that
  * the input partition ordering requirements are met.
+ *
+ * @param optimizeOutRepartitionByCol A flag to indicate that if this rule should optimize out
+ *                                    user-specified repartition-by-col shuffles or not. This is
+ *                                    mostly true, but can be false in AQE when AQE optimization
+ *                                    may change the plan output partitioning and need to retain
+ *                                    the user-specified repartition-by-col shuffles in the plan.
  */
-object EnsureRequirements extends Rule[SparkPlan] {
+case class EnsureRequirements(optimizeOutRepartitionByCol: Boolean = true) extends Rule[SparkPlan] {

Review comment:
   nit: `optimizeOutRepartitionShuffle`







[GitHub] [spark] maryannxue commented on a change in pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


maryannxue commented on a change in pull request #33541:
URL: https://github.com/apache/spark/pull/33541#discussion_r678792942



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##
@@ -251,11 +257,12 @@ object EnsureRequirements extends Rule[SparkPlan] {
   def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
     // TODO: remove this after we create a physical operator for `RepartitionByExpression`.
     // SPARK-35989: AQE will change the partition number so we should retain the REPARTITION_BY_NUM
-    // shuffle which is specified by user. And also we can not remove REBALANCE_PARTITIONS_BY_COL,
-    // it is a special shuffle used to rebalance partitions.
-    // So, here we only remove REPARTITION_BY_COL in AQE.
+    // shuffle which is specified by user. And we can not remove REBALANCE_PARTITIONS_BY_COL either,
+    // which is a special shuffle used to rebalance partitions. Here we only remove
+    // REPARTITION_BY_COL in AQE when the given flag `optimizeOutRepartitionByCol` is true.
     case operator @ ShuffleExchangeExec(upper: HashPartitioning, child, shuffleOrigin)
-        if shuffleOrigin == REPARTITION_BY_COL || !conf.adaptiveExecutionEnabled =>
+        if (shuffleOrigin == REPARTITION_BY_COL && optimizeOutRepartitionByCol) ||
+          !conf.adaptiveExecutionEnabled =>

Review comment:
   And now we can allow REPARTITION_BY_NUM, right?







[GitHub] [spark] AngersZhuuuu commented on a change in pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


AngersZhuuuu commented on a change in pull request #33566:
URL: https://github.com/apache/spark/pull/33566#discussion_r678792113



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
##
@@ -482,7 +482,6 @@ object ParquetWriteSupport {
   val SPARK_ROW_SCHEMA: String = "org.apache.spark.sql.parquet.row.attributes"
 
   def setSchema(schema: StructType, configuration: Configuration): Unit = {
-    ParquetSchemaConverter.checkFieldNames(schema)

Review comment:
   Since this field-name check is now done together in one place before the writer is prepared, the call is removed here.
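A rough sketch of a single up-front field-name check, of the kind this comment describes as now being done together before the writer is prepared. The class, method, and exception type are hypothetical. The invalid-character set (space and `,;{}()\n\t=`) is an assumption based on characters that are special in Parquet schema strings; verify it against `ParquetSchemaConverter` before relying on it. Nested struct fields are not handled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FieldNameCheckSketch {
  // Assumed invalid characters: space , ; { } ( ) newline tab =
  private static final Pattern INVALID_CHARS = Pattern.compile("[ ,;{}()\\n\\t=]");

  // Validates every top-level field name once, before any writer is prepared.
  static void checkFieldNames(List<String> fieldNames) {
    for (String name : fieldNames) {
      if (INVALID_CHARS.matcher(name).find()) {
        throw new IllegalArgumentException(
            "Field name \"" + name + "\" contains an invalid character");
      }
    }
  }

  public static void main(String[] args) {
    checkFieldNames(Arrays.asList("id", "user_name")); // passes silently
    try {
      checkFieldNames(Arrays.asList("id", "user name"));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```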







[GitHub] [spark] AngersZhuuuu opened a new pull request #33566: [SPARK-36271][SQL] Unify V1 insert check field name before prepare writter

2021-07-28 Thread GitBox


AngersZhuuuu opened a new pull request #33566:
URL: https://github.com/apache/spark/pull/33566


   ### What changes were proposed in this pull request?
   Unify the DataSource V1 insert schema field-name check so that it runs before the writer is prepared.
   
   
   ### Why are the changes needed?
   To unify the code.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added unit tests.





[GitHub] [spark] maryannxue commented on a change in pull request #33541: [SPARK-36315][SQL] Only skip AQEShuffleReadRule in the final stage if it breaks the distribution requirement

2021-07-28 Thread GitBox


maryannxue commented on a change in pull request #33541:
URL: https://github.com/apache/spark/pull/33541#discussion_r678789558



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##
@@ -251,11 +257,12 @@ object EnsureRequirements extends Rule[SparkPlan] {
   def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
     // TODO: remove this after we create a physical operator for `RepartitionByExpression`.
     // SPARK-35989: AQE will change the partition number so we should retain the REPARTITION_BY_NUM
-    // shuffle which is specified by user. And also we can not remove REBALANCE_PARTITIONS_BY_COL,
-    // it is a special shuffle used to rebalance partitions.
-    // So, here we only remove REPARTITION_BY_COL in AQE.
+    // shuffle which is specified by user. And we can not remove REBALANCE_PARTITIONS_BY_COL either,
+    // which is a special shuffle used to rebalance partitions. Here we only remove
+    // REPARTITION_BY_COL in AQE when the given flag `optimizeOutRepartitionByCol` is true.
     case operator @ ShuffleExchangeExec(upper: HashPartitioning, child, shuffleOrigin)
-        if shuffleOrigin == REPARTITION_BY_COL || !conf.adaptiveExecutionEnabled =>
+        if (shuffleOrigin == REPARTITION_BY_COL && optimizeOutRepartitionByCol) ||
+          !conf.adaptiveExecutionEnabled =>

Review comment:
   We don't need `|| !conf.adaptiveExecutionEnabled` any more, right? 
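To see what the `|| !conf.adaptiveExecutionEnabled` disjunct still contributes, the new guard can be modelled as a plain predicate. This is a simplified sketch. `ShuffleOrigin` below lists only the origins named in the diff, and `ShuffleRemovalGuard.mayRemove` is a hypothetical name for the `if` condition, not a Spark method. What the rule does once the guard holds lies outside this hunk and is not modelled.

```java
// Simplified subset of the shuffle origins named in the diff (hypothetical model).
enum ShuffleOrigin { REPARTITION_BY_COL, REPARTITION_BY_NUM, REBALANCE_PARTITIONS_BY_COL }

final class ShuffleRemovalGuard {
    // Models the new condition:
    //   (shuffleOrigin == REPARTITION_BY_COL && optimizeOutRepartitionByCol) || !aqeEnabled
    static boolean mayRemove(ShuffleOrigin origin, boolean optimizeOutRepartitionByCol, boolean aqeEnabled) {
        return (origin == ShuffleOrigin.REPARTITION_BY_COL && optimizeOutRepartitionByCol) || !aqeEnabled;
    }

    public static void main(String[] args) {
        // Print the guard for every combination to see which cases the AQE-off disjunct adds.
        for (boolean aqe : new boolean[] {true, false}) {
            for (boolean optimizeOut : new boolean[] {true, false}) {
                for (ShuffleOrigin origin : ShuffleOrigin.values()) {
                    System.out.printf("aqe=%b optimizeOut=%b %s -> %b%n",
                        aqe, optimizeOut, origin, mayRemove(origin, optimizeOut, aqe));
                }
            }
        }
    }
}
```

With AQE off, the disjunct makes the guard true for every origin. Dropping it therefore changes behaviour only if this rule is also invoked with AQE disabled.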







[GitHub] [spark] viirya commented on a change in pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


viirya commented on a change in pull request #33494:
URL: https://github.com/apache/spark/pull/33494#discussion_r678789271



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
##
@@ -396,27 +396,25 @@ class SQLMetricsSuite extends SharedSparkSession with SQLMetricsTestUtils
   }
   }
 
-  // TODO (SPARK-36272): Reenable this after we figure out why the expected size doesn't
-  // match after we adjust building's memory settings.
-  ignore("SPARK-32629: ShuffledHashJoin(full outer) metrics") {
+  test("SPARK-32629: ShuffledHashJoin(full outer) metrics") {
 val uniqueLeftDf = Seq(("1", "1"), ("11", "11")).toDF("key", "value")
 val nonUniqueLeftDf = Seq(("1", "1"), ("1", "2"), ("11", "11")).toDF("key", "value")
 val rightDf = (1 to 10).map(i => (i.toString, i.toString)).toDF("key2", "value")
 Seq(
   // Test unique key on build side
-  (uniqueLeftDf, rightDf, 11, 134228048, 10, 134221824),
+  (uniqueLeftDf, rightDf, 11, 10),
   // Test non-unique key on build side
-  (nonUniqueLeftDf, rightDf, 12, 134228552, 11, 134221824)
-).foreach { case (leftDf, rightDf, fojRows, fojBuildSize, rojRows, rojBuildSize) =>
+  (nonUniqueLeftDf, rightDf, 12, 11)
+).foreach { case (leftDf, rightDf, fojRows, rojRows) =>
   val fojDf = leftDf.hint("shuffle_hash").join(
 rightDf, $"key" === $"key2", "full_outer")
   fojDf.collect()
   val fojPlan = fojDf.queryExecution.executedPlan.collectFirst {
 case s: ShuffledHashJoinExec => s
   }
   assert(fojPlan.isDefined, "The query plan should have shuffled hash join")
-  testMetricsInSparkPlanOperator(fojPlan.get,
-Map("numOutputRows" -> fojRows, "buildDataSize" -> fojBuildSize))
+  testMetricsInSparkPlanOperator(fojPlan.get, Map("numOutputRows" -> fojRows))
+  val fojBuildSize = fojPlan.get.metrics("buildDataSize").value
 
   // Test right outer join as well to verify build data size to be different
   // from full outer join. This makes sure we take extra BitSet/OpenHashSet

Review comment:
   From the comment, it looks like the test originally only wanted to verify that the two build data sizes are different.
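The change drops the hard-coded byte counts (for example 134228048 vs. 134221824) and reads `buildDataSize` from the executed plan instead. The reviewer reads the test as only needing the two build sizes to differ. Under that reading, the relative check reduces to something like the sketch below. `BuildSizeCheck` is a hypothetical helper, and the positivity check is an added assumption.

```java
final class BuildSizeCheck {
    // Relative check: no exact byte counts, which shift whenever memory settings change.
    // Requires only that both sizes are positive and that the full outer join size
    // differs from the right outer join size.
    static void assertBuildSizesDiffer(long fojBuildSize, long rojBuildSize) {
        if (fojBuildSize <= 0 || rojBuildSize <= 0) {
            throw new AssertionError("build data sizes should be positive: " + fojBuildSize + ", " + rojBuildSize);
        }
        if (fojBuildSize == rojBuildSize) {
            throw new AssertionError("full outer and right outer join build sizes should differ: " + fojBuildSize);
        }
    }

    public static void main(String[] args) {
        // The old hard-coded values from the diff still satisfy the relative check.
        assertBuildSizesDiffer(134228048L, 134221824L);
        assertBuildSizesDiffer(134228552L, 134221824L);
    }
}
```

The extra BitSet/OpenHashSet may be expected to make the full outer build strictly larger. In that case, `fojBuildSize > rojBuildSize` is the tighter variant, and the old values satisfy it too.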







[GitHub] [spark] ulysses-you commented on a change in pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


ulysses-you commented on a change in pull request #33550:
URL: https://github.com/apache/spark/pull/33550#discussion_r678787932



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
##
@@ -70,8 +70,16 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
 // If/when feature steps are executed in client mode, they should instead take care of this,
 // and this code should be removed.
 if (!sc.conf.contains(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)) {
-  sc.conf.set(KUBERNETES_EXECUTOR_POD_NAME_PREFIX,
-KubernetesConf.getResourceNamePrefix(sc.conf.get("spark.app.name")))
+  val podNamePrefix = KubernetesConf.getResourceNamePrefix(sc.conf.get("spark.app.name"))
+  if (org.apache.spark.deploy.k8s.Config.isValidExecutorPodNamePrefix(podNamePrefix)) {

Review comment:
   Ah, this is because the file already imports `io.fabric8.kubernetes.client.Config`, so I used the fully qualified name for Spark's `Config` here. After moving the method into `KubernetesUtils`, I think this issue no longer exists.







[GitHub] [spark] c21 commented on pull request #33494: [SPARK-36272][SQL][TEST] Change shuffled hash join metrics test to check relative value of build size

2021-07-28 Thread GitBox


c21 commented on pull request #33494:
URL: https://github.com/apache/spark/pull/33494#issuecomment-888762611


   Thank you @HyukjinKwon for the review!





[GitHub] [spark] ulysses-you commented on a change in pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


ulysses-you commented on a change in pull request #33550:
URL: https://github.com/apache/spark/pull/33550#discussion_r678787228



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
##
@@ -256,7 +256,7 @@ private[spark] object Config extends Logging {
   private val podConfValidator = (s"^$dns1123LabelFmt(\\.$dns1123LabelFmt)*$$").r.pattern
 
   // The possible longest executor name would be "$prefix-exec-${Int.MaxValue}"
-  private def isValidExecutorPodNamePrefix(prefix: String): Boolean = {
+  def isValidExecutorPodNamePrefix(prefix: String): Boolean = {

Review comment:
   Sounds good, moved this!
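The diff shows only the signature of `isValidExecutorPodNamePrefix`, the `podConfValidator` pattern, and the comment that the longest executor name is `"$prefix-exec-${Int.MaxValue}"`. The sketch below fills in a plausible body under stated assumptions: a 63-character DNS-1123 label limit and the usual DNS-1123 label regex. Both come from Kubernetes naming rules, not from this diff, and the class name `ExecutorPodNamePrefix` is hypothetical.

```java
import java.util.regex.Pattern;

final class ExecutorPodNamePrefix {
    // Assumption: Kubernetes limits a DNS-1123 label to 63 characters.
    static final int DNS_LABEL_MAX_LENGTH = 63;
    // Assumption: lowercase alphanumerics and '-', starting and ending with an alphanumeric.
    static final String DNS1123_LABEL_FMT = "[a-z0-9]([-a-z0-9]*[a-z0-9])?";
    // Same shape as podConfValidator in the diff: one or more labels joined by dots.
    static final Pattern POD_CONF_VALIDATOR =
        Pattern.compile("^" + DNS1123_LABEL_FMT + "(\\." + DNS1123_LABEL_FMT + ")*$");

    // The longest executor pod name is "<prefix>-exec-<Integer.MAX_VALUE>", so reserve
    // room for "-exec-" (6 chars) plus the 10 digits of Integer.MAX_VALUE.
    static boolean isValidExecutorPodNamePrefix(String prefix) {
        int reserved = "-exec-".length() + String.valueOf(Integer.MAX_VALUE).length();
        return prefix.length() + reserved <= DNS_LABEL_MAX_LENGTH
            && POD_CONF_VALIDATOR.matcher(prefix).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidExecutorPodNamePrefix("spark-pi"));          // true
        System.out.println(isValidExecutorPodNamePrefix("long".repeat(40)));   // false
    }
}
```

Under these assumptions, a prefix may be at most 63 - 16 = 47 characters long.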







[GitHub] [spark] ulysses-you commented on a change in pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


ulysses-you commented on a change in pull request #33550:
URL: https://github.com/apache/spark/pull/33550#discussion_r678787135



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala
##
@@ -70,8 +70,16 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
 // If/when feature steps are executed in client mode, they should instead take care of this,
 // and this code should be removed.
 if (!sc.conf.contains(KUBERNETES_EXECUTOR_POD_NAME_PREFIX)) {
-  sc.conf.set(KUBERNETES_EXECUTOR_POD_NAME_PREFIX,
-KubernetesConf.getResourceNamePrefix(sc.conf.get("spark.app.name")))
+  val podNamePrefix = KubernetesConf.getResourceNamePrefix(sc.conf.get("spark.app.name"))
+  if (org.apache.spark.deploy.k8s.Config.isValidExecutorPodNamePrefix(podNamePrefix)) {
+sc.conf.set(KUBERNETES_EXECUTOR_POD_NAME_PREFIX, podNamePrefix)
+  } else {
+val shortPrefix = "spark-" + KubernetesUtils.uniqueID()
+logWarning(s"Use $shortPrefix as the executor pod's name prefix due to " +
+  s"spark.app.name is too long. Please set '${KUBERNETES_EXECUTOR_POD_NAME_PREFIX.key}' " +

Review comment:
   Yes, but `KubernetesConf.getResourceNamePrefix` has already formatted the name, so the only remaining problem here is that it may be too long. Do you think it's OK now?
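The fallback in this hunk keeps the derived prefix when it is valid. Otherwise it switches to a short generated prefix and logs a warning. The sketch below models that decision, with the validity check and the ID generator passed in. `PrefixFallback.resolve` is a hypothetical name, and the `Supplier` stands in for `KubernetesUtils.uniqueID()`. The length-only predicate in `main` simplifies the real validation, which the comment above argues is the only check that can still fail at this point.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PrefixFallback {
    // Keep the prefix derived from spark.app.name when it is valid; otherwise fall
    // back to "spark-" + a generated ID and warn, mirroring the diff above.
    static String resolve(String derivedPrefix, Predicate<String> isValid, Supplier<String> uniqueId) {
        if (isValid.test(derivedPrefix)) {
            return derivedPrefix;
        }
        String shortPrefix = "spark-" + uniqueId.get();
        System.err.println("WARN: Use " + shortPrefix
            + " as the executor pod's name prefix because spark.app.name is too long.");
        return shortPrefix;
    }

    public static void main(String[] args) {
        Predicate<String> fitsLength = p -> p.length() <= 47; // simplified validity check
        Supplier<String> fixedId = () -> "abc123";            // deterministic stand-in ID
        System.out.println(resolve("spark-pi", fitsLength, fixedId));        // spark-pi
        System.out.println(resolve("long".repeat(40), fitsLength, fixedId)); // spark-abc123
    }
}
```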







[GitHub] [spark] ulysses-you commented on pull request #33550: [SPARK-36321][K8S] Do not fail application in kubernetes if name is too long

2021-07-28 Thread GitBox


ulysses-you commented on pull request #33550:
URL: https://github.com/apache/spark/pull/33550#issuecomment-888761873


   @holdenk @dongjoon-hyun 
   
   ```
   test("Run SparkPi with a very long application name.", k8sTestTag) {
     sparkAppConf.set("spark.app.name", "long" * 40)
     runSparkPiAndVerifyCompletion()
   }
   ```
   I just found the existing test, but it does not cover this issue because `SparkPi` sets the app name in code.
   ```
   val spark = SparkSession
     .builder
     .appName("Spark Pi")
     .getOrCreate()
   ```
   
   I made a new class, `SparkPiWithoutAppName`, in the examples for the K8s test. Please let me know if you have other ideas.





[GitHub] [spark] SparkQA removed a comment on pull request #33565: [SPARK-33298][CORE][FOLLOWUP] Move `FileNameSpec` into `FileCommitProtocol` object

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33565:
URL: https://github.com/apache/spark/pull/33565#issuecomment-888708317


   **[Test build #141796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141796/testReport)** for PR 33565 at commit [`1bede64`](https://github.com/apache/spark/commit/1bede6472b2971391bd8d2d242f6106b4211edc9).





[GitHub] [spark] SparkQA commented on pull request #33565: [SPARK-33298][CORE][FOLLOWUP] Move `FileNameSpec` into `FileCommitProtocol` object

2021-07-28 Thread GitBox


SparkQA commented on pull request #33565:
URL: https://github.com/apache/spark/pull/33565#issuecomment-888761263


   **[Test build #141796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141796/testReport)** for PR 33565 at commit [`1bede64`](https://github.com/apache/spark/commit/1bede6472b2971391bd8d2d242f6106b4211edc9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  final case class FileNameSpec(prefix: String, suffix: String)`
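The new public class reported here is a simple value holder for a file-name prefix and suffix. Below is a Java analogue of the Scala case class. It assumes the spec wraps a committer-generated base name. The `part-<split>-<jobId>` base-name format and the `FileNames.taskFileName` helper are illustrative assumptions, not taken from this PR.

```java
import java.util.Objects;

// Java analogue of `final case class FileNameSpec(prefix: String, suffix: String)`:
// an immutable value with structural equality.
final class FileNameSpec {
    final String prefix;
    final String suffix;

    FileNameSpec(String prefix, String suffix) {
        this.prefix = Objects.requireNonNull(prefix);
        this.suffix = Objects.requireNonNull(suffix);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FileNameSpec)) return false;
        FileNameSpec other = (FileNameSpec) o;
        return prefix.equals(other.prefix) && suffix.equals(other.suffix);
    }

    @Override
    public int hashCode() {
        return Objects.hash(prefix, suffix);
    }
}

final class FileNames {
    // Hypothetical base name "part-<5-digit split>-<jobId>", wrapped by the spec.
    static String taskFileName(FileNameSpec spec, int split, String jobId) {
        return String.format("%spart-%05d-%s%s", spec.prefix, split, jobId, spec.suffix);
    }

    public static void main(String[] args) {
        FileNameSpec spec = new FileNameSpec("", ".c000.snappy.parquet");
        System.out.println(taskFileName(spec, 3, "job1")); // part-00003-job1.c000.snappy.parquet
    }
}
```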





[GitHub] [spark] LuciferYang edited a comment on pull request #33558: [SPARK-36326][SQL] Use Map.computeIfAbsent to simplify the process of bufferPoolsBySize init new item in HeapMemoryAllocator

2021-07-28 Thread GitBox


LuciferYang edited a comment on pull request #33558:
URL: https://github.com/apache/spark/pull/33558#issuecomment-888751784


   > There is one other instance you could change in AbstractCommandBuilder.getEffectiveConfig
   
   OK, let me check this 
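The PR title describes replacing a get / null-check / put sequence with `Map.computeIfAbsent` when a new pool is first needed in `bufferPoolsBySize`. The sketch below isolates that pattern on a simplified stand-in. `BufferPool` is hypothetical, and the real allocator does more (for example size alignment and locking), which is omitted here.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Hypothetical, simplified stand-in for a size-keyed pool of reusable buffers.
final class BufferPool {
    final Map<Long, LinkedList<WeakReference<long[]>>> bufferPoolsBySize = new HashMap<>();

    // Before: explicit get, null check, then put.
    void releaseOld(long size, long[] buffer) {
        LinkedList<WeakReference<long[]>> pool = bufferPoolsBySize.get(size);
        if (pool == null) {
            pool = new LinkedList<>();
            bufferPoolsBySize.put(size, pool);
        }
        pool.add(new WeakReference<>(buffer));
    }

    // After: computeIfAbsent creates the list on first use and returns the mapped list.
    void release(long size, long[] buffer) {
        bufferPoolsBySize.computeIfAbsent(size, k -> new LinkedList<>()).add(new WeakReference<>(buffer));
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool();
        pool.release(1024L, new long[128]);
        pool.release(1024L, new long[128]);
        System.out.println(pool.bufferPoolsBySize.get(1024L).size()); // 2
    }
}
```

`computeIfAbsent` returns either the existing list or the newly created one, so the `add` can be chained directly. On a plain `HashMap` it is not atomic, so a map shared across threads still needs the caller's own locking.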





[GitHub] [spark] Ngone51 commented on a change in pull request #33451: [SPARK-36206][CORE] Support shuffle data corruption diagnosis via shuffle checksum

2021-07-28 Thread GitBox


Ngone51 commented on a change in pull request #33451:
URL: https://github.com/apache/spark/pull/33451#discussion_r678781857



##
File path: common/network-common/src/main/java/org/apache/spark/network/corruption/Cause.java
##
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.corruption;
+
+/**
+ * The cause of shuffle data corruption.
+ *
+ * @since 3.2.0

Review comment:
   It's not a public API, but it is part of the protocol between the Spark application and the external shuffle service.
   
   I'm thinking we might move this to the `spark-network-shuffle` module instead, since it's related to shuffle.
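One way to picture shuffle checksum diagnosis is to compare three checksums for the same block:

- the one recorded when the map output was written,
- one recomputed from the bytes currently on disk,
- one computed by the reader over the bytes it received.

The sketch below makes explicit assumptions. CRC32 as the algorithm, this order of comparison, and the `CorruptionCause` values are all illustrative. They are not the protocol or the `Cause` constants defined in this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative cause values (assumption), not the constants of Spark's `Cause`.
enum CorruptionCause { DISK_ISSUE, NETWORK_ISSUE, CHECKSUM_VERIFY_PASS }

final class ShuffleCorruptionDiagnosis {
    // CRC32 over the whole block (the algorithm choice is an assumption).
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // storedChecksum: recorded when the map output was written.
    // onDisk: the block's bytes as currently read back from disk.
    // readerChecksum: computed by the reducer over the bytes it received.
    static CorruptionCause diagnose(long storedChecksum, byte[] onDisk, long readerChecksum) {
        long diskChecksum = checksum(onDisk);
        if (diskChecksum != storedChecksum) {
            return CorruptionCause.DISK_ISSUE;    // bytes changed after being written
        }
        if (diskChecksum != readerChecksum) {
            return CorruptionCause.NETWORK_ISSUE; // disk copy intact, received copy differs
        }
        return CorruptionCause.CHECKSUM_VERIFY_PASS;
    }

    public static void main(String[] args) {
        byte[] block = "shuffle-block".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(block);
        System.out.println(diagnose(stored, block, stored)); // CHECKSUM_VERIFY_PASS
    }
}
```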







[GitHub] [spark] SparkQA removed a comment on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


SparkQA removed a comment on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888750598


   **[Test build #141798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141798/testReport)** for PR 33432 at commit [`e5977cb`](https://github.com/apache/spark/commit/e5977cb14f048a5888afdf4ad59329133667b13f).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-07-28 Thread GitBox


AmplabJenkins removed a comment on pull request #33432:
URL: https://github.com/apache/spark/pull/33432#issuecomment-888754296


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141798/
   




