[GitHub] [spark] maropu commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-25 Thread GitBox


maropu commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786478771


   Thanks, @sunchao! After the fix is applied, I'll check the performance in my 
environment again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31637: [SPARK-34524][SQL] Simplify v2 partition commands resolution

2021-02-25 Thread GitBox


SparkQA commented on pull request #31637:
URL: https://github.com/apache/spark/pull/31637#issuecomment-786477051


   **[Test build #135498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135498/testReport)**
 for PR 31637 at commit 
[`a2c6575`](https://github.com/apache/spark/commit/a2c6575cf06660bee6245bc6a23da8713205c74c).






[GitHub] [spark] cloud-fan commented on pull request #31639: [SPARK-34528][SQL] Named explicitly field in struct of a catalog view

2021-02-25 Thread GitBox


cloud-fan commented on pull request #31639:
URL: https://github.com/apache/spark/pull/31639#issuecomment-786474381


   Since the top-level table columns can be re-ordered when resolving the view, 
I don't have a problem with doing this on nested fields. Do we have more places 
that re-order top-level columns but not nested fields? I vaguely remember that 
there are a lot of places.






[GitHub] [spark] HyukjinKwon commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-25 Thread GitBox


HyukjinKwon commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786471841


   Sounds good!






[GitHub] [spark] SparkQA commented on pull request #30745: [SPARK-33678][SQL] Product aggregation function

2021-02-25 Thread GitBox


SparkQA commented on pull request #30745:
URL: https://github.com/apache/spark/pull/30745#issuecomment-786468648


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40076/
   






[GitHub] [spark] SparkQA commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786468172


   **[Test build #135497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135497/testReport)**
 for PR 31643 at commit 
[`006a5ec`](https://github.com/apache/spark/commit/006a5ec12a93ed641c3ef91d3d7adc6e6deb7553).






[GitHub] [spark] SparkQA commented on pull request #31659: [SPARK-34550][SQL] Skip InSet null value during push filter to Hive metastore

2021-02-25 Thread GitBox


SparkQA commented on pull request #31659:
URL: https://github.com/apache/spark/pull/31659#issuecomment-786468125


   **[Test build #135496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135496/testReport)**
 for PR 31659 at commit 
[`b3dba68`](https://github.com/apache/spark/commit/b3dba68a99374f28f7bfec227e4d9473646f48a4).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-786467222


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40074/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31658:
URL: https://github.com/apache/spark/pull/31658#issuecomment-786467221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40075/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786467220


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40073/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31658:
URL: https://github.com/apache/spark/pull/31658#issuecomment-786467221


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40075/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786467220


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40073/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-786467222


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40074/
   






[GitHub] [spark] cloud-fan commented on pull request #31651: [SPARK-34543][SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SET LOCATION`

2021-02-25 Thread GitBox


cloud-fan commented on pull request #31651:
URL: https://github.com/apache/spark/pull/31651#issuecomment-786467182


   thanks, merging to master/3.1/3.0!






[GitHub] [spark] cloud-fan closed pull request #31651: [SPARK-34543][SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SET LOCATION`

2021-02-25 Thread GitBox


cloud-fan closed pull request #31651:
URL: https://github.com/apache/spark/pull/31651


   






[GitHub] [spark] SparkQA commented on pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


SparkQA commented on pull request #31658:
URL: https://github.com/apache/spark/pull/31658#issuecomment-786461744


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40075/
   






[GitHub] [spark] SparkQA commented on pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


SparkQA commented on pull request #31658:
URL: https://github.com/apache/spark/pull/31658#issuecomment-786460013


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40075/
   






[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786459283


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40073/
   






[GitHub] [spark] ulysses-you commented on a change in pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


ulysses-you commented on a change in pull request #31646:
URL: https://github.com/apache/spark/pull/31646#discussion_r583425576



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##
@@ -108,6 +108,28 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
 (a("datecol", DateType) =!= Literal(Date.valueOf("2019-01-01"))) :: Nil,
 "datecol != 2019-01-01")
 
+  filterTest("not-in int filter",

Review comment:
   There seems to be an issue with null values. Created 
[#31659](https://github.com/apache/spark/pull/31659).








[GitHub] [spark] ulysses-you opened a new pull request #31659: [SPARK-34550][SQL] Skip InSet null value during push filter to Hive metastore

2021-02-25 Thread GitBox


ulysses-you opened a new pull request #31659:
URL: https://github.com/apache/spark/pull/31659


   
   
   ### What changes were proposed in this pull request?
   
   Skip null values in `InSet` when pushing filters to the Hive metastore.
   
   ### Why are the changes needed?
   
   If `InSet` contains a null value, we should skip it and push the other values to 
the metastore, to keep the same behavior as `In`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Add test.
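
The behavior described above can be illustrated with a minimal sketch. This is not the code from the pull request: the object `InSetPushdownSketch`, the function `inSetToFilter`, and the exact filter-string format are hypothetical, and returning `None` when only null values remain is an assumption made here for illustration.

```scala
// Hypothetical sketch of "skip null values when pushing an InSet filter".
// None of these names come from Spark; the output format is an assumption.
object InSetPushdownSketch {

  /** Builds a metastore-style filter string for `column IN (values)`.
    * Null entries are dropped before the string is built. If nothing is
    * left after dropping nulls, no filter is produced (assumption). */
  def inSetToFilter(column: String, values: Set[Any]): Option[String] = {
    // Drop nulls, then sort the rendered values so the output is deterministic.
    val nonNull = values.filter(_ != null).toSeq.map(_.toString).sorted
    if (nonNull.isEmpty) None
    else Some(nonNull.map(v => s"$column = $v").mkString("(", " or ", ")"))
  }
}
```

For example, calling `InSetPushdownSketch.inSetToFilter("p", Set[Any](1, null, 2))` in this sketch yields a filter over the values 1 and 2 only, which mirrors the intent of treating `InSet` the same way as `In`.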






[GitHub] [spark] SparkQA commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-25 Thread GitBox


SparkQA commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-786453595


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40074/
   






[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786447116


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40073/
   






[GitHub] [spark] SparkQA commented on pull request #30745: [SPARK-33678][SQL] Product aggregation function

2021-02-25 Thread GitBox


SparkQA commented on pull request #30745:
URL: https://github.com/apache/spark/pull/30745#issuecomment-786443888


   **[Test build #135495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135495/testReport)**
 for PR 30745 at commit 
[`45f7e91`](https://github.com/apache/spark/commit/45f7e916ed3ec73ffbb0189f76d1b20e0f796d5f).






[GitHub] [spark] SparkQA commented on pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


SparkQA commented on pull request #31658:
URL: https://github.com/apache/spark/pull/31658#issuecomment-786443595


   **[Test build #135494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135494/testReport)**
 for PR 31658 at commit 
[`a0b3e42`](https://github.com/apache/spark/commit/a0b3e421db4e6d0c2359bfdb1c6169b998efee3e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786443205


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40069/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786443204


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135486/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is no

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786443203


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40072/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786443208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135487/
   






[GitHub] [spark] sunchao edited a comment on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-25 Thread GitBox


sunchao edited a comment on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786443278


   I applied @LuciferYang's suggestion with the TPC-DS benchmark and it fixed the 
perf regression. For instance, for q9 above, here's what I got:
   
   | without the PR | with the PR | with the PR + requestSchema fix |
   |---|---|---|
   | 5940 | 11260 | 5281 |
   
   I've gathered all the results in this 
[gist](https://gist.github.com/sunchao/533c1c9a5fb6adc5f7c72b1c465e974d). The 
benchmark was run with scale factor 5.
   
   @maropu @HyukjinKwon @srowen let me know what you think. I can update this 
with the fix.
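
To put the q9 numbers above in relative terms, here is a small arithmetic sketch. The object and method names are hypothetical, and it assumes the three figures are timings where lower is better (the comment does not state the unit). Under that assumption, the PR alone comes out at roughly 1.9 times the baseline, and the PR plus the requestSchema fix at roughly 0.89 times the baseline.

```scala
// Hypothetical helper for comparing the three q9 measurements quoted above.
// Assumption: the values are timings, so a ratio above 1.0 means slower.
object Q9Ratios {
  val withoutPr: Double = 5940.0
  val withPr: Double = 11260.0
  val withPrAndFix: Double = 5281.0

  /** Ratio of a measurement to the baseline measurement. */
  def ratio(measured: Double, baseline: Double): Double = measured / baseline

  val prVsBaseline: Double = ratio(withPr, withoutPr)
  val fixVsBaseline: Double = ratio(withPrAndFix, withoutPr)
}
```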






[GitHub] [spark] sunchao commented on pull request #29542: [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

2021-02-25 Thread GitBox


sunchao commented on pull request #29542:
URL: https://github.com/apache/spark/pull/29542#issuecomment-786443278


   I tried @LuciferYang's suggestion with the TPC-DS benchmark and it fixed the 
perf regression. For instance, for q9 above, here's what I got:
   
   | without the PR | with the PR | with the PR + requestSchema fix |
   |---|---|---|
   | 5940 | 11260 | 5281 |
   
   I've gathered all the results in this 
[gist](https://gist.github.com/sunchao/533c1c9a5fb6adc5f7c72b1c465e974d). The 
benchmark was run with scale factor 5.
   
   @maropu @HyukjinKwon @srowen let me know what you think. I can update this 
with the fix.






[GitHub] [spark] AmplabJenkins commented on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786443204


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135486/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786443205


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40069/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not center

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786443203


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40072/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786443208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135487/
   






[GitHub] [spark] MaxGekk commented on pull request #31651: [SPARK-34543][SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SET LOCATION`

2021-02-25 Thread GitBox


MaxGekk commented on pull request #31651:
URL: https://github.com/apache/spark/pull/31651#issuecomment-786442599


   @cloud-fan May I ask you to take a look at this small fix?






[GitHub] [spark] SparkQA commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-25 Thread GitBox


SparkQA commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-786442441


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40074/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786410530


   **[Test build #135487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135487/testReport)**
 for PR 31646 at commit 
[`cf3ba56`](https://github.com/apache/spark/commit/cf3ba56f7086cf315aa0556315ee466c99b989ba).






[GitHub] [spark] SparkQA removed a comment on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786393905


   **[Test build #135486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135486/testReport)**
 for PR 31656 at commit 
[`d57c258`](https://github.com/apache/spark/commit/d57c258440a4e3119f2d4a7a151444134f84e91d).






[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


SparkQA commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786435916


   **[Test build #135487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135487/testReport)**
 for PR 31646 at commit 
[`cf3ba56`](https://github.com/apache/spark/commit/cf3ba56f7086cf315aa0556315ee466c99b989ba).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


SparkQA commented on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786434905


   **[Test build #135486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135486/testReport)**
 for PR 31656 at commit 
[`d57c258`](https://github.com/apache/spark/commit/d57c258440a4e3119f2d4a7a151444134f84e91d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30483: [SPARK-33449][SQL] Add File Metadata cache support for Parquet and Orc

2021-02-25 Thread GitBox


SparkQA commented on pull request #30483:
URL: https://github.com/apache/spark/pull/30483#issuecomment-786434812


   **[Test build #135493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135493/testReport)**
 for PR 30483 at commit 
[`eb8fa71`](https://github.com/apache/spark/commit/eb8fa7119bebbb0943719299d9bf259237455003).






[GitHub] [spark] rwpenney commented on a change in pull request #30745: [SPARK-33678][SQL] Product aggregation function

2021-02-25 Thread GitBox


rwpenney commented on a change in pull request #30745:
URL: https://github.com/apache/spark/pull/30745#discussion_r583402905



##
File path: python/pyspark/sql/functions.py
##
@@ -222,6 +222,34 @@ def sum_distinct(col):
     return _invoke_function_over_column("sum_distinct", col)
 
 
+def product(col):
+    """
+    Aggregate function: returns the product of the values in a group.
+
+    .. versionadded:: 3.2.0
+
+    Parameters
+    ----------
+    col : str, :class:`Column`
+        column containing values to be multiplied together
+
+    Examples
+    --------
+    >>> df = spark.range(1, 10).toDF('x').withColumn('mod3', col('x') % 3)
+    >>> prods = df.groupBy('mod3').agg(product('x').alias('product'))
+    >>> prods.orderBy('mod3').show()
+    +----+-------+
+    |mod3|product|
+    +----+-------+
+    |   0|  162.0|
+    |   1|   28.0|
+    |   2|   80.0|
+    +----+-------+
+
+    """
+    return _invoke_function("product", _to_java_column(col))

Review comment:
   Thanks, @ueshin for spotting this, and your other "nits". Hopefully all 
resolved now.
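
For readers without a Spark session at hand, the grouped product from the docstring example in the diff can be reproduced in plain Python. This is only a sketch of the semantics (multiply the values within each group), not the Spark implementation; the helper name `grouped_product` is hypothetical and does not exist in PySpark.

```python
from collections import defaultdict


def grouped_product(values, key):
    """Return {group: product of the values in that group} as floats."""
    # Start every group at 1.0 so the result is a float, as in the docstring output.
    products = defaultdict(lambda: 1.0)
    for value in values:
        products[key(value)] *= value
    return dict(products)


# Same data as the docstring example: x in 1..9, grouped by x % 3.
result = grouped_product(range(1, 10), key=lambda x: x % 3)
for group in sorted(result):
    print(group, result[group])
```

The three groups are {3, 6, 9}, {1, 4, 7} and {2, 5, 8}, which give the 162.0, 28.0 and 80.0 shown in the docstring table.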








[GitHub] [spark] SparkQA commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread GitBox


SparkQA commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786432542


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40072/
   






[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786431243


   **[Test build #135492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135492/testReport)**
 for PR 31601 at commit 
[`e3dead4`](https://github.com/apache/spark/commit/e3dead46d8e49d137aab17a6c793f9913c4c5405).






[GitHub] [spark] yikf removed a comment on pull request #31648: [SPARK-34541][CORE] Fixed an issue where data could not be cleaned up when unregisterShuffle.

2021-02-25 Thread GitBox


yikf removed a comment on pull request #31648:
URL: https://github.com/apache/spark/pull/31648#issuecomment-786337598


   > @yikf, can you open a PR against master branch?
   
   OK, should I close the current PR?






[GitHub] [spark] SparkQA commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786430908


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40069/
   






[GitHub] [spark] viirya opened a new pull request #31658: [SPARK-34549][BUILD] Upgrade aws kinesis to 1.14.0 and java sdk 1.11.844

2021-02-25 Thread GitBox


viirya opened a new pull request #31658:
URL: https://github.com/apache/spark/pull/31658


   
   
   ### What changes were proposed in this pull request?
   
   
   This patch upgrades the AWS Kinesis and AWS Java SDK versions.
   
   ### Why are the changes needed?
   
   
   Upgrade AWS Kinesis and the AWS Java SDK to meet the minimum versions 
required by newer features such as IAM roles for service accounts: 
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.






[GitHub] [spark] sunchao commented on pull request #31642: [SPARK-33212][FOLLOWUP][test-maven][test-hadoop3.2] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile

2021-02-25 Thread GitBox


sunchao commented on pull request #31642:
URL: https://github.com/apache/spark/pull/31642#issuecomment-786429739


   Thanks @dongjoon-hyun . Let me trigger another run for Hadoop 2.7 just to be 
sure. 
   
   Jenkins CI reports 2 failures in the SQL module, but I'm guessing they're 
not related (this PR doesn't touch anything in SQL), although I can't seem to 
find which two tests these are.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786426278


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40068/
   






[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


SparkQA commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786426262


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40068/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786426278


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40068/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is no

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786425902


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135491/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not cent

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786413358


   **[Test build #135491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135491/testReport)**
 for PR 31657 at commit 
[`4ca12c5`](https://github.com/apache/spark/commit/4ca12c5f0defec91c4e4214f8bb5d04a25ffaa31).






[GitHub] [spark] AmplabJenkins commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not center

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786425902


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135491/
   






[GitHub] [spark] SparkQA commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread GitBox


SparkQA commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786425822


   **[Test build #135491 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135491/testReport)**
 for PR 31657 at commit 
[`4ca12c5`](https://github.com/apache/spark/commit/4ca12c5f0defec91c4e4214f8bb5d04a25ffaa31).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] mallman removed a comment on pull request #31569: [SPARK-34443][CORE] Replace symbol literals with Symbol constructor invocations to comply with Scala 2.13

2021-02-25 Thread GitBox


mallman removed a comment on pull request #31569:
URL: https://github.com/apache/spark/pull/31569#issuecomment-780028621


   I want to offer a rant on the matter that motivates this PR, but not on the 
merit of this PR. I would say this PR is the right thing to do. However, I 
think this is another example of how arrogant and insular the leadership of the 
Scala language team is. For further mind-bending context, see the [conversation 
](https://contributors.scala-lang.org/t/proposal-to-deprecate-and-remove-symbol-literals/2953)
 on the proposal to remove the Symbol literal from Scala 2.13, which includes 
such gems of received wisdom as
   
   > Symbols are used in _some_ existing Scala code, but are not used 
pervasively.
   
   Apparently this PR does not count as pervasive usage. And apparently the 
symbol literal syntax in Spark SQL user code is not used pervasively. They know!
   
   IMO, Scala is (sadly) not a user-oriented programming language (anymore). 
It's an insular academic exercise. I pity the programmers who will be saddled 
with Scala 3.
   
   I do not mean to judge the value of this PR. Like I said, this rant is off 
topic. Scala 2.13 is what it is. The ship has sailed. Under the 
circumstances of reality, this PR is the right thing to do. It's just sad it 
has to happen.






[GitHub] [spark] SparkQA commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread GitBox


SparkQA commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786423748


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40072/
   






[GitHub] [spark] sarutak commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


sarutak commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-78649


   The newly added `RemoveNoopUnionSuite` seems to use a symbol literal, which 
causes the build failure.
   I'll fix it too.
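
As an illustration of the kind of change discussed here, replacing a Scala symbol literal such as `'a` with `Symbol("a")`, the rewrite can be sketched as a small Python text transformation. This is not how the PR was produced as far as this thread shows; the function name `replace_symbol_literals` and the sample Scala line are made up for the example, and the regular expression is deliberately naive: it does not understand Scala string literals or comments, so its output would need review.

```python
import re

# Matches a Scala symbol literal such as 'name, but not a character
# literal such as 'a' and not an apostrophe inside a word.
SYMBOL_LITERAL = re.compile(r"(?<![\w'])'([A-Za-z_]\w*)(?![\w'])")


def replace_symbol_literals(scala_source):
    """Rewrite 'name as Symbol("name") in a piece of Scala source text."""
    return SYMBOL_LITERAL.sub(r'Symbol("\1")', scala_source)


# Hypothetical line of Scala test code, used only as input text.
line = "val plan = testRelation.select('a, 'b).where('c' == ch)"
print(replace_symbol_literals(line))
```

The two symbol literals are rewritten while the character literal `'c'` is left alone, because the pattern refuses to match when the identifier is immediately followed by another quote.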






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786421444


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135490/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786421442


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40066/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31654: [SPARK-34547][SQL] Only use metadata columns for resolution as last resort

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31654:
URL: https://github.com/apache/spark/pull/31654#issuecomment-786421438


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135482/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786421439


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40067/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31642: [SPARK-33212][FOLLOWUP][test-maven][test-hadoop3.2] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31642:
URL: https://github.com/apache/spark/pull/31642#issuecomment-786421443


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135485/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786421441


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135488/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31641: [SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31641:
URL: https://github.com/apache/spark/pull/31641#issuecomment-786421446


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40070/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786421442


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40066/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31641: [SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31641:
URL: https://github.com/apache/spark/pull/31641#issuecomment-786421446


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40070/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31654: [SPARK-34547][SQL] Only use metadata columns for resolution as last resort

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31654:
URL: https://github.com/apache/spark/pull/31654#issuecomment-786421438


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135482/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31642: [SPARK-33212][FOLLOWUP][test-maven][test-hadoop3.2] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31642:
URL: https://github.com/apache/spark/pull/31642#issuecomment-786421443


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135485/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786421439


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40067/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786421444


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135490/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786421441


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135488/
   






[GitHub] [spark] SparkQA commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786421087


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40069/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31654: [SPARK-34547][SQL] Only use metadata columns for resolution as last resort

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31654:
URL: https://github.com/apache/spark/pull/31654#issuecomment-786332664


   **[Test build #135482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135482/testReport)**
 for PR 31654 at commit 
[`1c5ab03`](https://github.com/apache/spark/commit/1c5ab03470ba1e99826c3238c0c6ae6fafef22ce).






[GitHub] [spark] SparkQA commented on pull request #31641: [SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE

2021-02-25 Thread GitBox


SparkQA commented on pull request #31641:
URL: https://github.com/apache/spark/pull/31641#issuecomment-786419889


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40070/
   






[GitHub] [spark] SparkQA commented on pull request #31654: [SPARK-34547][SQL] Only use metadata columns for resolution as last resort

2021-02-25 Thread GitBox


SparkQA commented on pull request #31654:
URL: https://github.com/apache/spark/pull/31654#issuecomment-786419848


   **[Test build #135482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135482/testReport)**
 for PR 31654 at commit 
[`1c5ab03`](https://github.com/apache/spark/commit/1c5ab03470ba1e99826c3238c0c6ae6fafef22ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31641: [SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE

2021-02-25 Thread GitBox


SparkQA commented on pull request #31641:
URL: https://github.com/apache/spark/pull/31641#issuecomment-786418306


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40070/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31642: [SPARK-33212][FOLLOWUP][test-maven][test-hadoop3.2] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31642:
URL: https://github.com/apache/spark/pull/31642#issuecomment-786355097


   **[Test build #135485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135485/testReport)**
 for PR 31642 at commit 
[`e2d32aa`](https://github.com/apache/spark/commit/e2d32aa88ef6db801726d6385388696ce2fcac7f).






[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


SparkQA commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786416512


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40068/
   






[GitHub] [spark] SparkQA commented on pull request #31642: [SPARK-33212][FOLLOWUP][test-maven][test-hadoop3.2] Add hadoop-yarn-server-web-proxy for Hadoop 3.x profile

2021-02-25 Thread GitBox


SparkQA commented on pull request #31642:
URL: https://github.com/apache/spark/pull/31642#issuecomment-786416280


   **[Test build #135485 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135485/testReport)**
 for PR 31642 at commit 
[`e2d32aa`](https://github.com/apache/spark/commit/e2d32aa88ef6db801726d6385388696ce2fcac7f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


cloud-fan commented on a change in pull request #31643:
URL: https://github.com/apache/spark/pull/31643#discussion_r583387151



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -136,11 +142,12 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 mapIdToReduceIds.get(mapId).add(Integer.parseInt(blockIdParts[4]));
   }
 }
-long[] mapIds = Longs.toArray(mapIdToReduceIds.keySet());
+long[] mapIds = Longs.toArray(orderedMapId);

Review comment:
   This assumes the block ids are ordered by map id and reduce id. If the 
block ids are `shuffle_1_1_1, shuffle_1_2_1, shuffle_1_1_2`, then I don't think 
`FetchShuffleBlocks` can retain the order anymore.
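To make the concern concrete, here is a minimal sketch of what grouping by map id does to an interleaved request. It is not Spark's `FetchShuffleBlocks` class: `GroupedOrderSketch` and `groupAndFlatten` are hypothetical names, and the sketch only assumes that a grouped message lists all reduce ids of one map id before moving on to the next map id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a message that groups reduce ids per map id.
class GroupedOrderSketch {
  // Groups block ids of the form shuffle_<shuffleId>_<mapId>_<reduceId> by map id
  // (in first-seen order), then enumerates them again group by group.
  static List<String> groupAndFlatten(List<String> blockIds) {
    Map<Long, List<Integer>> mapIdToReduceIds = new LinkedHashMap<>();
    String shuffleId = null;
    for (String blockId : blockIds) {
      String[] parts = blockId.split("_");
      shuffleId = parts[1];
      long mapId = Long.parseLong(parts[2]);
      mapIdToReduceIds.computeIfAbsent(mapId, k -> new ArrayList<>())
          .add(Integer.parseInt(parts[3]));
    }
    List<String> flattened = new ArrayList<>();
    for (Map.Entry<Long, List<Integer>> entry : mapIdToReduceIds.entrySet()) {
      for (int reduceId : entry.getValue()) {
        flattened.add("shuffle_" + shuffleId + "_" + entry.getKey() + "_" + reduceId);
      }
    }
    return flattened;
  }

  public static void main(String[] args) {
    List<String> requested =
        Arrays.asList("shuffle_1_1_1", "shuffle_1_2_1", "shuffle_1_1_2");
    System.out.println("requested: " + requested);
    // The second and third entries come back swapped relative to the request.
    System.out.println("grouped:   " + groupAndFlatten(requested));
  }
}
```

Under that assumption, even an insertion-ordered map cannot reproduce the requested order once the same map id appears in two non-adjacent positions.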








[GitHub] [spark] SparkQA commented on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distince and Deduplicate

2021-02-25 Thread GitBox


SparkQA commented on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786414497


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40066/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786410718


   **[Test build #135488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135488/testReport)**
 for PR 31643 at commit 
[`384f43c`](https://github.com/apache/spark/commit/384f43c32bfc54b93035ad48b1ae8b297101aec7).






[GitHub] [spark] SparkQA commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786414359


   **[Test build #135488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135488/testReport)**
 for PR 31643 at commit 
[`384f43c`](https://github.com/apache/spark/commit/384f43c32bfc54b93035ad48b1ae8b297101aec7).
* This patch **fails Java style tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on a change in pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


cloud-fan commented on a change in pull request #31643:
URL: https://github.com/apache/spark/pull/31643#discussion_r583386370



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -117,17 +117,23 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 boolean batchFetchEnabled = firstBlock.length == 5;
 
 HashMap<Long, ArrayList<Integer>> mapIdToReduceIds = new HashMap<>();
+ArrayList<Long> orderedMapId = new ArrayList<>();
 for (String blockId : blockIds) {
   String[] blockIdParts = splitBlockId(blockId);
   if (Integer.parseInt(blockIdParts[1]) != shuffleId) {
 throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
   ", got:" + blockId);
   }
   long mapId = Long.parseLong(blockIdParts[2]);
+  assert(orderedMapId.isEmpty() || mapId >= 
orderedMapId.get(orderedMapId.size() - 1));

Review comment:
   This bug shows that the order of the block ids is very important. I think it 
makes sense to add asserts to guarantee it.
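As a rough illustration of the invariant such asserts would enforce, the sketch below checks that block ids are non-decreasing in map id and strictly increasing in reduce id within one map id. `BlockIdOrderCheck` and `isOrdered` are hypothetical names, and the `shuffle_<shuffleId>_<mapId>_<reduceId>` layout is assumed from the ids quoted in this thread; it is not the code in the PR.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark.
class BlockIdOrderCheck {
  // Returns true when map ids never decrease and, within a run of equal
  // map ids, reduce ids strictly increase.
  static boolean isOrdered(List<String> blockIds) {
    long prevMapId = Long.MIN_VALUE;
    int prevReduceId = Integer.MIN_VALUE;
    for (String blockId : blockIds) {
      String[] parts = blockId.split("_");
      long mapId = Long.parseLong(parts[2]);
      int reduceId = Integer.parseInt(parts[3]);
      if (mapId < prevMapId) {
        return false;
      }
      if (mapId == prevMapId && reduceId <= prevReduceId) {
        return false;
      }
      prevMapId = mapId;
      prevReduceId = reduceId;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isOrdered(
        Arrays.asList("shuffle_1_1_1", "shuffle_1_1_2", "shuffle_1_2_1")));
    System.out.println(isOrdered(
        Arrays.asList("shuffle_1_1_1", "shuffle_1_2_1", "shuffle_1_1_2")));
  }
}
```

A plain boolean check is used here instead of `assert` so the behaviour does not depend on whether the JVM runs with assertions enabled.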








[GitHub] [spark] SparkQA commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered

2021-02-25 Thread GitBox


SparkQA commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786413358


   **[Test build #135491 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135491/testReport)**
 for PR 31657 at commit 
[`4ca12c5`](https://github.com/apache/spark/commit/4ca12c5f0defec91c4e4214f8bb5d04a25ffaa31).






[GitHub] [spark] SparkQA removed a comment on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA removed a comment on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786410268


   **[Test build #135490 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135490/testReport)**
 for PR 31601 at commit 
[`a9e3490`](https://github.com/apache/spark/commit/a9e34901612042997441a767f2501df2cd7b3729).






[GitHub] [spark] otterc commented on a change in pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


otterc commented on a change in pull request #31643:
URL: https://github.com/apache/spark/pull/31643#discussion_r583382485



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -117,17 +117,23 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 boolean batchFetchEnabled = firstBlock.length == 5;
 
 HashMap<Long, ArrayList<Integer>> mapIdToReduceIds = new HashMap<>();
+ArrayList<Long> orderedMapId = new ArrayList<>();
 for (String blockId : blockIds) {
   String[] blockIdParts = splitBlockId(blockId);
   if (Integer.parseInt(blockIdParts[1]) != shuffleId) {
 throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
   ", got:" + blockId);
   }
   long mapId = Long.parseLong(blockIdParts[2]);
+  assert(orderedMapId.isEmpty() || mapId >= 
orderedMapId.get(orderedMapId.size() - 1));

Review comment:
   Why is this needed? `mapId >= orderedMapId.get(orderedMapId.size() - 1)` 
validates that the `blockIds` are in increasing order of mapIds, but this 
validation didn't exist earlier. Why is it being added now as part of the fix 
for this bug?

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -117,17 +117,23 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 boolean batchFetchEnabled = firstBlock.length == 5;
 
 HashMap<Long, ArrayList<Integer>> mapIdToReduceIds = new HashMap<>();
+ArrayList<Long> orderedMapId = new ArrayList<>();
 for (String blockId : blockIds) {
   String[] blockIdParts = splitBlockId(blockId);
   if (Integer.parseInt(blockIdParts[1]) != shuffleId) {
 throw new IllegalArgumentException("Expected shuffleId=" + shuffleId +
   ", got:" + blockId);
   }
   long mapId = Long.parseLong(blockIdParts[2]);
+  assert(orderedMapId.isEmpty() || mapId >= 
orderedMapId.get(orderedMapId.size() - 1));
   if (!mapIdToReduceIds.containsKey(mapId)) {
 mapIdToReduceIds.put(mapId, new ArrayList<>());
+orderedMapId.add(mapId);
   }
-  mapIdToReduceIds.get(mapId).add(Integer.parseInt(blockIdParts[3]));
+  ArrayList<Integer> reduceIdsByMapId = mapIdToReduceIds.get(mapId);
+  int reduceId = Integer.parseInt(blockIdParts[3]);
+  assert(reduceIdsByMapId.isEmpty() || reduceId > 
reduceIdsByMapId.get(reduceIdsByMapId.size() - 1));

Review comment:
   Same here: why is the validation `reduceId > 
reduceIdsByMapId.get(reduceIdsByMapId.size() - 1)` needed now as part of the 
fix for this bug?

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -136,11 +142,12 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 mapIdToReduceIds.get(mapId).add(Integer.parseInt(blockIdParts[4]));
   }
 }
-long[] mapIds = Longs.toArray(mapIdToReduceIds.keySet());
+long[] mapIds = Longs.toArray(orderedMapId);

Review comment:
   If we use a LinkedHashMap for `mapIdToReduceIds`, then 
`mapIdToReduceIds.keySet` will return the mapIds in the same order as they were 
inserted. So the order will be the same as in `blockIds`.
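A small sketch of that suggestion, assuming the same `shuffle_<shuffleId>_<mapId>_<reduceId>` id layout: `InsertionOrderSketch` and `mapIdsInFirstSeenOrder` are hypothetical names, and the manual copy into a `long[]` stands in for the Guava `Longs.toArray` call in the diff so that the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the code in the PR.
class InsertionOrderSketch {
  // Collects map ids in the order they first appear in blockIds.
  // LinkedHashMap documents insertion-order iteration; plain HashMap
  // makes no ordering guarantee at all.
  static long[] mapIdsInFirstSeenOrder(List<String> blockIds) {
    Map<Long, List<Integer>> mapIdToReduceIds = new LinkedHashMap<>();
    for (String blockId : blockIds) {
      String[] parts = blockId.split("_");
      long mapId = Long.parseLong(parts[2]);
      mapIdToReduceIds.computeIfAbsent(mapId, k -> new ArrayList<>())
          .add(Integer.parseInt(parts[3]));
    }
    long[] mapIds = new long[mapIdToReduceIds.size()];
    int i = 0;
    for (long mapId : mapIdToReduceIds.keySet()) {
      mapIds[i++] = mapId;
    }
    return mapIds;
  }

  public static void main(String[] args) {
    List<String> blockIds =
        Arrays.asList("shuffle_0_20_0", "shuffle_0_20_1", "shuffle_0_3_0");
    System.out.println(Arrays.toString(mapIdsInFirstSeenOrder(blockIds)));
  }
}
```

This keeps a single collection instead of a map plus a separate ordered list, at the cost of relying on the map implementation for ordering.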

##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
##
@@ -117,17 +117,23 @@ private FetchShuffleBlocks createFetchShuffleBlocksMsg(
 boolean batchFetchEnabled = firstBlock.length == 5;
 
 HashMap<Long, ArrayList<Integer>> mapIdToReduceIds = new HashMap<>();
+ArrayList<Long> orderedMapId = new ArrayList<>();

Review comment:
   Why can't we just change `mapIdToReduceIds` to a LinkedHashMap?








[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786412028


   **[Test build #135490 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135490/testReport)**
 for PR 31601 at commit 
[`a9e3490`](https://github.com/apache/spark/commit/a9e34901612042997441a767f2501df2cd7b3729).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #31643: [SPARK-34534] Fix blockIds order when use FetchShuffleBlocks to fetch blocks

2021-02-25 Thread GitBox


SparkQA commented on pull request #31643:
URL: https://github.com/apache/spark/pull/31643#issuecomment-786410718


   **[Test build #135488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135488/testReport)**
 for PR 31643 at commit 
[`384f43c`](https://github.com/apache/spark/commit/384f43c32bfc54b93035ad48b1ae8b297101aec7).






[GitHub] [spark] SparkQA commented on pull request #31646: [SPARK-34538][SQL] Hive Metastore support filter by not-in

2021-02-25 Thread GitBox


SparkQA commented on pull request #31646:
URL: https://github.com/apache/spark/pull/31646#issuecomment-786410530


   **[Test build #135487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135487/testReport)**
 for PR 31646 at commit 
[`cf3ba56`](https://github.com/apache/spark/commit/cf3ba56f7086cf315aa0556315ee466c99b989ba).






[GitHub] [spark] SparkQA commented on pull request #31641: [SPARK-34533][SQL] Eliminate LEFT ANTI join to empty relation in AQE

2021-02-25 Thread GitBox


SparkQA commented on pull request #31641:
URL: https://github.com/apache/spark/pull/31641#issuecomment-786410309


   **[Test build #135489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135489/testReport)**
 for PR 31641 at commit 
[`934306e`](https://github.com/apache/spark/commit/934306e16d961a6d8373252082e0f0417181b7f4).






[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786410268


   **[Test build #135490 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135490/testReport)**
 for PR 31601 at commit 
[`a9e3490`](https://github.com/apache/spark/commit/a9e34901612042997441a767f2501df2cd7b3729).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins removed a comment on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786407443


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40071/
   






[GitHub] [spark] SparkQA commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


SparkQA commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786407434


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40071/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31601: [SPARK-34484][SQL] Rename `map` to `mapAttr` in Catalyst DSL

2021-02-25 Thread GitBox


AmplabJenkins commented on pull request #31601:
URL: https://github.com/apache/spark/pull/31601#issuecomment-786407443


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40071/
   






[GitHub] [spark] zhengruifeng commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centere

2021-02-25 Thread GitBox


zhengruifeng commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786406303


   log for third commit (reset reg, with OWLQN):
   ```
   [info] LogisticRegressionSuite:
   featuresMean: [0.4999142959117828,1.4847274177074965]
   featuresStd: [0.28501348037270735,0.28375633081273305]
   optimizer: breeze.optimize.OWLQN@4aeb7783
   initialCoefWithInterceptMatrix 0.0  0.0  -3.548107045716773  
   gradient: [-6.57041694534459E-4,-4.370107454830405E-4,3.625153510711243E-16]
   solution: 0.0,0.0,-3.548107045716773
   gradient: [0.012576822367455517,0.035095048230420756,0.023208452447024495]
   gradient: [3.084092735973036E-4,0.0022054288144351284,0.0017366857118698293]
   gradient: 
[-5.633768379895444E-4,-1.8017201591232754E-4,1.6890109753914793E-4]
   solution: 0.004163191715591638,0.0027690062281072866,-3.548107045716775
   gradient: [-3.0137744451939284E-4,4.6596334357779936E-4,5.932164205739878E-4]
   gradient: [-4.3301294997634464E-4,1.4148275429491427E-4,3.801461048864519E-4]
   solution: 0.015209328934070385,0.006399632492536262,-3.551333586970503
   gradient: [-2.5232553854103156E-4,3.3104813347862455E-4,5.080493652711567E-4]
   gradient: [-3.43183432967191E-4,2.3510527964409986E-4,4.4331650510299304E-4]
   solution: 0.0411521259530353,0.006399632492536262,-3.5620682815764617
   gradient: [2.0402814517730128E-4,2.9026346660974343E-4,4.8164982655218575E-4]
   gradient: [-7.696168852660265E-5,2.409139731872547E-4,4.478295475935821E-4]
   solution: 0.15892934358193694,0.006399632492536262,-3.6216855027869053
   gradient: 
[-0.0011178327557503794,-0.0035817990751566687,-0.002125466615998843]
   gradient: 
[-6.083511834621587E-4,-0.0017215714267202792,-8.732287500123202E-4]
   solution: 0.22685720454125335,0.006399632492536262,-3.7056060336306187
   gradient: [2.6948183426323067E-4,3.44424518557177E-4,5.053861318743803E-4]
   gradient: 
[-1.7635999122087975E-4,-7.031841863072907E-4,-1.9361190989382608E-4]
   solution: 0.2636653026531035,0.01069320297162197,-3.7056060336306187
   gradient: [3.789761887179946E-4,6.315502822382426E-4,6.942924046802138E-4]
   gradient: [9.861877496135642E-5,-4.164402288511879E-5,2.4649722592213347E-4]
   solution: 0.284456628258012,0.01429018235617051,-3.7056060336306187
   gradient: 
[-1.4993593326206948E-4,-7.400049087005653E-4,-2.2988583247743574E-4]
   solution: 0.284456628258012,0.01863675577190048,-3.7295827038936937
   gradient: [1.9305520555229646E-4,1.1584243552588863E-4,3.365146579866143E-4]
   solution: 0.3058510179507513,0.025107665740575463,-3.7295827038936937
   gradient: 
[-1.6471771016940062E-4,-9.004614442834267E-4,-3.4714980064127457E-4]
   gradient: [1.3110097659120526E-5,-3.953247283360091E-4,-7.346055783353328E-6]
   solution: 0.3058510179507513,0.025107665740575463,-3.742159551373379
   gradient: [2.4379507023459962E-4,2.842348371790601E-4,4.336387972581594E-4]
   solution: 0.3058510179507513,0.03593669008006364,-3.742159551373379
   gradient: 
[-1.5326073985222512E-4,-8.442931308696122E-4,-3.250994518551714E-4]
   solution: 0.3058510179507513,0.03593669008006364,-3.7700181112896485
   gradient: [3.395917591196E-4,5.655567726690022E-4,5.920554720004718E-4]
   gradient: [9.11098938545347E-5,-1.453001447741117E-4,1.2973103833574225E-4]
   solution: 0.30876594775281047,0.046195062467356426,-3.7700181112896485
   gradient: [8.730351966265057E-5,-1.0286694249249848E-4,1.2284216226290406E-4]
   solution: 0.30876594775281047,0.07020496857637049,-3.806022634762269
   gradient: [-5.25840986869693E-5,6.887281546790356E-5,4.653761082377619E-5]
   gradient: [1.4773863896052116E-5,-2.4227076571252262E-5,7.974736439436095E-5]
   solution: 0.2863590488008956,0.13528734572703915,-3.893023010157778
   gradient: 
[-0.0015174585749633078,-0.004146776899760863,-0.002853780226151062]
   gradient: 
[-7.741677880238251E-4,-0.0021437506348480355,-0.0014308590534517585]
   gradient: 
[-3.8557769797802397E-4,-0.0010989792940646894,-6.868537920891279E-4]
   gradient: 
[-1.8689443031625006E-4,-5.654056098773977E-4,-3.064209073149233E-4]
   solution: 0.2863590488008956,0.15198455342521094,-3.9322559605462195
   gradient: [7.81763204338149E-4,0.002316591435880098,0.001551059285341711]
   gradient: [2.8956459424240677E-4,8.513474468299444E-4,6.071725896724885E-4]
   gradient: [4.9398252743650875E-5,1.3700714868214556E-4,1.466485549531671E-4]
   solution: 0.2863590488008956,0.1631516524737614,-3.9322559605462195
   gradient: 
[-7.324976101955782E-4,-0.0020893127347362435,-0.001340509043050153]
   gradient: [-3.467366816718053E-4,-9.908374681988335E-4,-6.067444097064593E-4]
   gradient: 
[-1.4998213014049956E-4,-4.306319381195825E-4,-2.325319845275446E-4]
   gradient: 
[-5.062220696344268E-5,-1.4774734353872337E-4,-4.356657912722173E-5]
   solution: 0.28600841316728876,0.1631516524737614,-3.9390576077951227
   gradient: [6.876274051966575E-4,0.002049460304633041,0.0013721785786130543]
   gradient: 

[GitHub] [spark] zhengruifeng edited a comment on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not

2021-02-25 Thread GitBox


zhengruifeng edited a comment on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786405441


   logs for first commit (LBFGS without reg):
   ```
   [info] LogisticRegressionSuite:
   featuresMean: [0.4999142959117828,1.4847274177074965]
   featuresStd: [0.28501348037270735,0.28375633081273305]
   optimizer: breeze.optimize.LBFGS@e0d13c1
   initialCoefWithInterceptMatrix 0.0  0.0  -3.548107045716773  
   gradient: [-0.002305300414829984,-0.001540091613926333,3.625153510711243E-16]
   solution: 0.0,0.0,-3.548107045716773
   gradient: [1.270197141453041,3.420900875935085,0.6340310033871515]
   gradient: [0.02712239947522615,0.07838873390702962,0.014843589590906155]
   gradient: [5.742212568839818E-5,0.004963605142872151,0.0012130388688036885]
   gradient: [-0.0011314004879724678,0.0016935454656592792,6.032615470892342E-4]
   solution: 0.004180284984790963,0.002792704069058841,-3.5481070457167734
   gradient: [-0.0010728634715018125,0.00160772260209591,5.917007493574007E-4]
   gradient: [-0.0010434636154917003,0.0015652858941190642,5.860092709095692E-4]
   gradient: [-9.99199410343589E-4,0.0015022235418333052,5.775831071855921E-4]
   solution: 0.010568018954399284,7.631146700935471E-4,-3.549657937366679
   gradient: [1.2390525210071498E-4,1.4717833756368236E-4,3.98479395559E-4]
   solution: 0.06279624904315896,-0.014248302675022488,-3.571101773719935
   gradient: [9.7770168512076E-5,-1.373828594793572E-4,3.400255628368619E-4]
   gradient: [8.481446986403304E-5,-2.790060362521768E-4,3.1093063707884655E-4]
   solution: 0.06716764247861332,-0.013115419667116011,-3.588130852021863
   gradient: [-9.968656875832441E-5,-0.0021390619743561936,-1.5963325443471433E-4]
   solution: 0.09216063924943523,0.009066818457207741,-3.767127156279349
   gradient: [-1.1037293015684862E-4,-0.001941870473764838,-2.5262563678862614E-4]
   solution: 0.09825523866837622,0.03454325895969772,-3.915627606930046
   gradient: [-2.0482663038072557E-5,-4.43210469688911E-4,-7.738226665332456E-5]
   solution: 0.08968371353490419,0.055787393265421564,-4.005407480573733
   gradient: [-7.5354046131570336E-6,-5.834078775921241E-5,-1.3492011139001316E-5]
   solution: 0.08582579292847434,0.05750880834200776,-4.005067048017333
   gradient: [-9.990118197823818E-7,-1.9016389487075713E-6,-8.202512294008946E-7]
   solution: 0.08520584612951042,0.057117003687867866,-4.001392447236977
   gradient: [-2.1269163009208114E-8,1.6634162741361534E-7,1.4415507203052513E-8]
   solution: 0.08518495364769638,0.05703053481014659,-4.000866357569121
   gradient: [-1.0143803239515137E-10,6.222596734389185E-9,1.0190080001226675E-9]
   solution: 0.08518662793392717,0.05702725297223899,-4.000852573726639
   Just the blr data
   Coefficients: [0.29888631170192387,0.20097261903867272]
   Intercept: -4.000852573726639
   objectives: 0.12762747240529596,0.1276204770445784,0.127609513705,0.12756395823655467,0.12755829996951326,0.12751789363970717,0.12749589534392758,0.1274859125261507,0.12748551910072972,0.12748550724997348,0.12748550712365025,0.127485507123459
   featuresMean: [0.4999142959117828,1.4847274177074965,0.989976158129]
   featuresStd: [0.28501348037270735,0.28375633081273305,0.0302215257344]
   optimizer: breeze.optimize.LBFGS@225c477f
   initialCoefWithInterceptMatrix 0.0  0.0  0.0  -3.548107045716773  
   gradient: [-0.002305300414829984,-0.001540091613926333,-4.444349528265E-5,3.625153510711243E-16]
   solution: 0.0,0.0,0.0,-3.548107045716773
   gradient: [1.4043881959417224,3.884356710917848,23.954743028917907,0.7257390266235928]
   gradient: [0.031201599585874356,0.09016374509441172,0.5634823144261903,0.017072723505223292]
   gradient: [3.232595426584435E-4,0.005753607491472995,0.044972059597789826,0.0013638840438085728]
   gradient: [-0.0012590176292179422,0.001365692563291016,0.01789481499206562,5.435186208776085E-4]
   solution: 0.003361311019887606,0.0022455758391510356,6.512751349352555E-5,-3.5481070457167734
   gradient: [-0.0012058526178998585,0.0013107514665489008,0.017161368441663608,5.216994435121182E-4]
   gradient: [-0.0011790396852133288,0.001284089723365383,0.01679996576337007,5.109510894332127E-4]
   gradient: [-0.0011385318209425725,0.0012451072263457426,0.016264501895099932,4.950297871192996E-4]
   gradient: [-0.0010771199438327206,0.0011889041731884607,0.015476221524179956,4.715998596198084E-4]
   solution: 0.014585851627457428,0.009492561757612174,-0.0017615363470460436,-3.5481690479854455
   gradient: [1.538887596861738E-4,5.298765454822907E-4,0.003464779223103357,1.160666951868568E-4]
   solution: 0.08327888456493171,0.05384078236770347,-0.012962432443477992,-3.548558824766501
   gradient: [-3.2589948482520296E-6,1.0213435092852024E-6,1.9811253571497672E-5,1.1886786287249719E-5]
   solution: 0.08453485443429141,0.05463738823981839,-0.013274937663181247,-3.548583974676902
   gradient: 

[GitHub] [spark] SparkQA commented on pull request #31656: [SPARK-34548][SQL] Remove unnecessary children from Union under Distinct and Deduplicate

2021-02-25 Thread GitBox


SparkQA commented on pull request #31656:
URL: https://github.com/apache/spark/pull/31656#issuecomment-786405847


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40066/
   






[GitHub] [spark] zhengruifeng commented on pull request #31657: [Spark-34448][3.0][DO NOT MERGE] Binary logistic regression incorrectly computes the intercept and coefficients when data is not centere

2021-02-25 Thread GitBox


zhengruifeng commented on pull request #31657:
URL: https://github.com/apache/spark/pull/31657#issuecomment-786405671


   log for second commit (LBFGS with unit std vec):
   ```
   [info] LogisticRegressionSuite:
   featuresMean: [0.4999142959117828,1.4847274177074965]
   featuresStd: [0.28501348037270735,0.28375633081273305]
   optimizer: breeze.optimize.LBFGS@624d4b69
   initialCoefWithInterceptMatrix 0.0  0.0  -3.548107045716773  
   gradient: [-6.57041694534459E-4,-4.370107454830405E-4,3.625153510711243E-16]
   solution: 0.0,0.0,-3.548107045716773
   gradient: [0.037486046426067406,0.09987861275241,0.06507239621227094]
   gradient: [0.0013405594990216123,0.005019022157632213,0.0035834830790336973]
   gradient: [-3.210682348387613E-4,4.831610451234E-4,6.054237842405305E-4]
   solution: 0.014805700695282229,0.009847549024766982,-3.548107045716781
   gradient: [-3.0439342146747623E-4,4.583018612212477E-4,5.881840414461824E-4]
   gradient: [-2.960154039993151E-4,4.457188384476876E-4,5.796684709803373E-4]
   gradient: [-2.8339754836559654E-4,4.270376956658465E-4,5.670253289058287E-4]
   gradient: [-2.643562412798169E-4,3.9945054731613526E-4,5.483532130856691E-4]
   solution: 0.05271369323095594,0.010180670435029071,-3.5697082151460187
   gradient: [1.1037417136010827E-4,-8.7704283467311E-6,2.1570716372172727E-4]
   solution: 0.2971472561219698,0.05005170605291687,-3.76653674223476
   gradient: [8.037837845700428E-5,-1.0875323550967706E-4,8.098851723631143E-5]
   solution: 0.3156740085496769,0.0960039248572748,-3.849640027204097
   gradient: [1.2475582765711754E-5,-6.506116783877217E-5,-3.145661083981624E-5]
   solution: 0.3120252467428481,0.19284879886445705,-3.9967027342507913
   gradient: [-9.952686706791347E-8,-3.7266622659746237E-6,-2.48899982200976E-6]
   solution: 0.29943354035110287,0.20097832760243223,-4.0012388018408505
   gradient: [-3.116152416004958E-8,-9.911977248879821E-8,-1.0046797058538637E-7]
   solution: 0.29889640759752595,0.2009961607263976,-4.000896894654767
   gradient: [-2.5794348819250245E-9,-2.807803823943189E-9,-3.89839012736548E-9]
   solution: 0.2988863273085957,0.2009739013083474,-4.000854686053599
   Just the blr data
   Coefficients: [0.2988863273085957,0.2009739013083474]
   Intercept: -4.000854686053599
   objectives: 0.12762747240529596,0.1276204351259814,0.12759702629942907,0.12751125093754292,0.1274979961754713,0.12748578824275672,0.12748550756610177,0.12748550712444337,0.12748550712354415
   featuresMean: [0.4999142959117828,1.4847274177074965,0.989976158129]
   featuresStd: [0.28501348037270735,0.28375633081273305,0.0302215257344]
   optimizer: breeze.optimize.LBFGS@3af3aac9
   initialCoefWithInterceptMatrix 0.0  0.0  0.0  -3.548107045716773  
   gradient: [-6.57041694534459E-4,-4.370107454830405E-4,-1.343191621544E-6,3.625153510711243E-16]
   solution: 0.0,0.0,0.0,-3.548107045716773
   gradient: [0.0375640994399445,0.10009275463877076,0.06456907958057753,0.06521326011205325]
   gradient: [0.0013431585739733982,0.005026663543417702,0.0035517611882163154,0.003588615786464317]
   gradient: [-3.215209510861858E-4,4.826334848026E-4,5.973629602699559E-4,6.046895719797653E-4]
   solution: 0.01476791598861038,0.009822417708188631,3.0118350635448057E-5,-3.548107045716781
   gradient: [-3.0478851559377266E-4,4.57239850921697E-4,5.772211350909403E-4,5.84407120749076E-4]
   gradient: [-2.963752657802199E-4,4.4472367632075234E-4,5.672715430191644E-4,5.743885213481228E-4]
   gradient: [-2.836965233322361E-4,4.261734557453568E-4,5.524985845169208E-4,5.595136829010668E-4]
   gradient: [-2.6454577178281114E-4,3.988515626222023E-4,5.306793422089565E-4,5.375452880291062E-4]
   solution: 0.05511550800937802,0.01710393453944099,-0.016677593371711107,-3.5650843362532716
   gradient: [1.122620493222393E-4,1.69351977375603E-5,1.5461916575593314E-4,1.5985737926524723E-4]
   solution: 0.3117402674013613,0.09950259002347643,-0.15018496930807412,-3.701229449303286
   gradient: [5.7819864364046636E-5,-7.931161977957402E-5,3.052582029483142E-5,3.539355570733648E-5]
   solution: 0.3168790579367132,0.1403905504150381,-0.183731576240573,-3.7363532715032655
   gradient: [1.4743302769282955E-6,-4.000320616425612E-5,-2.574634225776826E-5,-2.0316982900646382E-5]
   solution: 0.3048717507346099,0.19644631102165389,-0.22246194509495965,-3.168113808957
   gradient: [-2.5616048474991257E-7,-2.372110642287623E-6,-4.559378512450379E-6,1.0723439402145586E-6]
   solution: 0.2990006716083257,0.19896032820137124,-0.22049520055803037,-3.779576550575493
   gradient: [-1.072006751766814E-7,3.9679441851170695E-6,-5.737911080290558E-7,5.084563288675078E-6]
   solution: 0.29811334334195977,0.19910403616747102,-0.21973735720023457,-3.7799308768290003
   gradient: [3.618859048673073E-8,8.206222113082688E-6,2.13322674141024E-6,7.79891010998962E-6]
   solution: 0.297531502541459,0.19917841684840484,-0.21879905742916328,-3.780567305589414
