date:20200428

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620993313


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122037/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620993307


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620985246


   **[Test build #122037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).




[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620993291


   **[Test build #122037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620993307








[GitHub] [spark] igreenfield removed a comment on pull request #26624: [SPARK-8981][core] Add MDC support in Executor

2020-04-28 Thread GitBox



igreenfield removed a comment on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-620460159








[GitHub] [spark] dongjoon-hyun commented on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath

2020-04-28 Thread GitBox



dongjoon-hyun commented on pull request #28376:
URL: https://github.com/apache/spark/pull/28376#issuecomment-620992909


   cc @holdenk since this seems to be targeting 2.4.6.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620992239








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r417070637



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3425,6 +3425,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       assert(SQLConf.get.getConf(SQLConf.CODEGEN_FALLBACK) === true)
     }
   }
+
+  test("SPARK-31594: Do not display the seed of rand/randn with no argument in output schema") {
+    def checkIfSeedExistsInExplain(df: DataFrame): Unit = {
+      val output = new java.io.ByteArrayOutputStream()
+      Console.withOut(output) {
+        df.explain()
+      }
+      output.toString.matches("""randn?\(-?[0-9]+\)""")
+    }
+    val df1 = sql("SELECT rand()")
+    assert(df1.schema.head.name === "rand()")
+    checkIfSeedExistsInExplain(df1)

Review comment:
   If we add `assert` at line 3435, this test will fail.
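
   One likely reason for the failure, sketched below under assumptions: `String.matches` only succeeds when the whole string matches the pattern, while the text captured from `explain()` contains much more than the `rand(...)` call. The plan text here is a hypothetical stand-in, not real Spark output, and `MatchesVsFind` is an illustrative name only.

```scala
object MatchesVsFind {
  // Hypothetical stand-in for the captured explain() text; a real plan looks different.
  val explainOutput: String =
    "== Physical Plan ==\n*(1) Project [rand(-1234567890123) AS rand()#0]\n"

  // Same pattern as in the quoted diff.
  val seedPattern: String = """randn?\(-?[0-9]+\)"""

  // String.matches requires the entire string to match the pattern.
  val wholeStringMatch: Boolean = explainOutput.matches(seedPattern)

  // Regex.findFirstIn looks for the pattern anywhere inside the string.
  val substringMatch: Boolean = seedPattern.r.findFirstIn(explainOutput).isDefined

  def main(args: Array[String]): Unit = {
    println(s"matches: $wholeStringMatch, findFirstIn: $substringMatch")
  }
}
```

   With this stand-in text, the whole-string check is false and the substring search is true, so an `assert` around the `matches` call would presumably need a substring-style check instead.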






[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620992239








[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620991893


   **[Test build #122039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122039/testReport)** for PR 26141 at commit [`baa9f06`](https://github.com/apache/spark/commit/baa9f0602e0d5fed526cef59eb7936b676d869a6).




[GitHub] [spark] zhengruifeng commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM

2020-04-28 Thread GitBox



zhengruifeng commented on pull request #28229:
URL: https://github.com/apache/spark/pull/28229#issuecomment-620990814


   @xwu99 My previous work includes:
   LinearSVC: https://github.com/apache/spark/pull/27360
   LogisticRegression: https://github.com/apache/spark/pull/27374
   LinearRegression: https://github.com/apache/spark/pull/27396
   GaussianMixture: https://github.com/apache/spark/pull/27473
   KMeans: https://github.com/apache/spark/compare/master...zhengruifeng:blockify_km?expand=1 (not yet sent)
   
   I'm reworking `LinearSVC`/`LogisticRegression`/`LinearRegression`/`GaussianMixture`. For KMeans, I am glad you can take it over.
   
   I just created a new [PR](https://github.com/apache/spark/pull/28349) for LinearSVC. The main idea is to use the expert param `blockSize` to choose the path. The original path will be chosen by default to avoid a performance regression on sparse datasets.
   
   If nobody objects, I will merge it, and then the other three impls (since they depend on the first one, I am not recreating PRs for them right now): `LogisticRegression`/`LinearRegression`/`GaussianMixture`.
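
   As a rough illustration of letting a `blockSize` parameter choose the path, here is a minimal standalone sketch. It does not use Spark and is not the code from the PR: the object name, the `sumOfSquares` function, and the convention that `blockSize == 1` selects the original row-by-row path are all assumptions made for the example.

```scala
object BlockSizeDispatch {
  // Assumption for this sketch: 1 means "keep the original row-by-row path".
  val DefaultBlockSize: Int = 1

  def sumOfSquares(rows: Seq[Array[Double]], blockSize: Int = DefaultBlockSize): Double = {
    require(blockSize >= 1, "blockSize must be positive")
    if (blockSize == 1) {
      // Original path: handle one row at a time.
      rows.iterator.map(row => row.iterator.map(x => x * x).sum).sum
    } else {
      // Blocked path: group rows into blocks and handle each block as one unit.
      rows.grouped(blockSize).map { block =>
        block.iterator.flatMap(_.iterator).map(x => x * x).sum
      }.sum
    }
  }
}
```

   Callers that never set `blockSize` stay on the original path, which is the property the comment relies on to avoid regressions on sparse data.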




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28394:
URL: https://github.com/apache/spark/pull/28394#issuecomment-620990107








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



dongjoon-hyun commented on a change in pull request #28392:
URL: https://github.com/apache/spark/pull/28392#discussion_r417068674



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
##
@@ -3425,6 +3425,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       assert(SQLConf.get.getConf(SQLConf.CODEGEN_FALLBACK) === true)
     }
   }
+
+  test("SPARK-31594: Do not display the seed of rand/randn with no argument in output schema") {
+    def checkIfSeedExistsInExplain(df: DataFrame): Unit = {
+      val output = new java.io.ByteArrayOutputStream()
+      Console.withOut(output) {
+        df.explain()
+      }
+      output.toString.matches("""randn?\(-?[0-9]+\)""")

Review comment:
   Did you want `assert(...)`?
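
   The point of the question can be shown with a small standalone sketch: in a method declared to return `Unit`, a trailing `Boolean` expression is evaluated and then discarded, so it never fails a test on its own. The names `DiscardedCheck`, `checkSilently`, and `checkStrictly` are hypothetical and do not appear in the PR.

```scala
object DiscardedCheck {
  // Hypothetical stand-in for captured output that does not contain a seeded call.
  val captured: String = "no seed here"

  // Declared as Unit, so the Boolean on the last line is computed and thrown away.
  def checkSilently(text: String): Unit = {
    text.contains("rand(")
  }

  // Wrapping the same condition in assert makes a false result raise an AssertionError.
  def checkStrictly(text: String): Unit = {
    assert(text.contains("rand("), s"expected a seeded call in: $text")
  }
}
```

   Calling `checkSilently(captured)` returns normally even though the condition is false, whereas `checkStrictly(captured)` throws.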






[GitHub] [spark] AmplabJenkins commented on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28394:
URL: https://github.com/apache/spark/pull/28394#issuecomment-620990107








[GitHub] [spark] SparkQA removed a comment on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #28394:
URL: https://github.com/apache/spark/pull/28394#issuecomment-620915241


   **[Test build #122020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122020/testReport)** for PR 28394 at commit [`b7d14be`](https://github.com/apache/spark/commit/b7d14be909655e320d2e93a8a48249f56e3286ce).




[GitHub] [spark] SparkQA commented on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc

2020-04-28 Thread GitBox



SparkQA commented on pull request #28394:
URL: https://github.com/apache/spark/pull/28394#issuecomment-620989321


   **[Test build #122020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122020/testReport)** for PR 28394 at commit [`b7d14be`](https://github.com/apache/spark/commit/b7d14be909655e320d2e93a8a48249f56e3286ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][core] Add MDC support in Executor

2020-04-28 Thread GitBox



igreenfield commented on pull request #26624:
URL: https://github.com/apache/spark/pull/26624#issuecomment-620989210


   @ngone51 can you help me understand what went wrong in the build?




[GitHub] [spark] AmplabJenkins commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620988621








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620988621








[GitHub] [spark] SparkQA commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



SparkQA commented on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620988161


   **[Test build #122021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122021/testReport)** for PR 28392 at commit [`4b1f3f2`](https://github.com/apache/spark/commit/4b1f3f2a34ddac22d940aac02a592b12a27ccb60).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class Rand(child: Expression, hideSeed: Boolean = false)`
 * `case class Randn(child: Expression, hideSeed: Boolean = false)`




[GitHub] [spark] SparkQA removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #28392:
URL: https://github.com/apache/spark/pull/28392#issuecomment-620917411


   **[Test build #122021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122021/testReport)** for PR 28392 at commit [`4b1f3f2`](https://github.com/apache/spark/commit/4b1f3f2a34ddac22d940aac02a592b12a27ccb60).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985704


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122038/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985496








[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985245


   **[Test build #122038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985698


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985698








[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985690


   **[Test build #122038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985496








[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620985245


   **[Test build #122038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).




[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620985246


   **[Test build #122037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620983922








[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620983922








[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982982


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122033/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620979105


   **[Test build #122033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)** for PR 26141 at commit [`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).




[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982972


   **[Test build #122033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)** for PR 26141 at commit [`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).
* This patch **fails build dependency tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982890


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982896








[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982978








[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982094


   **[Test build #122036 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)**
 for PR 26141 at commit 
[`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485).




[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982881


   **[Test build #122036 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)**
 for PR 26141 at commit 
[`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982890








[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982367








[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982367








[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620982094


   **[Test build #122036 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)**
 for PR 26141 at commit 
[`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620981188


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122035/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620981181


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620981168


   **[Test build #122035 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)**
 for PR 26141 at commit 
[`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620981181








[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620980553


   **[Test build #122035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)**
 for PR 26141 at commit 
[`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620980835








[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620980838








[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620980835








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620980838








[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



SparkQA commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620980554


   **[Test build #122034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122034/testReport)**
 for PR 28395 at commit 
[`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).




[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620980553


   **[Test build #122035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)**
 for PR 26141 at commit 
[`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620979476


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122031/
   Test FAILed.




[GitHub] [spark] Ngone51 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode

2020-04-28 Thread GitBox



Ngone51 commented on a change in pull request #28258:
URL: https://github.com/apache/spark/pull/28258#discussion_r417057392



##
File path: core/src/main/scala/org/apache/spark/deploy/Client.scala
##
@@ -124,38 +127,57 @@ private class ClientEndpoint(
 }
   }
 
-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
 // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
 // is fine.
 logInfo("... waiting before polling master for driver state")
 Thread.sleep(5000)
 logInfo("... polling master for driver state")
-val statusResponse =
-  activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-if (statusResponse.found) {
-  logInfo(s"State of $driverId is ${statusResponse.state.get}")
-  // Worker node, if present
-  (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-  logInfo(s"Driver running on $hostPort ($id)")
-case _ =>
-  }
-  // Exception, if present
-  statusResponse.exception match {
-case Some(e) =>
-  logError(s"Exception from cluster was: $e")
-  e.printStackTrace()
-  System.exit(-1)
-case _ =>
-  System.exit(0)
+while (true) {

Review comment:
   This could block `ClientEndpoint` when `waitAppCompletion=true`?
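
   To illustrate the concern, here is a minimal sketch of one possible alternative to sleeping in a `while (true)` loop on the endpoint's only thread: the periodic poll is handed to a scheduler thread instead. It is not the code from this PR and does not use Spark's RPC API; `PollingSketch`, `stateAt`, and `pollUntilDone` are hypothetical names, and the driver state is faked so that the example is self-contained.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object PollingSketch {
  // Hypothetical stand-in for a driver-status request; not Spark's API.
  // Reports RUNNING for the first two polls and FINISHED from the third on.
  def stateAt(poll: Int): String = if (poll < 3) "RUNNING" else "FINISHED"

  /** Polls on a scheduler thread and returns how many polls were needed. */
  def pollUntilDone(intervalMs: Long): Int = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val polls = new AtomicInteger(0)
    val done = new CountDownLatch(1)
    val task = new Runnable {
      // Runs on the scheduler thread; stops polling once a final state is seen.
      override def run(): Unit =
        if (done.getCount > 0 && stateAt(polls.incrementAndGet()) != "RUNNING") {
          done.countDown()
        }
    }
    scheduler.scheduleAtFixedRate(task, 0L, intervalMs, TimeUnit.MILLISECONDS)
    // The calling thread is not tied up by the polling itself; this demo
    // simply waits here so that it can return the result.
    done.await()
    scheduler.shutdownNow()
    polls.get()
  }
}
```

   In this sketch the calling thread only waits because the demo needs a return value; an endpoint would presumably return to its event loop and react when the final state arrives.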






[GitHub] [spark] SparkQA removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620971895


   **[Test build #122031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)**
 for PR 28365 at commit 
[`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620979473


   Merged build finished. Test FAILed.




[GitHub] [spark] xwu99 commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM

2020-04-28 Thread GitBox



xwu99 commented on pull request #28229:
URL: https://github.com/apache/spark/pull/28229#issuecomment-620979554


   > > I saw your PR was merged, I will rebase.
   > 
   > I had some reverted PRs on using high-level BLAS in LoR/LiR/SVC/GMM; they were reverted because of performance regressions on sparse datasets.
   > I am now working on them again, using the param `blockSize==1` to choose the impl.
   > I am also waiting for more feedback. If nobody objects, I will merge them.
   > 
   > There are some common utils in those PRs, which should also be used in KMeans. So I think you can rebase this PR after [SVC](https://github.com/apache/spark/pull/28349) gets merged.
   
   OK. Could you also let me know which PRs you are reworking? We are also working on enabling high-level BLAS, not only for k-means but also for other algorithms in MLlib, so I can help review them rather than duplicate effort.
   




[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



HyukjinKwon commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620979311


   retest this please




[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620979460


   **[Test build #122031 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)**
 for PR 28365 at commit 
[`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620979289








[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620979289








[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620979473








[GitHub] [spark] turboFei edited a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsE

2020-04-28 Thread GitBox



turboFei edited a comment on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620979281


   Can anyone help review this patch? It is a critical issue and might cause jobs to fail.
   Thanks in advance!
   gentle ping @dongjoon-hyun @cloud-fan @gatorsmile @maropu @dbtsai @wangyum 




[GitHub] [spark] turboFei commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsExceptio

2020-04-28 Thread GitBox



turboFei commented on pull request #26339:
URL: https://github.com/apache/spark/pull/26339#issuecomment-620979281


   Can anyone help review this patch? It is a critical issue and would cause jobs to fail.
   Thanks in advance!
   gentle ping @dongjoon-hyun @cloud-fan @gatorsmile @maropu @dbtsai @wangyum 




[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode

2020-04-28 Thread GitBox



SparkQA commented on pull request #26141:
URL: https://github.com/apache/spark/pull/26141#issuecomment-620979105


   **[Test build #122033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)**
 for PR 26141 at commit 
[`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).




[GitHub] [spark] HyukjinKwon commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0

2020-04-28 Thread GitBox



HyukjinKwon commented on pull request #28358:
URL: https://github.com/apache/spark/pull/28358#issuecomment-620978959


   Merged to master.




[GitHub] [spark] HyukjinKwon commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0

2020-04-28 Thread GitBox



HyukjinKwon commented on pull request #28358:
URL: https://github.com/apache/spark/pull/28358#issuecomment-620979020


   Sure, no problem




[GitHub] [spark] dongjoon-hyun commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0

2020-04-28 Thread GitBox



dongjoon-hyun commented on pull request #28358:
URL: https://github.com/apache/spark/pull/28358#issuecomment-620978890


   Hi, @HyukjinKwon. It seems that moving to R 4.0.0 is not easy. In this PR, I want to update Rtools first. This PR works for both R 3.6 and 4.0. Later, we need to update the SparkR code first with R 3.6. Then, the R 4.0.0 version switch will be the last of the work items. WDYT?




[GitHub] [spark] dongjoon-hyun commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0

2020-04-28 Thread GitBox



dongjoon-hyun commented on pull request #28358:
URL: https://github.com/apache/spark/pull/28358#issuecomment-620978948


   Oh, Thank you!




[GitHub] [spark] dongjoon-hyun commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory

2020-04-28 Thread GitBox



dongjoon-hyun commented on pull request #28385:
URL: https://github.com/apache/spark/pull/28385#issuecomment-620977810


   Thank you, @LantaoJin .




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-620976532








[GitHub] [spark] zhengruifeng edited a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox



zhengruifeng edited a comment on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390


   I will merge this PR this week if nobody objects.
   Unlike the [previous one](https://github.com/apache/spark/pull/27360), this 
PR will not cause a performance regression on sparse datasets by default (since 
the default `blockSize`=1, and the original impl is used).
   Expert users can tune `blockSize` for better performance.
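
The dispatch on `blockSize` described above can be sketched roughly as follows. This is only an illustration in plain Scala: `BlockSizeDispatch`, `marginsRowWise` and `marginsBlocked` are hypothetical names, not Spark APIs, and the blocked path here only groups rows where a real implementation would presumably stack them into a matrix and call a BLAS routine.

```scala
// Hypothetical sketch; none of these names come from the Spark code base.
object BlockSizeDispatch {
  type Vec = Array[Double]

  // Dot product of one instance with the weight vector.
  private def dot(r: Vec, w: Vec): Double =
    r.indices.map(i => r(i) * w(i)).sum

  // Row-wise path: stands in for the original, per-instance implementation.
  def marginsRowWise(rows: Seq[Vec], w: Vec): Seq[Double] =
    rows.map(r => dot(r, w))

  // Blocked path: instances are grouped into blocks of `blockSize` rows.
  // A real implementation would process each block with one matrix call.
  def marginsBlocked(rows: Seq[Vec], w: Vec, blockSize: Int): Seq[Double] =
    rows.grouped(blockSize).flatMap(block => block.map(r => dot(r, w))).toVector

  // blockSize == 1 (the default) keeps the original code path.
  def margins(rows: Seq[Vec], w: Vec, blockSize: Int = 1): Seq[Double] = {
    require(blockSize >= 1, "blockSize must be positive")
    if (blockSize == 1) marginsRowWise(rows, w)
    else marginsBlocked(rows, w, blockSize)
  }
}
```

Both paths return the same margins; only the grouping differs, which is why leaving `blockSize` at 1 cannot change behavior for existing users.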




[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-620976532








[GitHub] [spark] zhengruifeng edited a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox



zhengruifeng edited a comment on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390


   I will merge this PR this week if nobody objects.
   Unlike the [previous one](https://github.com/apache/spark/pull/27360), this 
PR will not cause a performance regression by default (since the default 
`blockSize`=1, and the original impl is used).
   Expert users can tune `blockSize` for better performance.




[GitHub] [spark] zhengruifeng commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors

2020-04-28 Thread GitBox



zhengruifeng commented on pull request #28349:
URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390


   I will merge this PR this week if nobody objects.
   Unlike the [previous one](https://github.com/apache/spark/pull/27360), this 
PR will not cause a performance regression by default (since the default 
`blockSize`=1, and the original impl is used).
   Expert users can tune `blockSize` for better performance.




[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-28 Thread GitBox



SparkQA commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-620976299


   **[Test build #122032 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122032/testReport)**
 for PR 28123 at commit 
[`19b8cd8`](https://github.com/apache/spark/commit/19b8cd80a62c6c05c79e4253c302071c97d712d1).




[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-28 Thread GitBox



imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r417052767



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.plans.Inner
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule injects a hint if one side of two bucketed tables can be coalesced
+ * when the two bucketed tables are inner-joined and they differ in the number of buckets.
+ */
+object CoalesceBucketsInJoin extends Rule[LogicalPlan]  {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+      p(plan) && plan.children.forall(forall(_)(p))
+    }
+
+    forall(plan) {
+      case _: Filter | _: Project | _: LogicalRelation => true
+      case _ => false
+    }
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+    if (isPlanEligible(plan)) {
+      plan.collectFirst {
+        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty =>
+          r.bucketSpec.get
+      }
+    } else {
+      None
+    }
+  }
+
+  private def mayCoalesce(
+      numBuckets1: Int,
+      numBuckets2: Int,
+      maxNumBucketsDiff: Int): Option[Int] = {
+    assert(numBuckets1 != numBuckets2)
+    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
+    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
+    // number of buckets because bucket id is calculated by modding the total number of buckets.
+    if ((large % small == 0) &&
+      ((large - small) <= maxNumBucketsDiff)) {
+      Some(small)
+    } else {
+      None
+    }
+  }
+
+  private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = {
+    plan.transformUp {
+      case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) =>
+        CoalesceBuckets(numCoalescedBuckets, l)
+    }
+  }
+
+  object ExtractJoinWithBuckets {
+    def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = {
+      plan match {
+        case join: Join =>

Review comment:
   It will remove the shuffle for full-outer joins as well:
   With the feature off:
   ```
   scala> t1.join(t2, t1("i") === t2("i"), "full_outer").explain
   == Physical Plan ==
   SortMergeJoin [i#67], [i#73], FullOuter
   :- *(2) Sort [i#67 ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(i#67, 200), true, [id=#418]
   :     +- *(1) ColumnarToRow
   :        +- FileScan parquet default.t1[i#67,j#68,k#69] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 8 out of 8
   +- *(4) Sort [i#73 ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(i#73, 200), true, [id=#425]
         +- *(3) ColumnarToRow
            +- FileScan parquet default.t2[i#73,j#74,k#75] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 4 out of 4
   ```
   With the feature on:
   ```
   scala> t1.join(t2, t1("i") === t2("i"), "full_outer").explain
   == Physical Plan ==
   SortMergeJoin [i#67], [i#73], FullOuter
   :- *(1) Sort [i#67 ASC NULLS FIRST], false, 0
   :  +- *(1) ColumnarToRow
   :     +- FileScan parquet default.t1[i#67,j#68,k#69] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 8 out of 8
   +- *(2) Sort [i#73 ASC NULLS FIRST], false, 0
      +- *(2) ColumnarToRow
         +- FileScan parquet

[GitHub] [spark] zhengruifeng commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM

2020-04-28 Thread GitBox



zhengruifeng commented on pull request #28229:
URL: https://github.com/apache/spark/pull/28229#issuecomment-620975624


   > I saw your PR was merged, I will rebase. 
   
   I had some PRs on using high-level BLAS in LoR/LiR/SVC/GMM; they were 
reverted because of performance regressions on sparse datasets.
   I am now working on it again, using param `blockSize==1` to choose the impl.
   I am also waiting for more feedback. If nobody objects, I will merge them.
   
   There are some common utils in those PRs, which should also be used in 
KMeans. So I think you can rebase this PR after 
[SVC](https://github.com/apache/spark/pull/28349) gets merged.
   




[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-28 Thread GitBox



imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r417052075



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.plans.Inner
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * This rule injects a hint if one side of two bucketed tables can be coalesced
+ * when the two bucketed tables are inner-joined and they differ in the number of buckets.
+ */
+object CoalesceBucketsInJoin extends Rule[LogicalPlan]  {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+      p(plan) && plan.children.forall(forall(_)(p))
+    }
+
+    forall(plan) {
+      case _: Filter | _: Project | _: LogicalRelation => true
+      case _ => false
+    }
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+    if (isPlanEligible(plan)) {
+      plan.collectFirst {
+        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty =>
+          r.bucketSpec.get
+      }
+    } else {
+      None
+    }
+  }
+
+  private def mayCoalesce(
+      numBuckets1: Int,
+      numBuckets2: Int,
+      maxNumBucketsDiff: Int): Option[Int] = {

Review comment:
   I changed it to pass sqlConf (I guess that's what you were leaning 
toward?)
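
For context, the divisibility rule behind `mayCoalesce` can be checked with a small standalone sketch. It mirrors the three-argument signature quoted in the diff above (not the later `sqlConf` variant), and `bucketId` is a hypothetical helper that assumes a bucket id is simply the non-negative hash modulo the number of buckets.

```scala
// Standalone sketch; `CoalesceCheck` and `bucketId` are illustrative names only.
object CoalesceCheck {
  // Same rule as the quoted diff: coalesce to the smaller bucket count only if
  // it divides the larger one and the difference is within the allowed limit.
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxNumBucketsDiff: Int): Option[Int] = {
    require(numBuckets1 != numBuckets2)
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    if (large % small == 0 && large - small <= maxNumBucketsDiff) Some(small) else None
  }

  // Assumption: bucket id = non-negative (hash mod numBuckets).
  def bucketId(hash: Int, numBuckets: Int): Int =
    java.lang.Math.floorMod(hash, numBuckets)

  // True if re-bucketing the ids of `large` buckets into `small` buckets gives
  // the same id as bucketing the original hash into `small` buckets directly.
  def coalescingPreservesIds(large: Int, small: Int, hashes: Range): Boolean =
    hashes.forall(h => bucketId(bucketId(h, large), small) == bucketId(h, small))
}
```

With 8 and 4 buckets, `coalescingPreservesIds(8, 4, 0 until 1000)` holds, so rows that join on equal keys stay aligned after coalescing. With 6 and 4 it fails (hash 6 lands in bucket 0 of 6, but bucket 2 of 4), which is why the rule requires divisibility.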






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620972185








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620972164


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122023/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620972161


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620972185








[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620972161








[GitHub] [spark] SparkQA removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



SparkQA removed a comment on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620936871


   **[Test build #122023 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122023/testReport)**
 for PR 28395 at commit 
[`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).




[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-28 Thread GitBox



SparkQA commented on pull request #28395:
URL: https://github.com/apache/spark/pull/28395#issuecomment-620971989


   **[Test build #122023 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122023/testReport)**
 for PR 28395 at commit 
[`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical

2020-04-28 Thread GitBox



SparkQA commented on pull request #28365:
URL: https://github.com/apache/spark/pull/28365#issuecomment-620971895


   **[Test build #122031 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)**
 for PR 28365 at commit 
[`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff).




[GitHub] [spark] zhengruifeng commented on a change in pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox



zhengruifeng commented on a change in pull request #27978:
URL: https://github.com/apache/spark/pull/27978#discussion_r417048353



##
File path: mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala
##
@@ -0,0 +1,391 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import scala.collection.mutable.ArrayBuilder
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml._
+import org.apache.spark.ml.attribute.{AttributeGroup, _}
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{StructField, StructType}
+
+
+/**
+ * Params for [[Selector]] and [[SelectorModel]].
+ */
+private[feature] trait SelectorParams extends Params
+  with HasFeaturesCol with HasLabelCol with HasOutputCol {
+
+  /**
+   * Number of features that selector will select, ordered by ascending p-value. If the
+   * number of features is less than numTopFeatures, then this will select all features.
+   * Only applicable when selectorType = "numTopFeatures".
+   * The default value of numTopFeatures is 50.
+   *
+   * @group param
+   */
+  @Since("3.1.0")
+  final val numTopFeatures = new IntParam(this, "numTopFeatures",
+    "Number of features that selector will select, ordered by ascending p-value. If the" +
+      " number of features is < numTopFeatures, then this will select all features.",
+    ParamValidators.gtEq(1))
+  setDefault(numTopFeatures -> 50)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getNumTopFeatures: Int = $(numTopFeatures)
+
+  /**
+   * Percentile of features that selector will select, ordered by ascending p-value.
+   * Only applicable when selectorType = "percentile".
+   * Default value is 0.1.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val percentile = new DoubleParam(this, "percentile",
+    "Percentile of features that selector will select, ordered by ascending p-value.",
+    ParamValidators.inRange(0, 1))
+  setDefault(percentile -> 0.1)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getPercentile: Double = $(percentile)
+
+  /**
+   * The highest p-value for features to be kept.
+   * Only applicable when selectorType = "fpr".
+   * Default value is 0.05.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val fpr = new DoubleParam(this, "fpr", "The higest p-value for features to be kept.",
+    ParamValidators.inRange(0, 1))
+  setDefault(fpr -> 0.05)
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getFpr: Double = $(fpr)
+
+  /**
+   * The upper bound of the expected false discovery rate.
+   * Only applicable when selectorType = "fdr".
+   * Default value is 0.05.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val fdr = new DoubleParam(this, "fdr",
+    "The upper bound of the expected false discovery rate.", ParamValidators.inRange(0, 1))
+  setDefault(fdr -> 0.05)
+
+  /** @group getParam */
+  def getFdr: Double = $(fdr)
+
+  /**
+   * The upper bound of the expected family-wise error rate.
+   * Only applicable when selectorType = "fwe".
+   * Default value is 0.05.
+   * @group param
+   */
+  @Since("3.1.0")
+  final val fwe = new DoubleParam(this, "fwe",
+    "The upper bound of the expected family-wise error rate.", ParamValidators.inRange(0, 1))
+  setDefault(fwe -> 0.05)
+
+  /** @group getParam */
+  def getFwe: Double = $(fwe)
+
+  /**
+   * The selector type.
+   * Supported options: "numTopFeatures" (default), "percentile", "fpr", "fdr", "fwe"
+   * @group param
+   */
+  @Since("3.1.0")
+  final val selectorType = new Param[String](this, "selectorType",
+    "The selector type. Supported options: numTopFeatures, percentile, fpr, fdr, fwe",
+    ParamValidators.inArray(Array("numTopFeatures", "percentile", "fpr", "fdr",
+      "fwe")))
+  setDefault(selectorType -> "numTopFeatures")
+
+  /** @group getParam */
+  @Since("3.1.0")
+  def getSelectorType: String = $(selectorType)
+
+}
+
+/**
+ * Super class for feature selectors.
+ * 1. Chi-Square Selector
+ * This feature selector

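As a rough illustration of how the five `selectorType` options quoted above could turn per-feature p-values into a set of selected feature indices, here is a standalone sketch. It is not the code under review: `SelectorSketch.select` is a hypothetical helper, and the use of the Benjamini-Hochberg procedure for "fdr" and a Bonferroni-style threshold (`fwe / numFeatures`) for "fwe" are assumptions about what the documented bounds imply.

```scala
// Hypothetical sketch; parameter names and defaults follow the quoted SelectorParams.
object SelectorSketch {
  // Returns the sorted indices of the selected features, given one p-value per feature.
  def select(
      pValues: Array[Double],
      selectorType: String,
      numTopFeatures: Int = 50,
      percentile: Double = 0.1,
      fpr: Double = 0.05,
      fdr: Double = 0.05,
      fwe: Double = 0.05): Array[Int] = {
    val n = pValues.length
    val ranked = pValues.zipWithIndex.sortBy(_._1) // ascending p-value
    val chosen = selectorType match {
      case "numTopFeatures" => ranked.take(numTopFeatures)
      case "percentile" => ranked.take((n * percentile).toInt)
      case "fpr" => ranked.filter(_._1 < fpr)
      case "fdr" =>
        // Assumed Benjamini-Hochberg: keep up to the largest rank k with p_(k) <= fdr * k / n.
        val k = ranked.zipWithIndex.lastIndexWhere { case ((p, _), i) => p <= fdr * (i + 1) / n }
        ranked.take(k + 1)
      case "fwe" =>
        // Assumed Bonferroni-style correction.
        ranked.filter(_._1 < fwe / n)
      case other => throw new IllegalArgumentException(s"Unknown selectorType: $other")
    }
    chosen.map(_._2).sorted
  }
}
```

For example, with p-values `Array(0.2, 0.001, 0.04, 0.5, 0.03)`, the "fpr" option at 0.05 keeps features 1, 2 and 4, while the stricter "fwe" option at 0.05 keeps only feature 1.
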
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28351:
URL: https://github.com/apache/spark/pull/28351#issuecomment-620970546








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox



AmplabJenkins removed a comment on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620970502








[GitHub] [spark] AmplabJenkins commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28351:
URL: https://github.com/apache/spark/pull/28351#issuecomment-620970546








[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox



AmplabJenkins commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620970502








[GitHub] [spark] zhengruifeng commented on a change in pull request #27978: [SPARK-31127][ML] Implement abstract Selector

2020-04-28 Thread GitBox



zhengruifeng commented on a change in pull request #27978:
URL: https://github.com/apache/spark/pull/27978#discussion_r417042961



##
File path: mllib/src/main/scala/org/apache/spark/ml/feature/FValueSelector.scala
##
@@ -154,111 +45,72 @@ private[feature] trait FValueSelectorParams extends Params
  * set to 50.
  */
 @Since("3.1.0")
-final class FValueSelector @Since("3.1.0") (override val uid: String)
-  extends Estimator[FValueSelectorModel] with FValueSelectorParams
-    with DefaultParamsWritable {
+final class FValueSelector @Since("3.1.0") (@Since("3.1.0") override val uid: String) extends
+  Selector[FValueSelectorModel] {
 
   @Since("3.1.0")
   def this() = this(Identifiable.randomUID("FValueSelector"))
 
   /** @group setParam */
   @Since("3.1.0")
-  def setNumTopFeatures(value: Int): this.type = set(numTopFeatures, value)
+  override def setNumTopFeatures(value: Int): this.type = super.setNumTopFeatures(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setPercentile(value: Double): this.type = set(percentile, value)
+  override def setPercentile(value: Double): this.type = super.setPercentile(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setFpr(value: Double): this.type = set(fpr, value)
+  override def setFpr(value: Double): this.type = super.setFpr(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setFdr(value: Double): this.type = set(fdr, value)
+  override def setFdr(value: Double): this.type = super.setFdr(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setFwe(value: Double): this.type = set(fwe, value)
+  override def setFwe(value: Double): this.type = super.setFwe(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setSelectorType(value: String): this.type = set(selectorType, value)
+  override def setSelectorType(value: String): this.type = super.setSelectorType(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+  override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setOutputCol(value: String): this.type = set(outputCol, value)
+  override def setOutputCol(value: String): this.type = super.setOutputCol(value)
 
   /** @group setParam */
   @Since("3.1.0")
-  def setLabelCol(value: String): this.type = set(labelCol, value)
+  override def setLabelCol(value: String): this.type = super.setLabelCol(value)
 
-  @Since("3.1.0")
-  override def fit(dataset: Dataset[_]): FValueSelectorModel = {
-    transformSchema(dataset.schema, logging = true)
-    val spark = dataset.sparkSession
-    import spark.implicits._
-
-    val numFeatures = MetadataUtils.getNumFeatures(dataset, $(featuresCol))
-    val resultDF = FValueTest.test(dataset.toDF, $(featuresCol), $(labelCol), true)
-
-    def getTopIndices(k: Int): Array[Int] = {
-      resultDF.sort("pValue", "featureIndex")
-        .select("featureIndex")
-        .limit(k)
-        .as[Int]
-        .collect()
-    }
-
-    val indices = $(selectorType) match {
-      case "numTopFeatures" =>
-        getTopIndices($(numTopFeatures))
-      case "percentile" =>
-        getTopIndices((numFeatures * getPercentile).toInt)
-      case "fpr" =>
-        resultDF.select("featureIndex")
-          .where(col("pValue") < $(fpr))
-          .as[Int].collect()
-      case "fdr" =>
-        // This uses the Benjamini-Hochberg procedure.
-        // https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure
-        val f = $(fdr) / numFeatures
-        val maxIndex = resultDF.sort("pValue", "featureIndex")
-          .select("pValue")
-          .as[Double].rdd
-          .zipWithIndex
-          .flatMap { case (pValue, index) =>
-            if (pValue <= f * (index + 1)) {
-              Iterator.single(index.toInt)
-            } else Iterator.empty
-          }.fold(-1)(math.max)
-        if (maxIndex >= 0) {
-          getTopIndices(maxIndex + 1)
-        } else Array.emptyIntArray
-      case "fwe" =>
-        resultDF.select("featureIndex")
-          .where(condition = col("pValue") < $(fwe) / numFeatures)
-          .as[Int].collect()
-      case errorType =>
-        throw new IllegalStateException(s"Unknown Selector Type: $errorType")
-    }
+  /**
+   * get the SelectionTestResult for every feature against the label
+   */
+  protected[this] override def getSelectionTestResult(df: DataFrame): DataFrame = {
+    FValueTest.test(df, getFeaturesCol, getLabelCol, true)
+  }
 
-    copyValues(new FValueSelectorModel(uid, indices.sorted).setParent(this))
+  /**
+   * Create a new instance of concrete SelectorModel.
+   * @param indices The indices of the selected features
+   * @return A new SelectorModel instance
+   */
+  protected[this] def createSelectorModel(
+      uid: String,
+      indices: Array[Int]): FValueSelectorModel = {
+    new FValueSelectorModel(uid,
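
For readers following the removed `fit` body, the five selection rules it encodes can be sketched on a plain array of p-values. This is a minimal sketch under stated assumptions: `SelectionRules`, `ranked`, and `select` are hypothetical names, a single `threshold` argument stands in for the `numTopFeatures`, `percentile`, `fpr`, `fdr`, and `fwe` params, and Scala collections replace the DataFrame operations in the diff.

```scala
object SelectionRules {
  // Feature indices ordered by ascending p-value, ties broken by feature index,
  // analogous to sort("pValue", "featureIndex") in the removed code.
  def ranked(pValues: Array[Double]): Array[Int] =
    pValues.indices.sortWith { (a, b) =>
      pValues(a) < pValues(b) || (pValues(a) == pValues(b) && a < b)
    }.toArray

  // `threshold` is an assumption of this sketch: it plays the role of whichever
  // param matches `selectorType` (a count, a fraction, or an error-rate bound).
  def select(pValues: Array[Double], selectorType: String, threshold: Double): Array[Int] = {
    val n = pValues.length
    val order = ranked(pValues)
    val picked: Array[Int] = selectorType match {
      case "numTopFeatures" => order.take(threshold.toInt)
      case "percentile" => order.take((n * threshold).toInt)
      case "fpr" => order.filter(i => pValues(i) < threshold)
      case "fdr" =>
        // Benjamini-Hochberg: find the largest rank r (0-based) whose p-value is
        // at most threshold / n * (r + 1), then keep every feature up to that rank.
        val f = threshold / n
        val maxRank = order.indices.filter(r => pValues(order(r)) <= f * (r + 1)).lastOption
        maxRank.map(r => order.take(r + 1)).getOrElse(Array.empty[Int])
      case "fwe" =>
        // Compare each p-value against threshold / n (a Bonferroni-style cut).
        order.filter(i => pValues(i) < threshold / n)
      case other =>
        throw new IllegalArgumentException(s"Unknown selector type: $other")
    }
    picked.sorted // the removed code also sorts the indices before building the model
  }

  def main(args: Array[String]): Unit = {
    val pValues = Array(0.001, 0.04, 0.03, 0.5)
    Seq("numTopFeatures" -> 2.0, "percentile" -> 0.5, "fpr" -> 0.05, "fdr" -> 0.05, "fwe" -> 0.05)
      .foreach { case (kind, t) =>
        println(s"$kind -> ${select(pValues, kind, t).mkString("[", ", ", "]")}")
      }
  }
}
```

Note that the "fdr" rule keeps every feature up to the last rank that passes, even when an earlier rank fails its own bound, which is why it is computed from the ranked order rather than as a simple filter.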

[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-28 Thread GitBox



SparkQA commented on pull request #28194:
URL: https://github.com/apache/spark/pull/28194#issuecomment-620970273


   **[Test build #122030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122030/testReport)** for PR 28194 at commit [`2173536`](https://github.com/apache/spark/commit/2173536afe896910b1c5faa35de1efa34f01e0f3).




[GitHub] [spark] SparkQA commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-28 Thread GitBox



SparkQA commented on pull request #28351:
URL: https://github.com/apache/spark/pull/28351#issuecomment-620970257


   **[Test build #122029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122029/testReport)** for PR 28351 at commit [`586c4b7`](https://github.com/apache/spark/commit/586c4b78c6121df589b40cf87de797b172299e68).



