[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620993313 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122037/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620993307 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620985246 **[Test build #122037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).
[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620993291 **[Test build #122037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620993307
[GitHub] [spark] igreenfield removed a comment on pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield removed a comment on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-620460159
[GitHub] [spark] dongjoon-hyun commented on pull request #28376: [SPARK-31582] [Yarn] Being able to not populate Hadoop classpath
dongjoon-hyun commented on pull request #28376: URL: https://github.com/apache/spark/pull/28376#issuecomment-620992909 cc @holdenk since this seems to be targeting 2.4.6.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620992239
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
dongjoon-hyun commented on a change in pull request #28392: URL: https://github.com/apache/spark/pull/28392#discussion_r417070637
## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ##
@@ -3425,6 +3425,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       assert(SQLConf.get.getConf(SQLConf.CODEGEN_FALLBACK) === true)
     }
   }
+
+  test("SPARK-31594: Do not display the seed of rand/randn with no argument in output schema") {
+    def checkIfSeedExistsInExplain(df: DataFrame): Unit = {
+      val output = new java.io.ByteArrayOutputStream()
+      Console.withOut(output) {
+        df.explain()
+      }
+      output.toString.matches("""randn?\(-?[0-9]+\)""")
+    }
+    val df1 = sql("SELECT rand()")
+    assert(df1.schema.head.name === "rand()")
+    checkIfSeedExistsInExplain(df1)
Review comment: If we add `assert` at line 3435, this test will fail.
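One possible reading of the review comment above: `String.matches` succeeds only when the entire string matches the pattern, so an assertion on it would likely fail against multi-line `explain()` output. The following is a minimal Scala sketch of that behavior, not the PR's code. It assumes a hypothetical sample string in place of real `df.explain()` output, and the names `SeedRegexSketch`, `sampleExplain`, `wholeStringMatches`, and `containsSeed` are illustrative only.

```scala
object SeedRegexSketch {
  // Hypothetical stand-in for text printed by df.explain(); NOT real Spark output.
  val sampleExplain: String =
    "== Physical Plan ==\n*(1) Project [rand(-1234567890) AS rand()#0]\n+- Scan OneRowRelation[]\n"

  // Same pattern as in the test under review.
  val seedPattern: String = """randn?\(-?[0-9]+\)"""

  // String.matches is true only if the WHOLE string matches the pattern.
  def wholeStringMatches(s: String): Boolean = s.matches(seedPattern)

  // findFirstIn is true if the pattern occurs ANYWHERE in the string.
  def containsSeed(s: String): Boolean = seedPattern.r.findFirstIn(s).isDefined

  def main(args: Array[String]): Unit = {
    // The multi-line sample is not, as a whole, a single rand(...) call.
    assert(!wholeStringMatches(sampleExplain))
    // But it does contain one.
    assert(containsSeed(sampleExplain))
    // A bare call matches both ways.
    assert(wholeStringMatches("rand(42)"))
  }
}
```

If the intent is to detect a seed anywhere in the plan text, a substring search such as `findFirstIn` may be closer to it. Whether the PR adopted such a change is not stated in this thread.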
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620992239
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620991893 **[Test build #122039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122039/testReport)** for PR 26141 at commit [`baa9f06`](https://github.com/apache/spark/commit/baa9f0602e0d5fed526cef59eb7936b676d869a6).
[GitHub] [spark] zhengruifeng commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM
zhengruifeng commented on pull request #28229: URL: https://github.com/apache/spark/pull/28229#issuecomment-620990814
@xwu99 My previous work includes:
LinearSVC: https://github.com/apache/spark/pull/27360
LogisticRegression: https://github.com/apache/spark/pull/27374
LinearRegression: https://github.com/apache/spark/pull/27396
GaussianMixture: https://github.com/apache/spark/pull/27473
KMeans: https://github.com/apache/spark/compare/master...zhengruifeng:blockify_km?expand=1 (not yet sent as a PR)
I'm reworking `LinearSVC`/`LogisticRegression`/`LinearRegression`/`GaussianMixture`. For KMeans, I am glad you can take it over.
I just created a new [PR](https://github.com/apache/spark/pull/28349) for LinearSVC. The main idea is to use the expert param `blockSize` to choose the path. The original path will be chosen by default to avoid performance regression on sparse datasets.
If nobody objects, I will merge it, and then the other three implementations, `LogisticRegression`/`LinearRegression`/`GaussianMixture` (since they depend on the first one, I am not recreating their PRs right now).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
AmplabJenkins removed a comment on pull request #28394: URL: https://github.com/apache/spark/pull/28394#issuecomment-620990107
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
dongjoon-hyun commented on a change in pull request #28392: URL: https://github.com/apache/spark/pull/28392#discussion_r417068674
## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ##
@@ -3425,6 +3425,28 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
       assert(SQLConf.get.getConf(SQLConf.CODEGEN_FALLBACK) === true)
     }
   }
+
+  test("SPARK-31594: Do not display the seed of rand/randn with no argument in output schema") {
+    def checkIfSeedExistsInExplain(df: DataFrame): Unit = {
+      val output = new java.io.ByteArrayOutputStream()
+      Console.withOut(output) {
+        df.explain()
+      }
+      output.toString.matches("""randn?\(-?[0-9]+\)""")
Review comment: Did you want `assert(...)`?
[GitHub] [spark] AmplabJenkins commented on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
AmplabJenkins commented on pull request #28394: URL: https://github.com/apache/spark/pull/28394#issuecomment-620990107
[GitHub] [spark] SparkQA removed a comment on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
SparkQA removed a comment on pull request #28394: URL: https://github.com/apache/spark/pull/28394#issuecomment-620915241 **[Test build #122020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122020/testReport)** for PR 28394 at commit [`b7d14be`](https://github.com/apache/spark/commit/b7d14be909655e320d2e93a8a48249f56e3286ce).
[GitHub] [spark] SparkQA commented on pull request #28394: [SPARK-31596][SQL][DOCS] Generate SQL Configurations from hive module to configuration doc
SparkQA commented on pull request #28394: URL: https://github.com/apache/spark/pull/28394#issuecomment-620989321 **[Test build #122020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122020/testReport)** for PR 28394 at commit [`b7d14be`](https://github.com/apache/spark/commit/b7d14be909655e320d2e93a8a48249f56e3286ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] igreenfield commented on pull request #26624: [SPARK-8981][core] Add MDC support in Executor
igreenfield commented on pull request #26624: URL: https://github.com/apache/spark/pull/26624#issuecomment-620989210 @ngone51 can you help understand what went wrong in the build?
[GitHub] [spark] AmplabJenkins commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
AmplabJenkins commented on pull request #28392: URL: https://github.com/apache/spark/pull/28392#issuecomment-620988621
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
AmplabJenkins removed a comment on pull request #28392: URL: https://github.com/apache/spark/pull/28392#issuecomment-620988621
[GitHub] [spark] SparkQA commented on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
SparkQA commented on pull request #28392: URL: https://github.com/apache/spark/pull/28392#issuecomment-620988161 **[Test build #122021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122021/testReport)** for PR 28392 at commit [`4b1f3f2`](https://github.com/apache/spark/commit/4b1f3f2a34ddac22d940aac02a592b12a27ccb60).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Rand(child: Expression, hideSeed: Boolean = false)`
  * `case class Randn(child: Expression, hideSeed: Boolean = false)`
[GitHub] [spark] SparkQA removed a comment on pull request #28392: [SPARK-31594][SQL] Do not display the seed of rand/randn with no argument in output schema
SparkQA removed a comment on pull request #28392: URL: https://github.com/apache/spark/pull/28392#issuecomment-620917411 **[Test build #122021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122021/testReport)** for PR 28392 at commit [`4b1f3f2`](https://github.com/apache/spark/commit/4b1f3f2a34ddac22d940aac02a592b12a27ccb60).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122038/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985496
[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985245 **[Test build #122038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985698 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985698
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985690 **[Test build #122038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985496
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620985245 **[Test build #122038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122038/testReport)** for PR 26141 at commit [`2ffe30c`](https://github.com/apache/spark/commit/2ffe30c433a2a705b42aead03f8ab68348e454c9).
[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620985246 **[Test build #122037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122037/testReport)** for PR 28365 at commit [`2c92360`](https://github.com/apache/spark/commit/2c92360a5e637678c37abd44b627230423893523).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620983922
[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620983922
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982982 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122033/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620979105 **[Test build #122033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)** for PR 26141 at commit [`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982972 **[Test build #122033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)** for PR 26141 at commit [`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).
* This patch **fails build dependency tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982890 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982896
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982978
[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982094 **[Test build #122036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)** for PR 26141 at commit [`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485).
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982881 **[Test build #122036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)** for PR 26141 at commit [`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982890
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982367
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982367
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620982094 **[Test build #122036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122036/testReport)** for PR 26141 at commit [`a855460`](https://github.com/apache/spark/commit/a855460ec885cdb44f710fc7e8684378d9fd6485).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620981188 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122035/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620981181 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620981168 **[Test build #122035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)** for PR 26141 at commit [`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620981181
[GitHub] [spark] SparkQA removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620980553 **[Test build #122035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)** for PR 26141 at commit [`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620980835
[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620980838
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620980835
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins removed a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620980838
[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
SparkQA commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620980554 **[Test build #122034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122034/testReport)** for PR 28395 at commit [`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620980553 **[Test build #122035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122035/testReport)** for PR 26141 at commit [`aab9b30`](https://github.com/apache/spark/commit/aab9b30ee8a83c9a4d1d9b422745c33aed29e9b7).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620979476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122031/ Test FAILed.
[GitHub] [spark] Ngone51 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
Ngone51 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r417057392
## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala ##
@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }
-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {
Review comment: This could block `ClientEndpoint` when `waitAppCompletion=true`?
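The concern in the review comment above is that a `while (true)` loop with `Thread.sleep` would occupy the endpoint's event-loop thread for as long as the application runs. One common way to avoid that, shown here only as a minimal sketch and not as what the PR does, is to move the polling onto a separate scheduler thread. Every name below (`NonBlockingPollSketch`, `pollUntilDone`, `fetchState`, and the local `DriverState` stand-in) is hypothetical and not part of Spark's API.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Future, Promise}
import scala.util.control.NonFatal

object NonBlockingPollSketch {
  // Hypothetical stand-ins for the states a master could report for a driver.
  sealed trait DriverState
  case object Running extends DriverState
  case object Finished extends DriverState

  /**
   * Polls `fetchState` every `intervalMs` milliseconds on a dedicated scheduler
   * thread. The calling thread returns immediately; the Future completes with
   * the number of polls made once a terminal state is observed.
   */
  def pollUntilDone(fetchState: () => DriverState, intervalMs: Long): Future[Int] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val result = Promise[Int]()
    var polls = 0 // read and written only on the scheduler thread

    val task = new Runnable {
      override def run(): Unit = {
        try {
          polls += 1
          fetchState() match {
            case Finished =>
              result.success(polls)
              scheduler.shutdown() // cancels the remaining periodic runs
            case Running => () // not terminal yet, wait for the next tick
          }
        } catch {
          case NonFatal(e) =>
            // Surface the error to the caller instead of silently stopping.
            result.failure(e)
            scheduler.shutdown()
        }
      }
    }

    scheduler.scheduleWithFixedDelay(task, 0L, intervalMs, TimeUnit.MILLISECONDS)
    result.future
  }
}
```

A caller would pass a function that asks the master for the driver state and react to the returned `Future`, so the thread that handles RPC messages is never put to sleep. Whether this is preferable to blocking depends on whether the endpoint has to handle other messages while it waits.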
[GitHub] [spark] SparkQA removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620971895 **[Test build #122031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)** for PR 28365 at commit [`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620979473 Merged build finished. Test FAILed.
[GitHub] [spark] xwu99 commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM
xwu99 commented on pull request #28229: URL: https://github.com/apache/spark/pull/28229#issuecomment-620979554
> > I saw your PR was merged, I will rebase.
>
> I had some reverted PRs on using high-level BLAS in LoR/LiR/SVC/GMM; they were reverted because of a performance regression on sparse datasets.
> I am now working on it again, using param `blockSize==1` to choose the impl.
> I am also waiting for more feedback. If nobody objects, I will merge them.
>
> There are some common utils in those PRs, which should also be used in KMeans. So I think you can rebase this PR after [SVC](https://github.com/apache/spark/pull/28349) gets merged.

OK. Could you also let me know which PRs you are reworking? We are also working on enabling high-level BLAS, not only for k-means but also for other algorithms in MLlib. I can help review them rather than duplicate effort.
[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
HyukjinKwon commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620979311 retest this please
[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620979460 **[Test build #122031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)** for PR 28365 at commit [`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins removed a comment on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620979289
[GitHub] [spark] AmplabJenkins commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
AmplabJenkins commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620979289
[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620979473
[GitHub] [spark] turboFei edited a comment on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsE
turboFei edited a comment on pull request #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-620979281 Can anyone help review this patch? It is a critical issue and might cause jobs to fail. Thanks in advance! Gentle ping @dongjoon-hyun @cloud-fan @gatorsmile @maropu @dbtsai @wangyum
[GitHub] [spark] turboFei commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsExceptio
turboFei commented on pull request #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-620979281 Can anyone help review this patch? It is a critical issue and would cause jobs to fail. Thanks in advance! Gentle ping @dongjoon-hyun @cloud-fan @gatorsmile @maropu @dbtsai @wangyum
[GitHub] [spark] SparkQA commented on pull request #26141: [SPARK-29492][SQL]Reset HiveSession's SessionState conf's ClassLoader when sync mode
SparkQA commented on pull request #26141: URL: https://github.com/apache/spark/pull/26141#issuecomment-620979105 **[Test build #122033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122033/testReport)** for PR 26141 at commit [`5b39d81`](https://github.com/apache/spark/commit/5b39d814fa54f24e26b7f91b0d75aac49b94fea8).
[GitHub] [spark] HyukjinKwon commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0
HyukjinKwon commented on pull request #28358: URL: https://github.com/apache/spark/pull/28358#issuecomment-620978959 Merged to master.
[GitHub] [spark] HyukjinKwon commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0
HyukjinKwon commented on pull request #28358: URL: https://github.com/apache/spark/pull/28358#issuecomment-620979020 Sure, no problem
[GitHub] [spark] dongjoon-hyun commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0
dongjoon-hyun commented on pull request #28358: URL: https://github.com/apache/spark/pull/28358#issuecomment-620978890 Hi, @HyukjinKwon . It seems that R 4.0.0 is not easy. In this PR, I want to update Rtools first. This PR works for both R 3.6 and 4.0. Later, we need to update SparkR code first with R 3.6. Then, R 4.0.0 version switch will be the last piece of the work items. WDYT?
[GitHub] [spark] dongjoon-hyun commented on pull request #28358: [SPARK-31567][R][TESTS] Update AppVeyor Rtools to 4.0.0
dongjoon-hyun commented on pull request #28358: URL: https://github.com/apache/spark/pull/28358#issuecomment-620978948 Oh, Thank you!
[GitHub] [spark] dongjoon-hyun commented on pull request #28385: [SPARK-31591][CORE] Fix null name prefix when create directory
dongjoon-hyun commented on pull request #28385: URL: https://github.com/apache/spark/pull/28385#issuecomment-620977810 Thank you, @LantaoJin .
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins removed a comment on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-620976532
[GitHub] [spark] zhengruifeng edited a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
zhengruifeng edited a comment on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390 I will merge this PR this week if nobody objects. Unlike the [previous one](https://github.com/apache/spark/pull/27360), this PR will not cause a performance regression on sparse datasets by default (since the default is `blockSize`=1, in which case the original impl is used). Expert users can tune `blockSize` for better performance.
[GitHub] [spark] AmplabJenkins commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
AmplabJenkins commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-620976532
[GitHub] [spark] zhengruifeng edited a comment on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
zhengruifeng edited a comment on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390 I will merge this PR this week if nobody objects. Unlike the [previous one](https://github.com/apache/spark/pull/27360), this PR will not cause a performance regression by default (since the default is `blockSize`=1, in which case the original impl is used). Expert users can tune `blockSize` for better performance.
[GitHub] [spark] zhengruifeng commented on pull request #28349: [SPARK-30642][ML][PYSPARK] LinearSVC blockify input vectors
zhengruifeng commented on pull request #28349: URL: https://github.com/apache/spark/pull/28349#issuecomment-620976390 I will merge this PR this week if nobody objects. Unlike the [previous one](https://github.com/apache/spark/pull/27360), this PR will not cause a performance regression by default (since the default is `blockSize`=1, in which case the original impl is used). Expert users can tune `blockSize` for better performance.
[GitHub] [spark] SparkQA commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
SparkQA commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-620976299 **[Test build #122032 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122032/testReport)** for PR 28123 at commit [`19b8cd8`](https://github.com/apache/spark/commit/19b8cd80a62c6c05c79e4253c302071c97d712d1).
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r417052767 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketsInJoin extends Rule[LogicalPlan] { + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce( + numBuckets1: Int, + numBuckets2: Int, + maxNumBucketsDiff: Int): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. 
+if ((large % small == 0) && + ((large - small) <= maxNumBucketsDiff)) { + Some(small) +} else { + None +} + } + + private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) => +CoalesceBuckets(numCoalescedBuckets, l) +} + } + + object ExtractJoinWithBuckets { +def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = { + plan match { +case join: Join => Review comment: It will remove shuffle for full-outer join as well: With the feature off: ``` scala> t1.join(t2, t1("i") === t2("i"), "full_outer").explain == Physical Plan == SortMergeJoin [i#67], [i#73], FullOuter :- *(2) Sort [i#67 ASC NULLS FIRST], false, 0 : +- Exchange hashpartitioning(i#67, 200), true, [id=#418] : +- *(1) ColumnarToRow :+- FileScan parquet default.t1[i#67,j#68,k#69] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 8 out of 8 +- *(4) Sort [i#73 ASC NULLS FIRST], false, 0 +- Exchange hashpartitioning(i#73, 200), true, [id=#425] +- *(3) ColumnarToRow +- FileScan parquet default.t2[i#73,j#74,k#75] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 4 out of 4 ``` With the feature on: ``` scala> t1.join(t2, t1("i") === t2("i"), "full_outer").explain == Physical Plan == SortMergeJoin [i#67], [i#73], FullOuter :- *(1) Sort [i#67 ASC NULLS FIRST], false, 0 : +- *(1) ColumnarToRow : +- FileScan parquet default.t1[i#67,j#68,k#69] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [], PushedFilters: [], ReadSchema: struct, SelectedBucketsCount: 8 out of 8 +- *(2) Sort [i#73 ASC NULLS FIRST], false, 0 +- *(2) ColumnarToRow +- FileScan parquet
[GitHub] [spark] zhengruifeng commented on pull request #28229: [SPARK-31454][ML] An optimized K-Means based on DenseMatrix and GEMM
zhengruifeng commented on pull request #28229: URL: https://github.com/apache/spark/pull/28229#issuecomment-620975624 > I saw your PR was merged, I will rebase. I had some PRs that used high-level BLAS in LoR/LiR/SVC/GMM; they were reverted because of a performance regression on sparse datasets. I am now working on them again, using the param `blockSize` to choose the impl (`blockSize==1` keeps the original one). I am also waiting for more feedback. If nobody objects, I will merge them. Those PRs contain some common utils that should also be used in KMeans, so I think you can rebase this PR after [SVC](https://github.com/apache/spark/pull/28349) gets merged.
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r417052075 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketsInJoin extends Rule[LogicalPlan] { + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce( + numBuckets1: Int, + numBuckets2: Int, + maxNumBucketsDiff: Int): Option[Int] = { Review comment: I changed it to pass sqlConf (I guess that's what you were leaning toward?)
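For readers following the `mayCoalesce` discussion in the two review comments on `CoalesceBucketsInJoin.scala`, here is a minimal plain-Scala sketch of the check as it appears in the quoted diff. It is not the code that was merged: it keeps the `maxNumBucketsDiff: Int` parameter from the quoted revision rather than the `sqlConf` variant mentioned in the review comment, the object name `CoalesceBucketsSketch` and the helper `bucketId` are made up for illustration, and hashes are assumed to be non-negative.

```scala
object CoalesceBucketsSketch {
  // Hypothetical helper: bucket id of a non-negative hash, i.e. hash mod numBuckets.
  def bucketId(hash: Int, numBuckets: Int): Int = {
    require(hash >= 0 && numBuckets > 0)
    hash % numBuckets
  }

  // Mirrors the rule in the quoted diff: coalescing is possible only if the larger
  // bucket count is divisible by the smaller one and the difference is within the limit.
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxNumBucketsDiff: Int): Option[Int] = {
    require(numBuckets1 > 0 && numBuckets2 > 0 && numBuckets1 != numBuckets2)
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    if (large % small == 0 && large - small <= maxNumBucketsDiff) Some(small) else None
  }

  def main(args: Array[String]): Unit = {
    // With 8 and 4 buckets, re-modding an 8-bucket id by 4 matches the 4-bucket id.
    assert((0 until 100).forall(h => bucketId(h, 8) % 4 == bucketId(h, 4)))
    // With 6 and 4 buckets it does not: hash 7 is in bucket 1 of 6 but bucket 3 of 4.
    assert(bucketId(7, 6) % 4 != bucketId(7, 4))
    println(mayCoalesce(8, 4, 4))
    println(mayCoalesce(6, 4, 4))
  }
}
```

The two assertions in `main` illustrate the comment in the diff: because a bucket id is the hash modulo the bucket count, rows from several buckets of the larger table map onto exactly one bucket of the smaller table only when the counts divide evenly.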
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins removed a comment on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620972185
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins removed a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620972164 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122023/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins removed a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620972161 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
AmplabJenkins commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620972185
[GitHub] [spark] AmplabJenkins commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
AmplabJenkins commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620972161
[GitHub] [spark] SparkQA removed a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
SparkQA removed a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620936871 **[Test build #122023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122023/testReport)** for PR 28395 at commit [`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2).
[GitHub] [spark] SparkQA commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group
SparkQA commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-620971989 **[Test build #122023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122023/testReport)** for PR 28395 at commit [`481bba6`](https://github.com/apache/spark/commit/481bba62ce62a13f23ce153e54e8a5f56f6059c2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #28365: [SPARK-31571][R] Overhaul stop/message/warning calls to be more canonical
SparkQA commented on pull request #28365: URL: https://github.com/apache/spark/pull/28365#issuecomment-620971895 **[Test build #122031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122031/testReport)** for PR 28365 at commit [`4d86dc6`](https://github.com/apache/spark/commit/4d86dc605ab8844181472d0220ebc24e835f3dff).
[GitHub] [spark] zhengruifeng commented on a change in pull request #27978: [SPARK-31127][ML] Implement abstract Selector
zhengruifeng commented on a change in pull request #27978: URL: https://github.com/apache/spark/pull/27978#discussion_r417048353 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala ## @@ -0,0 +1,391 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import scala.collection.mutable.ArrayBuilder + +import org.apache.spark.annotation.Since +import org.apache.spark.ml._ +import org.apache.spark.ml.attribute.{AttributeGroup, _} +import org.apache.spark.ml.linalg._ +import org.apache.spark.ml.param._ +import org.apache.spark.ml.param.shared._ +import org.apache.spark.ml.util._ +import org.apache.spark.sql.{DataFrame, Dataset} +import org.apache.spark.sql.functions._ +import org.apache.spark.sql.types.{StructField, StructType} + + +/** + * Params for [[Selector]] and [[SelectorModel]]. + */ +private[feature] trait SelectorParams extends Params + with HasFeaturesCol with HasLabelCol with HasOutputCol { + + /** + * Number of features that selector will select, ordered by ascending p-value. If the + * number of features is less than numTopFeatures, then this will select all features. + * Only applicable when selectorType = "numTopFeatures". 
+ * The default value of numTopFeatures is 50. + * + * @group param + */ + @Since("3.1.0") + final val numTopFeatures = new IntParam(this, "numTopFeatures", +"Number of features that selector will select, ordered by ascending p-value. If the" + + " number of features is < numTopFeatures, then this will select all features.", +ParamValidators.gtEq(1)) + setDefault(numTopFeatures -> 50) + + /** @group getParam */ + @Since("3.1.0") + def getNumTopFeatures: Int = $(numTopFeatures) + + /** + * Percentile of features that selector will select, ordered by ascending p-value. + * Only applicable when selectorType = "percentile". + * Default value is 0.1. + * @group param + */ + @Since("3.1.0") + final val percentile = new DoubleParam(this, "percentile", +"Percentile of features that selector will select, ordered by ascending p-value.", +ParamValidators.inRange(0, 1)) + setDefault(percentile -> 0.1) + + /** @group getParam */ + @Since("3.1.0") + def getPercentile: Double = $(percentile) + + /** + * The highest p-value for features to be kept. + * Only applicable when selectorType = "fpr". + * Default value is 0.05. + * @group param + */ + @Since("3.1.0") + final val fpr = new DoubleParam(this, "fpr", "The higest p-value for features to be kept.", +ParamValidators.inRange(0, 1)) + setDefault(fpr -> 0.05) + + /** @group getParam */ + @Since("3.1.0") + def getFpr: Double = $(fpr) + + /** + * The upper bound of the expected false discovery rate. + * Only applicable when selectorType = "fdr". + * Default value is 0.05. + * @group param + */ + @Since("3.1.0") + final val fdr = new DoubleParam(this, "fdr", +"The upper bound of the expected false discovery rate.", ParamValidators.inRange(0, 1)) + setDefault(fdr -> 0.05) + + /** @group getParam */ + def getFdr: Double = $(fdr) + + /** + * The upper bound of the expected family-wise error rate. + * Only applicable when selectorType = "fwe". + * Default value is 0.05. 
+ * @group param + */ + @Since("3.1.0") + final val fwe = new DoubleParam(this, "fwe", +"The upper bound of the expected family-wise error rate.", ParamValidators.inRange(0, 1)) + setDefault(fwe -> 0.05) + + /** @group getParam */ + def getFwe: Double = $(fwe) + + /** + * The selector type. + * Supported options: "numTopFeatures" (default), "percentile", "fpr", "fdr", "fwe" + * @group param + */ + @Since("3.1.0") + final val selectorType = new Param[String](this, "selectorType", +"The selector type. Supported options: numTopFeatures, percentile, fpr, fdr, fwe", +ParamValidators.inArray(Array("numTopFeatures", "percentile", "fpr", "fdr", + "fwe"))) + setDefault(selectorType -> "numTopFeatures") + + /** @group getParam */ + @Since("3.1.0") + def getSelectorType: String = $(selectorType) + +} + +/** + * Super class for feature selectors. + * 1. Chi-Square Selector + * This feature selector
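The parameter docs quoted above describe how each `selectorType` turns per-feature p-values into a set of selected indices. A minimal sketch of that mapping in plain Scala, without Spark, might look as follows. The object name `SelectorTypeSketch` and the use of a bare `Array[Double]` of p-values (position = feature index) are assumptions made for illustration; the thresholds follow the `fit` logic shown in the `FValueSelector` diff elsewhere in this thread, and the `fdr` type is deliberately left out of this sketch.

```scala
object SelectorTypeSketch {
  // pValues(i) is the p-value of feature i; returns the selected feature indices, sorted.
  def select(
      pValues: Array[Double],
      selectorType: String,
      numTopFeatures: Int = 50,
      percentile: Double = 0.1,
      fpr: Double = 0.05,
      fwe: Double = 0.05): Array[Int] = {
    val n = pValues.length
    // Feature indices ordered by ascending p-value (stable sort keeps index order on ties).
    val ranked = pValues.zipWithIndex.sortWith(_._1 < _._1).map(_._2)
    val picked: Array[Int] = selectorType match {
      case "numTopFeatures" => ranked.take(numTopFeatures)
      case "percentile" => ranked.take((n * percentile).toInt)
      case "fpr" => pValues.indices.filter(i => pValues(i) < fpr).toArray
      case "fwe" => pValues.indices.filter(i => pValues(i) < fwe / n).toArray
      // "fdr" is not covered by this sketch and falls through to the error case.
      case other => throw new IllegalArgumentException(s"Unsupported selector type: $other")
    }
    picked.sorted
  }

  def main(args: Array[String]): Unit = {
    val p = Array(0.2, 0.001, 0.04, 0.03)
    println(select(p, "numTopFeatures", numTopFeatures = 2).mkString(","))
    println(select(p, "fwe").mkString(","))
  }
}
```

Note that `fwe` divides the bound by the number of features before comparing, while `fpr` compares each p-value against the bound directly.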
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
AmplabJenkins removed a comment on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-620970546
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.
AmplabJenkins removed a comment on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-620970502
[GitHub] [spark] AmplabJenkins commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
AmplabJenkins commented on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-620970546
[GitHub] [spark] AmplabJenkins commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.
AmplabJenkins commented on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-620970502
[GitHub] [spark] zhengruifeng commented on a change in pull request #27978: [SPARK-31127][ML] Implement abstract Selector
zhengruifeng commented on a change in pull request #27978: URL: https://github.com/apache/spark/pull/27978#discussion_r417042961 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/FValueSelector.scala ## @@ -154,111 +45,72 @@ private[feature] trait FValueSelectorParams extends Params * set to 50. */ @Since("3.1.0") -final class FValueSelector @Since("3.1.0") (override val uid: String) - extends Estimator[FValueSelectorModel] with FValueSelectorParams -with DefaultParamsWritable { +final class FValueSelector @Since("3.1.0") (@Since("3.1.0") override val uid: String) extends + Selector[FValueSelectorModel] { @Since("3.1.0") def this() = this(Identifiable.randomUID("FValueSelector")) /** @group setParam */ @Since("3.1.0") - def setNumTopFeatures(value: Int): this.type = set(numTopFeatures, value) + override def setNumTopFeatures(value: Int): this.type = super.setNumTopFeatures(value) /** @group setParam */ @Since("3.1.0") - def setPercentile(value: Double): this.type = set(percentile, value) + override def setPercentile(value: Double): this.type = super.setPercentile(value) /** @group setParam */ @Since("3.1.0") - def setFpr(value: Double): this.type = set(fpr, value) + override def setFpr(value: Double): this.type = super.setFpr(value) /** @group setParam */ @Since("3.1.0") - def setFdr(value: Double): this.type = set(fdr, value) + override def setFdr(value: Double): this.type = super.setFdr(value) /** @group setParam */ @Since("3.1.0") - def setFwe(value: Double): this.type = set(fwe, value) + override def setFwe(value: Double): this.type = super.setFwe(value) /** @group setParam */ @Since("3.1.0") - def setSelectorType(value: String): this.type = set(selectorType, value) + override def setSelectorType(value: String): this.type = super.setSelectorType(value) /** @group setParam */ @Since("3.1.0") - def setFeaturesCol(value: String): this.type = set(featuresCol, value) + override def setFeaturesCol(value: String): this.type = super.setFeaturesCol(value) 
/** @group setParam */ @Since("3.1.0") - def setOutputCol(value: String): this.type = set(outputCol, value) + override def setOutputCol(value: String): this.type = super.setOutputCol(value) /** @group setParam */ @Since("3.1.0") - def setLabelCol(value: String): this.type = set(labelCol, value) + override def setLabelCol(value: String): this.type = super.setLabelCol(value) - @Since("3.1.0") - override def fit(dataset: Dataset[_]): FValueSelectorModel = { -transformSchema(dataset.schema, logging = true) -val spark = dataset.sparkSession -import spark.implicits._ - -val numFeatures = MetadataUtils.getNumFeatures(dataset, $(featuresCol)) -val resultDF = FValueTest.test(dataset.toDF, $(featuresCol), $(labelCol), true) - -def getTopIndices(k: Int): Array[Int] = { - resultDF.sort("pValue", "featureIndex") -.select("featureIndex") -.limit(k) -.as[Int] -.collect() -} - -val indices = $(selectorType) match { - case "numTopFeatures" => -getTopIndices($(numTopFeatures)) - case "percentile" => -getTopIndices((numFeatures * getPercentile).toInt) - case "fpr" => -resultDF.select("featureIndex") - .where(col("pValue") < $(fpr)) - .as[Int].collect() - case "fdr" => -// This uses the Benjamini-Hochberg procedure. 
-// https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure -val f = $(fdr) / numFeatures -val maxIndex = resultDF.sort("pValue", "featureIndex") - .select("pValue") - .as[Double].rdd - .zipWithIndex - .flatMap { case (pValue, index) => -if (pValue <= f * (index + 1)) { - Iterator.single(index.toInt) -} else Iterator.empty - }.fold(-1)(math.max) -if (maxIndex >= 0) { - getTopIndices(maxIndex + 1) -} else Array.emptyIntArray - case "fwe" => -resultDF.select("featureIndex") - .where(condition = col("pValue") < $(fwe) / numFeatures) - .as[Int].collect() - case errorType => -throw new IllegalStateException(s"Unknown Selector Type: $errorType") -} + /** + * get the SelectionTestResult for every feature against the label + */ + protected[this] override def getSelectionTestResult(df: DataFrame): DataFrame = { +FValueTest.test(df, getFeaturesCol, getLabelCol, true) + } -copyValues(new FValueSelectorModel(uid, indices.sorted).setParent(this)) + /** + * Create a new instance of concrete SelectorModel. + * @param indices The indices of the selected features + * @return A new SelectorModel instance + */ + protected[this] def createSelectorModel( + uid: String, + indices: Array[Int]): FValueSelectorModel = { +new FValueSelectorModel(uid,
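The `fdr` branch removed from `FValueSelector.fit` in the diff above applies the Benjamini-Hochberg procedure: sort the p-values, find the largest rank whose p-value is at most `fdr / numFeatures * rank`, and keep every feature up to that rank. A minimal sketch of the same steps in plain Scala, without Spark, is given below. The object name `BenjaminiHochbergSketch` and the `Array[Double]` input (position = feature index) are assumptions for illustration; the Spark code in the diff does the equivalent with a sorted DataFrame and `zipWithIndex`.

```scala
object BenjaminiHochbergSketch {
  // pValues(i) is the p-value of feature i; returns the kept feature indices, sorted.
  def select(pValues: Array[Double], fdr: Double): Array[Int] = {
    val n = pValues.length
    // (pValue, featureIndex) pairs by ascending p-value; the stable sort keeps
    // the lower feature index first when p-values tie.
    val ranked = pValues.zipWithIndex.sortWith(_._1 < _._1)
    // Largest 0-based rank whose p-value is <= fdr / n * (rank + 1), or -1 if none.
    val maxRank = ranked.zipWithIndex.foldLeft(-1) { case (best, ((p, _), rank)) =>
      if (p <= fdr / n * (rank + 1)) math.max(best, rank) else best
    }
    // Keep every feature up to and including that rank, even ones that failed their own bound.
    ranked.take(maxRank + 1).map(_._2).sorted
  }

  def main(args: Array[String]): Unit = {
    println(select(Array(0.001, 0.03, 0.035, 0.9), 0.05).mkString(","))
  }
}
```

In the `main` example the second-smallest p-value (0.03) exceeds its own bound of 0.025, but it is still kept because the third-smallest (0.035) passes its bound of 0.0375; this step-up behaviour is what distinguishes the procedure from a plain per-feature threshold.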
[GitHub] [spark] SparkQA commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.
SparkQA commented on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-620970273 **[Test build #122030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122030/testReport)** for PR 28194 at commit [`2173536`](https://github.com/apache/spark/commit/2173536afe896910b1c5faa35de1efa34f01e0f3).
[GitHub] [spark] SparkQA commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements
SparkQA commented on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-620970257 **[Test build #122029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122029/testReport)** for PR 28351 at commit [`586c4b7`](https://github.com/apache/spark/commit/586c4b78c6121df589b40cf87de797b172299e68).