[GitHub] [spark] GuoPhilipse commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs

2020-07-15 Thread GitBox


GuoPhilipse commented on a change in pull request #29056:
URL: https://github.com/apache/spark/pull/29056#discussion_r455549369



##
File path: docs/sql-ref-syntax-qry.md
##
@@ -45,4 +45,7 @@ ability to generate logical and physical plan for a given query using
   * [TABLESAMPLE](sql-ref-syntax-qry-select-sampling.html)
   * [Table-valued Function](sql-ref-syntax-qry-select-tvf.html)
   * [Window Function](sql-ref-syntax-qry-select-window.html)
+  * [CASE Clause](sql-ref-syntax-qry-select-case.html)

Review comment:
   I see that MySQL and Hive both document CASE as a standalone function rather than as part of the `SELECT` clause:
   https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
   https://dev.mysql.com/doc/refman/8.0/en/control-flow-functions.html





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659194547


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125939/
   Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659185215


   **[Test build #125939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125939/testReport)** for PR 29117 at commit [`3dc0c22`](https://github.com/apache/spark/commit/3dc0c2259bcf3297dd3ce82266e2192092ecd0e6).






[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659194539










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659194502


   **[Test build #125939 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125939/testReport)** for PR 29117 at commit [`3dc0c22`](https://github.com/apache/spark/commit/3dc0c2259bcf3297dd3ce82266e2192092ecd0e6).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659194539


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-15 Thread GitBox


SparkQA commented on pull request #29126:
URL: https://github.com/apache/spark/pull/29126#issuecomment-659194678


   **[Test build #125941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125941/testReport)** for PR 29126 at commit [`f943465`](https://github.com/apache/spark/commit/f943465514453ccc7c2ff23965d82baa687cdf9e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659193941


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125931/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659193931


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-15 Thread GitBox


SparkQA commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-659194131


   **[Test build #125940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125940/testReport)** for PR 29064 at commit [`e5aa3b3`](https://github.com/apache/spark/commit/e5aa3b3ff5c720d4d64f16b4dd5c6c1e586812d4).






[GitHub] [spark] SparkQA removed a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659139822


   **[Test build #125931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125931/testReport)** for PR 29089 at commit [`21a84ad`](https://github.com/apache/spark/commit/21a84adb3561788eea0e98c62129127b5bc9d5d5).






[GitHub] [spark] AmplabJenkins commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659193931










[GitHub] [spark] SparkQA commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-15 Thread GitBox


SparkQA commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-659193547


   **[Test build #125931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125931/testReport)** for PR 29089 at commit [`21a84ad`](https://github.com/apache/spark/commit/21a84adb3561788eea0e98c62129127b5bc9d5d5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659187337










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659187337










[GitHub] [spark] viirya commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


viirya commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659186161


   Oh I see. So you must import from cloudpickle_fast now.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-659185091










[GitHub] [spark] SparkQA commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


SparkQA commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659185215


   **[Test build #125939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125939/testReport)** for PR 29117 at commit [`3dc0c22`](https://github.com/apache/spark/commit/3dc0c2259bcf3297dd3ce82266e2192092ecd0e6).






[GitHub] [spark] AmplabJenkins commented on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-659185091










[GitHub] [spark] SparkQA removed a comment on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-659131918


   **[Test build #125928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125928/testReport)** for PR 29123 at commit [`45d1e43`](https://github.com/apache/spark/commit/45d1e4341ecab8d5271e17f9ae13072c71c46e32).






[GitHub] [spark] SparkQA commented on pull request #29123: [SPARK-32283][CORE] Kryo should support multiple user registrators

2020-07-15 Thread GitBox


SparkQA commented on pull request #29123:
URL: https://github.com/apache/spark/pull/29123#issuecomment-659184552


   **[Test build #125928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125928/testReport)** for PR 29123 at commit [`45d1e43`](https://github.com/apache/spark/commit/45d1e4341ecab8d5271e17f9ae13072c71c46e32).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] cloud-fan commented on pull request #28833: [SPARK-20680][SQL] Spark-sql do not support for creating table with void column datatype

2020-07-15 Thread GitBox


cloud-fan commented on pull request #28833:
URL: https://github.com/apache/spark/pull/28833#issuecomment-659183541


   I don't think it's a good idea for the behavior to diverge between the in-memory and Hive catalogs.






[GitHub] [spark] HyukjinKwon commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659183238


   Yeah, it does with Python 3.8. It was clear before because it was conditionally chosen in `__init__.py`, but that changed in the latest version at https://github.com/cloudpipe/cloudpickle/commit/938553fff60bf2b06b2286c3f564a587153dc5e4#diff-17e83ea5b8e0670ba384e6bc36815316L9






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659182941


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125915/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659183089


   **[Test build #125938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125938/testReport)** for PR 29114 at commit [`77055c6`](https://github.com/apache/spark/commit/77055c62ab2af9bff40d1eecb7cb3a417f6423ac).






[GitHub] [spark] cloud-fan commented on pull request #29106: [SPARK-32307][SQL] ScalaUDF's canonicalized expression should exclude inputEncoders

2020-07-15 Thread GitBox


cloud-fan commented on pull request #29106:
URL: https://github.com/apache/spark/pull/29106#issuecomment-659183181


   > Merged to master/3.0.
   
   has it been merged to 3.0 or not?






[GitHub] [spark] AmplabJenkins commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659182933










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659182933


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659074859


   **[Test build #125915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125915/testReport)** for PR 28840 at commit [`711656d`](https://github.com/apache/spark/commit/711656d7e4e1632b2a3dbf9e5030b92170ffbc1e).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659182277










[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659182277










[GitHub] [spark] SparkQA commented on pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-15 Thread GitBox


SparkQA commented on pull request #28840:
URL: https://github.com/apache/spark/pull/28840#issuecomment-659182303


   **[Test build #125915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125915/testReport)** for PR 28840 at commit [`711656d`](https://github.com/apache/spark/commit/711656d7e4e1632b2a3dbf9e5030b92170ffbc1e).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-659176704










[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455525039



##
File path: core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
##
@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
 if (ExecutorState.isFinished(state)) {
   listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
 } else if (state == ExecutorState.DECOMMISSIONED) {
-  listener.executorDecommissioned(fullId, message.getOrElse(""))
+  listener.executorDecommissioned(fullId,
+    ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment:
   oh I see https://github.com/apache/spark/pull/29032#discussion_r455401121








[GitHub] [spark] c21 commented on pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


c21 commented on pull request #29130:
URL: https://github.com/apache/spark/pull/29130#issuecomment-659174967


   cc @maropu, @cloud-fan, @gatorsmile and @sameeragarwal if you guys can help 
take a look. Thanks!






[GitHub] [spark] jiangxb1987 commented on a change in pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29032:
URL: https://github.com/apache/spark/pull/29032#discussion_r455524790



##
File path: core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala
##
@@ -181,7 +182,8 @@ private[spark] class StandaloneAppClient(
 if (ExecutorState.isFinished(state)) {
   listener.executorRemoved(fullId, message.getOrElse(""), exitStatus, workerLost)
 } else if (state == ExecutorState.DECOMMISSIONED) {
-  listener.executorDecommissioned(fullId, message.getOrElse(""))
+  listener.executorDecommissioned(fullId,
+    ExecutorDecommissionInfo(message.getOrElse(""), isHostDecommissioned = workerLost))

Review comment:
   how is the flag `isHostDecommissioned` actually used?

##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
##
@@ -101,7 +101,8 @@ private[spark] trait TaskScheduler {
   /**
    * Process a decommissioning executor.
    */
-  def executorDecommission(executorId: String): Unit
+  def executorDecommission(

Review comment:
   nit: don't leave an empty implementation here.

##
File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
##
@@ -191,9 +191,9 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
 executorDataMap.get(executorId).foreach(_.executorEndpoint.send(StopExecutor))
 removeExecutor(executorId, reason)
 
-  case DecommissionExecutor(executorId) =>
+  case DecommissionExecutor(executorId, decommissionInfo) =>
 logError(s"Received decommission executor message ${executorId}.")

Review comment:
   do we want to also include the decommissionInfo in the error msg?
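
As a rough illustration of the suggestion above, the log line could interpolate the info object alongside the executor id. This is only a sketch: `ExecutorDecommissionInfo` is redefined locally with the shape implied by the diff (the first field name, `message`, is an assumption), and `decommissionLogMessage` is a hypothetical helper, not something in the PR.

```scala
object DecommissionLogSketch {
  // Shape inferred from the call site in the diff; the field name `message` is assumed.
  final case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)

  // Hypothetical helper: builds the log text with the decommission info included.
  def decommissionLogMessage(executorId: String, info: ExecutorDecommissionInfo): String =
    s"Received decommission executor message $executorId: $info."

  def main(args: Array[String]): Unit = {
    val info = ExecutorDecommissionInfo("worker shutting down", isHostDecommissioned = true)
    println(decommissionLogMessage("1", info))
  }
}
```

Relying on the case class `toString` keeps the message in sync if fields are added later, at the cost of a less polished format.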








[GitHub] [spark] c21 opened a new pull request #29130: [SPARK-32330][SQL] Preserve shuffled hash join build side partitioning

2020-07-15 Thread GitBox


c21 opened a new pull request #29130:
URL: https://github.com/apache/spark/pull/29130


   
   
   ### What changes were proposed in this pull request?
   
   
   Currently `ShuffledHashJoin.outputPartitioning` inherits from 
`HashJoin.outputPartitioning`, which only preserves stream side partitioning 
(`HashJoin.scala`):
   
   ```
   override def outputPartitioning: Partitioning = streamedPlan.outputPartitioning
   ```
   
   This loses build-side partitioning information and causes an extra shuffle if there is another join or group-by after this join.
   
   Example:
   
   ```
   withSQLConf(
       SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "50",
       SQLConf.SHUFFLE_PARTITIONS.key -> "2",
       SQLConf.PREFER_SORTMERGEJOIN.key -> "false") {
     val df1 = spark.range(10).select($"id".as("k1"))
     val df2 = spark.range(30).select($"id".as("k2"))
     Seq("inner", "cross").foreach(joinType => {
       val plan = df1.join(df2, $"k1" === $"k2", joinType).groupBy($"k1").count()
         .queryExecution.executedPlan
       assert(plan.collect { case _: ShuffledHashJoinExec => true }.size === 1)
       // No extra shuffle before aggregate
       assert(plan.collect { case _: ShuffleExchangeExec => true }.size === 2)
     })
   }
   ```
   
   Current physical plan (having an extra shuffle on `k1` before aggregate)
   
   ``` 
   *(4) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
   +- Exchange hashpartitioning(k1#220L, 2), true, [id=#117]
      +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
         +- *(3) Project [k1#220L]
            +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
               :- Exchange hashpartitioning(k1#220L, 2), true, [id=#109]
               :  +- *(1) Project [id#218L AS k1#220L]
               :     +- *(1) Range (0, 10, step=1, splits=2)
               +- Exchange hashpartitioning(k2#224L, 2), true, [id=#111]
                  +- *(2) Project [id#222L AS k2#224L]
                     +- *(2) Range (0, 30, step=1, splits=2)
   ``` 
   
   Ideal physical plan (no shuffle on `k1` before aggregate)
   
   ```
   *(3) HashAggregate(keys=[k1#220L], functions=[count(1)], output=[k1#220L, count#235L])
   +- *(3) HashAggregate(keys=[k1#220L], functions=[partial_count(1)], output=[k1#220L, count#239L])
      +- *(3) Project [k1#220L]
         +- ShuffledHashJoin [k1#220L], [k2#224L], Inner, BuildLeft
            :- Exchange hashpartitioning(k1#220L, 2), true, [id=#107]
            :  +- *(1) Project [id#218L AS k1#220L]
            :     +- *(1) Range (0, 10, step=1, splits=2)
            +- Exchange hashpartitioning(k2#224L, 2), true, [id=#109]
               +- *(2) Project [id#222L AS k2#224L]
                  +- *(2) Range (0, 30, step=1, splits=2)
   ``` 
   
   This can be fixed by overriding the `outputPartitioning` method in `ShuffledHashJoinExec`, similar to `SortMergeJoinExec`.
   In addition, this PR fixes one typo in `HashJoin`, as that code path is shared between broadcast hash join and shuffled hash join.
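
To illustrate the idea of a join-type-aware `outputPartitioning`, here is a minimal sketch. It uses toy stand-ins for the partitioning and join-type classes (everything inside `JoinPartitioningSketch` is hypothetical and simplified, not Spark's real API), and it assumes the inner-join case reports both sides' partitioning, which is how `SortMergeJoinExec` is understood to behave.

```scala
object JoinPartitioningSketch {
  // Toy stand-ins for Spark's partitioning classes (hypothetical, simplified).
  sealed trait Partitioning
  final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
  final case class PartitioningCollection(partitionings: Seq[Partitioning]) extends Partitioning

  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType

  // Stream-side-only behaviour: the build side's partitioning is dropped.
  def streamSideOnly(streamed: Partitioning): Partitioning = streamed

  // Join-type-aware behaviour: an inner join's output keeps both sides' partitioning.
  def joinAware(joinType: JoinType, left: Partitioning, right: Partitioning): Partitioning =
    joinType match {
      case Inner      => PartitioningCollection(Seq(left, right))
      case LeftOuter  => left
      case RightOuter => right
    }

  // True if `p` already guarantees hash partitioning on exactly `keys`.
  def satisfies(p: Partitioning, keys: Seq[String]): Boolean = p match {
    case HashPartitioning(k, _)     => k == keys
    case PartitioningCollection(ps) => ps.exists(satisfies(_, keys))
  }
}
```

With `BuildLeft`, the streamed side is the `k2` side, so a stream-side-only result does not satisfy a group-by on `k1`, while the join-aware result does.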
   
   
   ### Why are the changes needed?
   
   To avoid an extra shuffle for queries with multiple joins or a group-by, saving CPU and IO.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Added unit test in `JoinSuite`.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455










[GitHub] [spark] SparkQA commented on pull request #28961: [SPARK-32143][SQL] Prevent a skewed join from producing too many partition splits

2020-07-15 Thread GitBox


SparkQA commented on pull request #28961:
URL: https://github.com/apache/spark/pull/28961#issuecomment-659172738


   **[Test build #125937 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125937/testReport)**
 for PR 28961 at commit 
[`3811ae9`](https://github.com/apache/spark/commit/3811ae93c2966d87619624670e338b7c6d34b7d4).






[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659172455










[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659171827










[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659112300


   **[Test build #125925 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)**
 for PR 29032 at commit 
[`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).






[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659171262


   @viirya This PR only optimizes CPU time.






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659171509


   **[Test build #125925 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125925/testReport)**
 for PR 29032 at commit 
[`aa6aae9`](https://github.com/apache/spark/commit/aa6aae9f71530802b07ead24d04342c5ef4f09c0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] viirya commented on a change in pull request #29107: [SPARK-32308][SQL] Move by-name resolution logic of unionByName from API code to analysis phase

2020-07-15 Thread GitBox


viirya commented on a change in pull request #29107:
URL: https://github.com/apache/spark/pull/29107#discussion_r455521016



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -1099,6 +1101,64 @@ object TypeCoercion {
         DateSub(l, Literal(days))
     }
   }
+
+  /**
+   * Coerces different children of Union to a common set of columns. Note that this must be
+   * run before `WidenSetOperationTypes`, because `WidenSetOperationTypes` should be run on
+   * correctly resolved column by name.
+   */
+  object UnionCoercion extends TypeCoercionRule {
+    private def unionTwoSides(
+        left: LogicalPlan, right: LogicalPlan, allowMissingCol: Boolean): LogicalPlan = {
+      val resolver = SQLConf.get.resolver
+      val leftOutputAttrs = left.output
+      val rightOutputAttrs = right.output
+
+      // Builds a project list for `right` based on `left` output names
+      val rightProjectList = leftOutputAttrs.map { lattr =>
+        rightOutputAttrs.find { rattr => resolver(lattr.name, rattr.name) }.getOrElse {
+          if (allowMissingCol) {
+            Alias(Literal(null, lattr.dataType), lattr.name)()
+          } else {
+            throw new AnalysisException(
+              s"""Cannot resolve column name "${lattr.name}" among """ +
+                s"""(${rightOutputAttrs.map(_.name).mkString(", ")})""")
+          }
+        }
+      }
+
+      // Delegates failure checks to `CheckAnalysis`
+      val notFoundAttrs = rightOutputAttrs.diff(rightProjectList)
+      val rightChild = Project(rightProjectList ++ notFoundAttrs, right)
+
+      // Builds a project for `logicalPlan` based on `right` output names, if allowing
+      // missing columns.
+      val leftChild = if (allowMissingCol) {
+        val missingAttrs = notFoundAttrs.map { attr =>
+          Alias(Literal(null, attr.dataType), attr.name)()
+        }
+        if (missingAttrs.nonEmpty) {
+          Project(leftOutputAttrs ++ missingAttrs, left)
+        } else {
+          left
+        }
+      } else {
+        left
+      }
+      Union(leftChild, rightChild)
+    }
+
+    override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
+      case e if !e.childrenResolved => e
+
+      case Union(children, byName, allowMissingCol)
+          if byName =>
+        val union = children.reduceLeft { (left: LogicalPlan, right: LogicalPlan) =>
+          unionTwoSides(left, right, allowMissingCol)

Review comment:
   If this does not look proper after rethinking, we can also move it to another rule or create a new rule.
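
For readers following the quoted hunk, the by-name alignment in `unionTwoSides` can be sketched on plain column-name lists. This is only an illustration: `UnionByNameSketch` and `alignByName` are hypothetical names, columns are modelled as strings, `NULL AS x` stands in for the null-literal alias, and the resolver is assumed to be case-insensitive.

```scala
object UnionByNameSketch {
  // Returns the (left, right) projection lists; "NULL AS x" marks a filled-in column.
  def alignByName(
      left: Seq[String],
      right: Seq[String],
      allowMissingCol: Boolean): (Seq[String], Seq[String]) = {
    // Assumed resolver: case-insensitive name comparison.
    def resolver(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

    // Project `right` in `left`'s column order.
    val rightProjected = left.map { l =>
      right.find(r => resolver(l, r)).getOrElse {
        if (allowMissingCol) s"NULL AS $l"
        else throw new IllegalArgumentException(
          s"""Cannot resolve column name "$l" among (${right.mkString(", ")})""")
      }
    }

    // Columns that exist only on the right are appended after the aligned ones.
    val notFound = right.filterNot(r => left.exists(l => resolver(l, r)))
    val rightChild = rightProjected ++ notFound

    // The left side gets null placeholders for right-only columns, if allowed.
    val leftChild = if (allowMissingCol) left ++ notFound.map(c => s"NULL AS $c") else left
    (leftChild, rightChild)
  }
}
```

When `allowMissingCol` is false and the right side has extra columns, the two sides end up with different widths; in the quoted code that mismatch is left for `CheckAnalysis` to report.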








[GitHub] [spark] viirya commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


viirya commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659170171


   Is this also a memory optimization? From the description it looks like a CPU time optimization.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320










[GitHub] [spark] AmplabJenkins commented on pull request #29021: [WIP][SPARK-32201][SQL] More general skew join pattern matching

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29021:
URL: https://github.com/apache/spark/pull/29021#issuecomment-659169320










[GitHub] [spark] jiangxb1987 commented on a change in pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


jiangxb1987 commented on a change in pull request #29015:
URL: https://github.com/apache/spark/pull/29015#discussion_r455517786



##
File path: core/src/main/scala/org/apache/spark/deploy/master/ui/MasterWebUI.scala
##
@@ -49,6 +55,26 @@ class MasterWebUI(
       "/app/kill", "/", masterPage.handleAppKillRequest, httpMethods = Set("POST")))
     attachHandler(createRedirectHandler(
       "/driver/kill", "/", masterPage.handleDriverKillRequest, httpMethods = Set("POST")))
+    attachHandler(createServletHandler("/workers/kill", new HttpServlet {
+      override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
+        val hostnames: Seq[String] = Option(req.getParameterValues("host"))
+          .getOrElse(Array[String]()).toSeq
+        if (!isDecommissioningRequestAllowed(req)) {
+          resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED)
+        } else {
+          val removedWorkers = masterEndpointRef.askSync[Integer](DecommissionHosts(hostnames))
+          logInfo(s"Decommissioning of hosts $hostnames decommissioned ${removedWorkers} workers")

Review comment:
   nit: `${removedWorkers}` -> `$removedWorkers`
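
As a small illustration of the nit, the two interpolation forms produce the same string for a bare identifier; braces are only needed once the interpolated part is an expression. The object name and values below are made up for the example.

```scala
object InterpolationSketch {
  val removedWorkers = 3
  val hostnames = Seq("host-a", "host-b")

  // A bare identifier needs no braces.
  val plain = s"decommissioned $removedWorkers workers"
  // Braces are equivalent here, just noisier.
  val braced = s"decommissioned ${removedWorkers} workers"
  // Braces are required when interpolating an expression such as a method call.
  val expr = s"decommissioned ${hostnames.size} hosts"
}
```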

##
File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
##
@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite
     }
   }
 
+  def testWorkerDecommissioning(
+      numWorkers: Int,
+      numWorkersExpectedToDecom: Int,
+      hostnames: Seq[String]): Unit = {
+    val conf = new SparkConf()
+    val master = makeAliveMaster(conf)
+    val workerRegs = (1 to numWorkers).map{idx =>
+      val worker = new MockWorker(master.self, conf)
+      worker.rpcEnv.setupEndpoint("worker", worker)
+      val workerReg = RegisterWorker(
+        worker.id,
+        "localhost",
+        worker.self.address.port,
+        worker.self,
+        10,
+        1024,
+        "http://localhost:8080",
+        RpcAddress("localhost", 1))
+      master.self.send(workerReg)
+      workerReg
+    }
+
+    eventually(timeout(10.seconds)) {
+      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
+      assert(masterState.workers.length === numWorkers)
+      assert(masterState.workers.forall(_.state == WorkerState.ALIVE))
+      assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet)
+      masterState.workers

Review comment:
   nit: this is not needed

##
File path: core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala
##
@@ -726,6 +726,61 @@ class MasterSuite extends SparkFunSuite
     }
   }
 
+  def testWorkerDecommissioning(
+      numWorkers: Int,
+      numWorkersExpectedToDecom: Int,
+      hostnames: Seq[String]): Unit = {
+    val conf = new SparkConf()
+    val master = makeAliveMaster(conf)
+    val workerRegs = (1 to numWorkers).map{idx =>
+      val worker = new MockWorker(master.self, conf)
+      worker.rpcEnv.setupEndpoint("worker", worker)
+      val workerReg = RegisterWorker(
+        worker.id,
+        "localhost",
+        worker.self.address.port,
+        worker.self,
+        10,
+        1024,
+        "http://localhost:8080",
+        RpcAddress("localhost", 1))
+      master.self.send(workerReg)
+      workerReg
+    }
+
+    eventually(timeout(10.seconds)) {
+      val masterState = master.self.askSync[MasterStateResponse](RequestMasterState)
+      assert(masterState.workers.length === numWorkers)
+      assert(masterState.workers.forall(_.state == WorkerState.ALIVE))
+      assert(masterState.workers.map(_.id).toSet == workerRegs.map(_.id).toSet)
+      masterState.workers
+    }
+
+    val decomWorkersCount = master.self.askSync[Integer](DecommissionHosts(hostnames))
+    assert(decomWorkersCount === numWorkersExpectedToDecom)
+
+    // Decommissioning is actually async ... wait for the workers to actually be decommissioned by
+    // polling the master's state.
+    eventually(timeout(10.seconds)) {

Review comment:
   nit: we may want to give a longer timeout to avoid flakiness.
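
A rough sketch of why a longer timeout is cheap for this kind of polling check: the block returns as soon as the condition holds, so the timeout is only spent in full when the test is going to fail anyway. `retryUntil` below is a hypothetical stand-in written in plain Scala, not ScalaTest's `eventually`, and the millisecond parameters are assumptions for the example.

```scala
import scala.util.control.NonFatal

object EventuallySketch {
  // Retry `check` until it stops throwing or `timeoutMs` has elapsed,
  // sleeping `intervalMs` between attempts. After the deadline the last
  // failure is rethrown to the caller.
  def retryUntil[T](timeoutMs: Long, intervalMs: Long)(check: => T): T = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(check)
      } catch {
        // Swallow the failure only while there is time left to retry.
        case NonFatal(_) if System.nanoTime() < deadline =>
          Thread.sleep(intervalMs)
      }
    }
    result.get
  }
}
```

For example, `EventuallySketch.retryUntil(60000L, 100L) { assert(condition) }` would return on the first passing attempt even though the limit is a full minute.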








[GitHub] [spark] viirya commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

2020-07-15 Thread GitBox


viirya commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455519563



##
File path: mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala
##
@@ -85,7 +85,6 @@ class FMClassifier @Since("3.0.0") (
    */
   @Since("3.0.0")
   def setFactorSize(value: Int): this.type = set(factorSize, value)
-  setDefault(factorSize -> 8)

Review comment:
   Where do the default params of `FMClassifier` move?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327










[GitHub] [spark] AmplabJenkins commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659168327










[GitHub] [spark] SparkQA commented on pull request #29126: [SPARK-32324][SQL]Fix error messages during using PIVOT and lateral view

2020-07-15 Thread GitBox


SparkQA commented on pull request #29126:
URL: https://github.com/apache/spark/pull/29126#issuecomment-659168426


   **[Test build #125936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125936/testReport)**
 for PR 29126 at commit 
[`f943465`](https://github.com/apache/spark/commit/f943465514453ccc7c2ff23965d82baa687cdf9e).






[GitHub] [spark] SparkQA removed a comment on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659118412


   **[Test build #125926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)**
 for PR 29032 at commit 
[`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).






[GitHub] [spark] SparkQA commented on pull request #29032: [SPARK-32217] Plumb whether a worker would also be decommissioned along with executor

2020-07-15 Thread GitBox


SparkQA commented on pull request #29032:
URL: https://github.com/apache/spark/pull/29032#issuecomment-659167771


   **[Test build #125926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125926/testReport)**
 for PR 29032 at commit 
[`cbd92ea`](https://github.com/apache/spark/commit/cbd92ea2529952dabe4cdb2d40d7b6d05bac399b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] zhengruifeng commented on a change in pull request #29112: [SPARK-32310][ML][PySpark] ML params default value parity in classification, regression, clustering and fpm

2020-07-15 Thread GitBox


zhengruifeng commented on a change in pull request #29112:
URL: https://github.com/apache/spark/pull/29112#discussion_r455517498



##
File path: mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala
##
@@ -68,6 +68,12 @@ private[clustering] trait BisectingKMeansParams extends Params with HasMaxIter
     "The minimum number of points (if >= 1.0) or the minimum proportion " +
       "of points (if < 1.0) of a divisible cluster.", ParamValidators.gt(0.0))
 
+
+  setDefault(

Review comment:
   total nit: put these params on a single line, as in the places above








[GitHub] [spark] zhengruifeng commented on pull request #29095: [SPARK-32298][ML] tree models prediction optimization

2020-07-15 Thread GitBox


zhengruifeng commented on pull request #29095:
URL: https://github.com/apache/spark/pull/29095#issuecomment-659165588


   friendly ping @huaxingao @srowen @viirya 
   
   Unlike the other attempt to save RAM, this should be a clear optimization. I found that those methods cannot be marked `@tailrec`, so I use a while loop instead.
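
To make the while-loop approach concrete, here is a minimal sketch of iterative descent through a decision tree. The `Node`, `Leaf` and `Internal` types are toy stand-ins invented for this example, not the classes touched by the PR, and the split rule (go left when the feature value is at most the threshold) is an assumption.

```scala
object TreePredictSketch {
  // Toy tree model (hypothetical, for illustration only).
  sealed trait Node
  final case class Leaf(prediction: Double) extends Node
  final case class Internal(
      featureIndex: Int,
      threshold: Double,
      left: Node,
      right: Node) extends Node

  // Iterative descent: walks from the root to a leaf without one
  // stack frame per tree level.
  def predict(root: Node, features: Array[Double]): Double = {
    var node = root
    while (node.isInstanceOf[Internal]) {
      val n = node.asInstanceOf[Internal]
      node = if (features(n.featureIndex) <= n.threshold) n.left else n.right
    }
    node.asInstanceOf[Leaf].prediction
  }
}
```

The loop does the same work a recursive `predict` on each node would do, but it does not depend on the compiler being able to eliminate the tail call.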






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582










[GitHub] [spark] AmplabJenkins commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659162582










[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162307


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125912/
   Test PASSed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291


   Merged build finished. Test PASSed.






[GitHub] [spark] SparkQA commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


SparkQA commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659162409


   **[Test build #125935 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125935/testReport)**
 for PR 29015 at commit 
[`3ee87f3`](https://github.com/apache/spark/commit/3ee87f376fe499df8aa710863f5bf6d9648f).






[GitHub] [spark] AmplabJenkins commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659162291










[GitHub] [spark] SparkQA removed a comment on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


SparkQA removed a comment on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659048554


   **[Test build #125912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)**
 for PR 27366 at commit 
[`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).






[GitHub] [spark] SparkQA commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-15 Thread GitBox


SparkQA commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-659161699


   **[Test build #125912 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125912/testReport)**
 for PR 27366 at commit 
[`fc725bc`](https://github.com/apache/spark/commit/fc725bc8def91f175f84eb1244386cd9d6f52fca).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481










[GitHub] [spark] AmplabJenkins commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659160481










[GitHub] [spark] venkata91 edited a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spa

2020-07-15 Thread GitBox


venkata91 edited a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-658977388


   > You can make a common function that holds most of the code and is called 
from 2 separate tests: one test calls it with dynamic allocation on, the other 
with it off. That will reduce code duplication.
   
   Never mind, I made some changes to the test so that it properly exercises 
the dynamic allocation block of code.






[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-15 Thread GitBox


HyukjinKwon commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-659159126


   retest this please






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284










[GitHub] [spark] AmplabJenkins commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659158284










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659158107


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29129:
URL: https://github.com/apache/spark/pull/29129#issuecomment-659157810


   Can one of the admins verify this patch?






[GitHub] [spark] frankyin-factual opened a new pull request #29129: [SPARK-31831] [SQL] [TESTS] put mocks in hive version subdirectory

2020-07-15 Thread GitBox


frankyin-factual opened a new pull request #29129:
URL: https://github.com/apache/spark/pull/29129


   
   
   ### What changes were proposed in this pull request?
   
   Put version-dependent Hive mocks into their own subdirectories.
   
   ### Why are the changes needed?
   
   Fix broken Hive builds.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   This is a fix for tests. 






[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659157488


   @HeartSaVioR @dongjoon-hyun https://github.com/apache/spark/pull/29129






[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


agrawaldevesh commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659156968


   Retest this please.
   
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-654385667


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on pull request #29014: [SPARK-32199][SPARK-32198] Reduce job failures during decommissioning

2020-07-15 Thread GitBox


cloud-fan commented on pull request #29014:
URL: https://github.com/apache/spark/pull/29014#issuecomment-659156705


   ok to test






[GitHub] [spark] agrawaldevesh commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-15 Thread GitBox


agrawaldevesh commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r455424422



##
File path: core/src/main/scala/org/apache/spark/network/netty/NettyBlockTransferService.scala
##
@@ -168,7 +168,10 @@ private[spark] class NettyBlockTransferService(
 // Everything else is encoded using our binary protocol.
 val metadata = JavaUtils.bufferToArray(serializer.newInstance().serialize((level, classTag)))
 
-val asStream = blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM)
+// We always transfer shuffle blocks as a stream for simplicity with the receiving code since
+// they are always written to disk. Otherwise we check the block size.
+val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) ||

Review comment:
   nit: The parentheses aren't really needed, but even if they are, would it 
be easier to read this as:
   
   val asStream = (blockData.size() > conf.get(config.MAX_REMOTE_BLOCK_SIZE_FETCH_TO_MEM) || blockId.isShuffle)
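
On the parentheses question, here is a minimal sketch, assuming plain values stand in for `blockData.size()`, the configured threshold, and `blockId.isShuffle` (the object and method names are hypothetical, not from the PR). In Scala, `>` binds tighter than `||`, so the two forms should evaluate identically and the parentheses are a readability choice only.

```scala
// Hypothetical stand-ins: `size` for blockData.size(), `maxFetchToMem` for the
// configured threshold, `isShuffle` for blockId.isShuffle.
object AsStreamPrecedence {
  // Explicit grouping around the comparison.
  def asStreamWithParens(size: Long, maxFetchToMem: Long, isShuffle: Boolean): Boolean =
    (size > maxFetchToMem) || isShuffle

  // No grouping: `>` has higher precedence than `||`, so this parses the same way.
  def asStreamWithoutParens(size: Long, maxFetchToMem: Long, isShuffle: Boolean): Boolean =
    size > maxFetchToMem || isShuffle
}
```

Both methods return the same result for every combination of inputs, including the case where a small shuffle block is still streamed because `isShuffle` is true.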

##
File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala
##
@@ -38,7 +38,10 @@ sealed abstract class BlockId {
   // convenience methods
   def asRDDId: Option[RDDBlockId] = if (isRDD) Some(asInstanceOf[RDDBlockId]) else None
   def isRDD: Boolean = isInstanceOf[RDDBlockId]
-  def isShuffle: Boolean = isInstanceOf[ShuffleBlockId] || isInstanceOf[ShuffleBlockBatchId]
+  def isShuffle: Boolean = {
+(isInstanceOf[ShuffleBlockId] || isInstanceOf[ShuffleBlockBatchId] ||

Review comment:
   nit: Are the parentheses needed?

##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala
##
@@ -0,0 +1,330 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.storage
+
+import java.util.concurrent.ExecutorService
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark._
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config
+import org.apache.spark.shuffle.{MigratableResolver, ShuffleBlockInfo}
+import org.apache.spark.storage.BlockManagerMessages.ReplicateBlock
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Class to handle block manager decommissioning retries.
+ * It creates a Thread to retry offloading all RDD cache and Shuffle blocks
+ */
+private[storage] class BlockManagerDecommissioner(
+  conf: SparkConf,
+  bm: BlockManager) extends Logging {
+
+  private val maxReplicationFailuresForDecommission =
+conf.get(config.STORAGE_DECOMMISSION_MAX_REPLICATION_FAILURE_PER_BLOCK)
+
+  /**
+   * This runnable consumes any shuffle blocks in the queue for migration. This part of a
+   * producer/consumer where the main migration loop updates the queue of blocks to be migrated
+   * periodically. On migration failure, the current thread will reinsert the block for another
+   * thread to consume. Each thread migrates blocks to a different particular executor to avoid
+   * distribute the blocks as quickly as possible without overwhelming any particular executor.
+   *
+   * There is no preference for which peer a given block is migrated to.
+   * This is notable different than the RDD cache block migration (further down in this file)
+   * which uses the existing priority mechanism for determining where to replicate blocks to.
+   * Generally speaking cache blocks are less impactful as they normally represent narrow
+   * transformations and we normally have less cache present than shuffle data.
+   *
+   * The producer/consumer model is chosen for shuffle block migration to maximize
+   * the chance of migrating all shuffle blocks before the executor is forced to exit.
+   */
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+@volatile var running = true
+override def run(): Unit = {
+  var migrating: Option[(ShuffleBlockInfo, Int)] = None
+  logInfo(s"Starting migration thread for ${peer}")
+  // Once a block fails to transfer to an executor stop trying to transfer more blocks
+  try {
+w

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659156074


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins commented on pull request #29128: [SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29128:
URL: https://github.com/apache/spark/pull/29128#issuecomment-659155710


   Can one of the admins verify this patch?






[GitHub] [spark] agrawaldevesh commented on pull request #29015: [SPARK-32215] Expose a (protected) /workers/kill endpoint on the MasterWebUI

2020-07-15 Thread GitBox


agrawaldevesh commented on pull request #29015:
URL: https://github.com/apache/spark/pull/29015#issuecomment-659155613


   jenkins retest this please






[GitHub] [spark] williamhyun opened a new pull request #29128: [SPARK-XXX][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES

2020-07-15 Thread GitBox


williamhyun opened a new pull request #29128:
URL: https://github.com/apache/spark/pull/29128


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   






[GitHub] [spark] frankyin-factual commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


frankyin-factual commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659154829


   I am working on a combination of 1) and 2). Will push shortly. 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211










[GitHub] [spark] AmplabJenkins commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28676:
URL: https://github.com/apache/spark/pull/28676#issuecomment-659154211










[GitHub] [spark] imback82 commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-15 Thread GitBox


imback82 commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r455505839



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -60,6 +62,67 @@ case class BroadcastHashJoinExec(
 }
   }
 
+  override lazy val outputPartitioning: Partitioning = {
+joinType match {
+  case _: InnerLike =>
+streamedPlan.outputPartitioning match {
+  case h: HashPartitioning => expandOutputPartitioning(h)
+  case c: PartitioningCollection => expandOutputPartitioning(c)
+  case other => other
+}
+  case _ => streamedPlan.outputPartitioning
+}
+  }
+
+  // An one-to-many mapping from a streamed key to build keys.
+  private lazy val streamedKeyToBuildKeyMapping = {
+val mapping = mutable.Map.empty[Expression, Seq[Expression]]
+streamedKeys.zip(buildKeys).foreach {
+  case (streamedKey, buildKey) =>
+val key = streamedKey.canonicalized
+mapping.get(key) match {
+  case Some(v) => mapping.put(key, v :+ buildKey)
+  case None => mapping.put(key, Seq(buildKey))
+}
+}
+mapping.toMap
+  }
+
+  // Expands the given partitioning collection recursively.
+  private def expandOutputPartitioning(
+  partitioning: PartitioningCollection): PartitioningCollection = {
+PartitioningCollection(partitioning.partitionings.flatMap {
+  case h: HashPartitioning => expandOutputPartitioning(h).partitionings
+  case c: PartitioningCollection => Seq(expandOutputPartitioning(c))
+  case other => Seq(other)
+})
+  }
+
+  // Expands the given hash partitioning by substituting streamed keys with build keys.
+  // For example, if the expressions for the given partitioning are Seq("a", "b", "c")
+  // where the streamed keys are Seq("b", "c") and the build keys are Seq("x", "y"),
+  // the expanded partitioning will have the following expressions:
+  // Seq("a", "b", "c"), Seq("a", "b", "y"), Seq("a", "x", "c"), Seq("a", "x", "y").
+  // The expanded expressions are returned as PartitioningCollection.
+  private def expandOutputPartitioning(partitioning: HashPartitioning): PartitioningCollection = {
+def generateExprCombinations(
+current: Seq[Expression],
+accumulated: Seq[Expression]): Seq[Seq[Expression]] = {
+  if (current.isEmpty) {
+Seq(accumulated)
+  } else {
+val buildKeys = streamedKeyToBuildKeyMapping.get(current.head.canonicalized)
+generateExprCombinations(current.tail, accumulated :+ current.head) ++
+  buildKeys.map { _.flatMap(b => generateExprCombinations(current.tail, accumulated :+ b))

Review comment:
   I added a config to limit the expansion. (Please let me know if 
introducing a new config knob is too much; in that case I can use a 
non-configurable constant in this class instead.)
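
For illustration, the expansion described in the code comment of the diff can be sketched on plain strings. This is only a sketch under stated assumptions: strings stand in for Catalyst expressions, `ExpandPartitioningSketch` and `expand` are hypothetical names, and `limit` is a hypothetical parameter standing in for the new config, whose actual name and semantics are not shown in this message.

```scala
object ExpandPartitioningSketch {
  // Strings stand in for Catalyst expressions. `limit` is a hypothetical cap on
  // the number of generated combinations, not the actual config from the PR.
  def expand(
      exprs: Seq[String],
      streamedToBuild: Map[String, Seq[String]],
      limit: Int): Seq[Seq[String]] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[Seq[String]]

    def go(current: List[String], accumulated: Vector[String]): Unit = {
      // Stop descending once the cap is reached.
      if (out.size < limit) {
        current match {
          case Nil =>
            out += accumulated
            ()
          case head :: tail =>
            // First keep the streamed key itself, then substitute each build key mapped to it.
            go(tail, accumulated :+ head)
            streamedToBuild.getOrElse(head, Nil).foreach(b => go(tail, accumulated :+ b))
        }
      }
    }

    go(exprs.toList, Vector.empty)
    out.toList
  }
}
```

With `Seq("a", "b", "c")`, the mapping `b -> x`, `c -> y`, and a limit of at least 4, this returns the four combinations listed in the code comment, in the same order. With a limit of 2 it stops after `Seq("a", "b", "c")` and `Seq("a", "b", "y")`, which is the kind of bound the review thread is asking for.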








[GitHub] [spark] dongjoon-hyun commented on pull request #29069: [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite

2020-07-15 Thread GitBox


dongjoon-hyun commented on pull request #29069:
URL: https://github.com/apache/spark/pull/29069#issuecomment-659151759


   Thank you. I'm fine with any combination (including Hive 2.3-only testing). 
Please feel free to choose an option. From my side, this also does not look 
urgent, since it is blocking neither GitHub Actions nor PRBuilder. It has 
already been broken for over 3 days. I hope `Hive 1.2` will eventually be 
removed in the near future, once we build a consensus. Sooner is better.
   
   In short, please proceed toward what you think is right, @HeartSaVioR .






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287










[GitHub] [spark] AmplabJenkins commented on pull request #29101: [SPARK-32302][SQL] Partially push down disjunctive predicates through Join/Partitions

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29101:
URL: https://github.com/apache/spark/pull/29101#issuecomment-659150287










[GitHub] [spark] HeartSaVioR commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue

2020-07-15 Thread GitBox


HeartSaVioR commented on pull request #28904:
URL: https://github.com/apache/spark/pull/28904#issuecomment-659149320


   Technically it's a private API, not even tagged as a developer API. That 
said, it doesn't break anything from Spark's perspective. If there is confusion 
about whether the `org.apache.spark.sql.execution` package is available outside 
of Spark, then I'd rather say we may need to reconsider adding 
`private[execution]` everywhere in the package.
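
To make the `private[execution]` idea concrete, here is a small sketch of Scala's qualified access modifier. It assumes nested objects in place of real packages so that it runs as a single snippet; `sql`, `execution`, `streaming`, and `MetadataLogSketch` are hypothetical names, not Spark classes.

```scala
// Nested objects stand in for packages here; the qualifier works the same way
// for an enclosing package such as org.apache.spark.sql.execution.
object sql {
  object execution {
    // Accessible anywhere inside `execution`, hidden from code outside it.
    private[execution] class MetadataLogSketch {
      def describe: String = "internal"
    }

    object streaming {
      // Allowed: `streaming` is nested inside `execution`.
      def describe: String = new MetadataLogSketch().describe
    }
  }

  // Not allowed: `new execution.MetadataLogSketch()` here would fail to compile,
  // because this code sits outside `execution`.
}
```

Callers outside `execution` can still reach the behaviour through a public entry point such as `sql.execution.streaming.describe`, while the class itself stays out of reach.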






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148768


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125918/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-15 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-659148901


   **[Test build #125934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125934/testReport)** for PR 27694 at commit [`2559928`](https://github.com/apache/spark/commit/2559928be2d7981c2c1c2d9b6111c4449e721310).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due t

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark'

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-659148755










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369










[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-15 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-659148369









