[GitHub] [spark] SparkQA removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276383 **[Test build #122471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278307
[GitHub] [spark] AmplabJenkins commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278307
[GitHub] [spark] SparkQA commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278225 **[Test build #122471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422588340
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
Review comment: `ApplicationInfo` seems to be a proper class that can be properly converted into a Java class via `Unidoc` in principle. If it works, let's use `[[...]]`. If it fails to generate the documentation, let's stick to `` `...` ``.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422589241
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
+   * See [SPARK-31632] The ApplicationInfo in KVStore may be accessed before it's prepared
+   */
   def applicationInfo(): v1.ApplicationInfo = {
Review comment: I think capturing the exception outside is actually a good compromise, considering the issue is rather a corner case. Let's keep the scope narrow here since the issue is rather minor, and consider a better fix with a standard approach if a bigger issue is found in the design.
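The compromise described in this review (catch the failure close to the caller and retry a bounded number of times, rather than redesigning the store) can be sketched generically. This is a minimal Python sketch, not Spark's actual `AppStatusStore` code; the function name and retry parameters are hypothetical:

```python
import time

def get_with_retry(read_value, retries=3, delay_s=0.1):
    """Read a value that may not be prepared yet.

    `read_value` is any callable that raises until the value becomes
    available. We retry a few times with a short delay, then re-raise
    the last failure if the value never shows up.
    """
    last_exc = None
    for _ in range(retries):
        try:
            return read_value()
        except Exception as exc:  # narrow the exception type in real code
            last_exc = exc
            time.sleep(delay_s)
    raise last_exc

# Example: a value that only becomes readable on the third attempt.
attempts = {"n": 0}

def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise KeyError("not prepared yet")
    return "app-info"

print(get_with_retry(flaky_read))  # -> app-info
```

The retry count and delay bound the extra latency, which fits the "corner case" framing above: the common path pays nothing, and only the race during startup pays for the waits.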
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276459
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins removed a comment on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276473
[GitHub] [spark] AmplabJenkins commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276473
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422588491
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
Review comment: We should use `` `...` `` in case the classes are a trait or a type alias, which doesn't generate a canonical Java class.
[GitHub] [spark] AmplabJenkins commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276459
[GitHub] [spark] SparkQA commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
SparkQA commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276384 **[Test build #122472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122472/testReport)** for PR 28444 at commit [`0951f3d`](https://github.com/apache/spark/commit/0951f3d3eda712688d7ded8ca8da0db85fde3c4b).
[GitHub] [spark] SparkQA commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276383 **[Test build #122471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins removed a comment on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-623194443 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276260 ok to test
[GitHub] [spark] huaxingao commented on a change in pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
huaxingao commented on a change in pull request #28483: URL: https://github.com/apache/spark/pull/28483#discussion_r422587657
## File path: python/pyspark/ml/stat.py
## @@ -37,8 +37,8 @@ class ChiSquareTest(object):
     """
     @staticmethod
-    @since("2.2.0")
-    def test(dataset, featuresCol, labelCol):
+    @since("3.1.0")
Review comment: @HyukjinKwon Thanks for your comment. I changed back the version and also added a `versionchanged` directive. `versionadded` is for `class ChiSquareTest`, not for method `test`, so I guess I will leave it as is?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272714
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272714
[GitHub] [spark] SparkQA commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
SparkQA commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272623 **[Test build #122470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122470/testReport)** for PR 28488 at commit [`72fc400`](https://github.com/apache/spark/commit/72fc400d40fbdc73a3e8285e06d7ae6b9e547cae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253365 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
HyukjinKwon commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272338 ok to test
[GitHub] [spark] HyukjinKwon commented on pull request #28479: [SPARK-31662][SQL] Fix loading of dates before 1582-10-15 from dictionary encoded Parquet columns
HyukjinKwon commented on pull request #28479: URL: https://github.com/apache/spark/pull/28479#issuecomment-626271994 Merged to master and branch-3.0.
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270523
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270523
[GitHub] [spark] SparkQA commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
SparkQA commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270413 **[Test build #122469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122469/testReport)** for PR 28485 at commit [`df61b7c`](https://github.com/apache/spark/commit/df61b7c61dbdbd977bec76cf22a5f269b418035b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122468/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270256 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262235 **[Test build #122468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2).
[GitHub] [spark] AmplabJenkins commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270256
[GitHub] [spark] SparkQA commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270160 **[Test build #122468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171894 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
HyukjinKwon commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270010
[GitHub] [spark] HyukjinKwon edited a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
HyukjinKwon edited a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270022 cc @gengliangwang and @HeartSaVioR as well
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28484: [SPARK-30168] [SQL]Changing deprecated api s used in parquet to minimise warning
HyukjinKwon commented on a change in pull request #28484: URL: https://github.com/apache/spark/pull/28484#discussion_r422580859
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
## @@ -144,13 +113,16 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
     String sparkRequestedSchemaString =
         configuration.get(ParquetReadSupport$.MODULE$.SPARK_ROW_REQUESTED_SCHEMA());
     this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
-    this.reader = new ParquetFileReader(
-        configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
+    this.reader = new ParquetFileReader(HadoopInputFile.fromPath(file, configuration),
+        HadoopReadOptions.builder(configuration).build());
Review comment: You can't use this because of the leak issue at https://github.com/apache/parquet-mr/pull/510. This was fixed in Parquet 1.11.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
HyukjinKwon commented on a change in pull request #28483: URL: https://github.com/apache/spark/pull/28483#discussion_r422580476
## File path: python/pyspark/ml/stat.py
## @@ -37,8 +37,8 @@ class ChiSquareTest(object):
     """
     @staticmethod
-    @since("2.2.0")
-    def test(dataset, featuresCol, labelCol):
+    @since("3.1.0")
Review comment: @huaxingao, let's not change when the version was added. Also, `@since` adds `versionadded` automatically into the docstring, so I think we can remove it from the docstring. In this case, we could add a `versionchanged` directive instead to describe the difference introduced in Spark 3.1.0.
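The directive handling discussed here can be sketched as follows. This is a hypothetical function, not the real `ChiSquareTest.test`; PySpark's `@since` decorator is omitted and the Sphinx directives are written directly into the docstring, to show the intended end state (keep the original `versionadded`, add a `versionchanged` for the 3.1.0 behavior change):

```python
def chisq_test(dataset, featuresCol, labelCol):
    """Perform a chi-squared independence test.

    .. versionadded:: 2.2.0

    .. versionchanged:: 3.1.0
       The result dataframe is flattened on the Python side.
    """

# Both directives live in the docstring, where Sphinx picks them up.
print(".. versionchanged:: 3.1.0" in chisq_test.__doc__)  # -> True
```

This keeps the version history honest: readers see when the API appeared and, separately, when its output format changed.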
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
HyukjinKwon commented on a change in pull request #28486: URL: https://github.com/apache/spark/pull/28486#discussion_r422580243
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
## @@ -172,7 +173,8 @@ object RandomDataGenerator {
       // January 1, 1970, 00:00:00 GMT for "-12-31 23:59:59.99".
       milliseconds = rand.nextLong() % 25340232959L
     }
-    DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+    val date = DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+    Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
Review comment: Shall we add a short comment that it adds one day in case the leap year does not match? Also, let's add some words saying that the dates should be valid in both the hybrid Gregorian/Julian calendar and the Proleptic Gregorian calendar.
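The fallback under review (if the generated date is not valid in the target calendar, shift it by one day) can be sketched in Python. The validity check here is a hypothetical stand-in for the `date.toLocalDate` round-trip in the Scala code, which raises when a date representable in one calendar is not representable in the other:

```python
from datetime import date, timedelta

def valid_or_next_day(candidate_check, d):
    """Return d if candidate_check(d) accepts it, else d plus one day.

    candidate_check should raise ValueError when the date is not
    representable in the target calendar (a stand-in for the
    toLocalDate round-trip in the PR under review).
    """
    try:
        candidate_check(d)
        return d
    except ValueError:
        # Mismatched leap-year rules only shift a date by one day at
        # most, so adding a day lands on a date valid in both calendars.
        return d + timedelta(days=1)

# Hypothetical check that rejects Feb 29, to exercise the fallback.
def reject_leap_day(d):
    if d.month == 2 and d.day == 29:
        raise ValueError("not valid in this calendar")

print(valid_or_next_day(reject_leap_day, date(2020, 2, 29)))  # -> 2020-03-01
print(valid_or_next_day(reject_leap_day, date(2021, 3, 1)))   # -> 2021-03-01
```

This is exactly the comment the reviewer asks for: the one-day bump exists because the two calendars disagree on which years are leap years, not for any time-zone reason.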
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262293
[GitHub] [spark] AmplabJenkins commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262293
[GitHub] [spark] SparkQA commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262235 **[Test build #122468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2).
[GitHub] [spark] oleg-smith commented on a change in pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith commented on a change in pull request #28488: URL: https://github.com/apache/spark/pull/28488#discussion_r422572885

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+
+    } else {
+
+      val iterator: Iterator[Array[T]] = prefetchingIterator
+      iterator.hasNext
+      iterator.flatMap(data => data)
+    }
+  }
+
+  private def prefetchingIterator: Iterator[Array[T]] = {
+
+    val partitionIterator = partitions.indices.iterator
+
+    new Iterator[Array[T]] with Serializable {
+
+      private val lock = new ReentrantLock()
+      private val ready = lock.newCondition()
+
+      private var nextResult: Array[T] = _
+      private var fetchInProgress = false
+
+      /**
+       * In addition, it prefetches next element, if it exists
+       */
+      override def hasNext(): Boolean = withLock(() => {
+        if (fetchInProgress) true

Review comment: Do you mean `data`? The iterator content is flattened by flatMap() in the main method.
[GitHub] [spark] holdenk commented on a change in pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
holdenk commented on a change in pull request #28488: URL: https://github.com/apache/spark/pull/28488#discussion_r422572289

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+
+    } else {
+
+      val iterator: Iterator[Array[T]] = prefetchingIterator
+      iterator.hasNext
+      iterator.flatMap(data => data)
+    }
+  }
+
+  private def prefetchingIterator: Iterator[Array[T]] = {
+
+    val partitionIterator = partitions.indices.iterator
+
+    new Iterator[Array[T]] with Serializable {
+
+      private val lock = new ReentrantLock()
+      private val ready = lock.newCondition()
+
+      private var nextResult: Array[T] = _
+      private var fetchInProgress = false
+
+      /**
+       * In addition, it prefetches next element, if it exists
+       */
+      override def hasNext(): Boolean = withLock(() => {
+        if (fetchInProgress) true

Review comment: What about if the next partition is empty?

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+

Review comment: Minor style: we don't normally leave an empty line here
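The prefetching idea in this PR can be illustrated without Spark: while the caller consumes partition i, a background thread is already fetching partition i+1, so at most two partitions' worth of data are materialized at once. A self-contained Java sketch — `fetchPartition` stands in for `sc.runJob` on a single partition, and this uses a single-thread executor with a `Future` rather than the PR's `ReentrantLock`/`Condition` machinery, so it is an illustration of the bound, not the PR's implementation:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

class PrefetchingIterator<T> implements Iterator<List<T>> {
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // prefetch work must not keep the JVM alive
        return t;
    });
    private final IntFunction<List<T>> fetchPartition; // stand-in for sc.runJob on one partition
    private final int numPartitions;
    private int next = 0;
    private Future<List<T>> inFlight;

    PrefetchingIterator(int numPartitions, IntFunction<List<T>> fetchPartition) {
        this.numPartitions = numPartitions;
        this.fetchPartition = fetchPartition;
        if (numPartitions > 0) {
            // Start fetching partition 0 eagerly, like the PR's early hasNext call.
            inFlight = pool.submit(() -> fetchPartition.apply(0));
        } else {
            pool.shutdown();
        }
    }

    @Override
    public boolean hasNext() {
        return next < numPartitions;
    }

    @Override
    public List<T> next() {
        try {
            List<T> result = inFlight.get(); // block until the prefetched partition arrives
            next++;
            if (next < numPartitions) {
                final int p = next;
                inFlight = pool.submit(() -> fetchPartition.apply(p)); // overlap with consumption
            } else {
                pool.shutdown();
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Flattening the per-partition lists reproduces the element order of the plain (non-prefetching) iterator; only the timing of the fetches changes.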
[GitHub] [spark] oleg-smith commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626258151 @BryanCutler @HyukjinKwon @holdenk Could you review please?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253290 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253365 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253290 Can one of the admins verify this patch?
[GitHub] [spark] oleg-smith opened a new pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith opened a new pull request #28488: URL: https://github.com/apache/spark/pull/28488

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] huaxingao commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
huaxingao commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626248052 The change looks good except one minor comment.
[GitHub] [spark] huaxingao commented on a change in pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
huaxingao commented on a change in pull request #28487: URL: https://github.com/apache/spark/pull/28487#discussion_r422558046

File path: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala

@@ -261,4 +261,14 @@ class VectorAssemblerSuite
     val output = vectorAssembler.transform(dfWithNullsAndNaNs)
     assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
   }
+  test("SPARK-31671: should give explicit error message when can not infer column lengths") {

Review comment: super nit: add a blank line between L263 and L264?
[GitHub] [spark] AmplabJenkins commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245384
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245384
[GitHub] [spark] SparkQA removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216233 **[Test build #122467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).
[GitHub] [spark] SparkQA commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245241 **[Test build #122467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] huaxingao commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-626230444 @srowen this is for 3.0.
[GitHub] [spark] srowen commented on pull request #28368: [SPARK-31575][SQL] Synchronise global JVM security configuration modification
srowen commented on pull request #28368: URL: https://github.com/apache/spark/pull/28368#issuecomment-626229811 Does this cause enough of a real-world problem that it should be in 3.0 or 2.4?
[GitHub] [spark] srowen commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
srowen commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-626229743 If there are no more comments, I'll merge tomorrow. This is for 3.1 only?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins removed a comment on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226016 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226137 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226016 Can one of the admins verify this patch?
[GitHub] [spark] fan31415 opened a new pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
fan31415 opened a new pull request #28487: URL: https://github.com/apache/spark/pull/28487

### What changes were proposed in this pull request?

When input column lengths cannot be inferred and handleInvalid = "keep", VectorAssembler throws a runtime exception. However, the error message attached to this exception is misleading. I changed the content of the error message so that it reports the right columns.

### Why are the changes needed?

This is a bug. Here is a simple example to reproduce it.

```
// create a df without vector size
val df = Seq(
  (Vectors.dense(1.0), Vectors.dense(2.0))
).toDF("n1", "n2")

// only set vector size hint for n1 column
val hintedDf = new VectorSizeHint()
  .setInputCol("n1")
  .setSize(1)
  .transform(df)

// assemble n1, n2
val output = new VectorAssembler()
  .setInputCols(Array("n1", "n2"))
  .setOutputCol("features")
  .setHandleInvalid("keep")
  .transform(hintedDf)

// because only n1 has vector size, the error message should tell us to set vector size for n2 too
output.show()
```

Expected error message:
```
Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n2].
```

Actual error message:
```
Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n1, n2].
```

This introduces difficulties when I try to resolve the exception, because I do not know which column requires a VectorSizeHint. This is especially troublesome when you have a large number of columns to deal with.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a test in VectorAssemblerSuite.
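The fix described above boils down to listing only the columns whose vector sizes are still unknown, instead of every input column. A hypothetical Java sketch of that selection — the message text mirrors the PR description, but the class and method names are illustrative, not Spark's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MissingSizeReport {
    // Build the error message from only those input columns whose vector
    // length could not be inferred (i.e. columns absent from knownSizes).
    static String errorMessage(List<String> inputCols, Map<String, Integer> knownSizes) {
        List<String> missing = new ArrayList<>();
        for (String col : inputCols) {
            if (!knownSizes.containsKey(col)) {
                missing.add(col);
            }
        }
        return "Can not infer column lengths with handleInvalid = \"keep\". "
            + "Consider using VectorSizeHint to add metadata for columns: " + missing + ".";
    }
}
```

With inputs `[n1, n2]` and a known size only for `n1`, this reports `[n2]`, matching the expected message in the PR description.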
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216398
[GitHub] [spark] AmplabJenkins commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216398
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r422528642

File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

@@ -157,23 +259,23 @@ private[spark] object ThreadUtils {
    */
   def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ThreadPoolExecutor = {
     val threadFactory = namedThreadFactory(prefix)
-    Executors.newFixedThreadPool(nThreads, threadFactory).asInstanceOf[ThreadPoolExecutor]
+    MDCAwareThreadPoolExecutor.newFixedThreadPool(nThreads, threadFactory)

Review comment: I checked, and all uses of that function are from driver code. What do you think about reverting this one and creating a new one called `newMDCAwareDaemonFixedThreadPool` so it can be used later?
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r422528642

File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

@@ -157,23 +259,23 @@ private[spark] object ThreadUtils {
    */
   def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ThreadPoolExecutor = {
     val threadFactory = namedThreadFactory(prefix)
-    Executors.newFixedThreadPool(nThreads, threadFactory).asInstanceOf[ThreadPoolExecutor]
+    MDCAwareThreadPoolExecutor.newFixedThreadPool(nThreads, threadFactory)

Review comment: @cloud-fan I checked, and all uses of that function are from driver code. What do you think about reverting this one and creating a new one called `newMDCAwareDaemonFixedThreadPool` so it can be used later?
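The capture-and-restore pattern behind an MDC-aware pool is small: snapshot the submitting thread's context map when the task is created, install it on the worker thread, and restore the worker's previous context afterwards. A self-contained Java sketch — a `ThreadLocal` map stands in for `org.slf4j.MDC` so no logging dependency is needed, and the names are illustrative rather than the PR's actual `MDCAwareThreadPoolExecutor`:

```java
import java.util.Map;

class MdcAwareWrapper {
    // Stand-in for SLF4J MDC's per-thread context map.
    static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(() -> Map.<String, String>of());

    // Capture the caller's context now; install it around task.run() on
    // whichever thread eventually executes the task.
    static Runnable withCallerContext(Runnable task) {
        Map<String, String> captured = CONTEXT.get();
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(captured); // log statements in the task now see the caller's context
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // don't leak the context to later tasks on this thread
            }
        };
    }
}
```

A pool becomes "MDC-aware" by applying this wrapper to every submitted `Runnable`/`Callable`, which is why the review asks whether driver-side pools need it at all.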
[GitHub] [spark] SparkQA commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216233 **[Test build #122467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).
[GitHub] [spark] MaxGekk commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
MaxGekk commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626215784 The PR https://github.com/apache/spark/pull/28486 fixes the build failure https://github.com/apache/spark/pull/28481#issuecomment-626034381
[GitHub] [spark] MaxGekk opened a new pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
MaxGekk opened a new pull request #28486: URL: https://github.com/apache/spark/pull/28486

### What changes were proposed in this pull request?

Shift non-existing dates in the Proleptic Gregorian calendar by 1 day. The reason for that is that `RowEncoderSuite` generates random dates/timestamps in the hybrid calendar, and some dates/timestamps don't exist in the Proleptic Gregorian calendar, like 1000-02-29, because 1000 is not a leap year in the Proleptic Gregorian calendar.

### Why are the changes needed?

This makes RowEncoderSuite much more stable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By running RowEncoderSuite and setting a non-existing date manually:
```scala
val date = new java.sql.Date(1000 - 1900, 1, 29)
Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
```
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422511601

File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }

-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {

Review comment: @Ngone51 Yes, not sure about the logs from `StandaloneAppClient$ClientEndpoint`. I will check again. This is the command I am using to submit jobs: `./bin/spark-submit --master spark://127.0.0.1:7077 --conf spark.standalone.submit.waitAppCompletion=true --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar`
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422511601

File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }

-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {

Review comment: @Ngone51 Yes, not sure about the logs. I will check again. This is the command I am using to submit jobs: `./bin/spark-submit --master spark://127.0.0.1:7077 --conf spark.standalone.submit.waitAppCompletion=true --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar`
[GitHub] [spark] Ngone51 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
Ngone51 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422503865

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment:
> In both cases, I see the following in driver logs. I couldn't find any difference in logs.

Hi @akshatb1, those logs are from `StandaloneAppClient$ClientEndpoint` and `StandaloneSchedulerBackend` rather than `org.apache.spark.deploy.ClientEndpoint`. Can you check again?

> Just to confirm, are you suggesting to do this at line #180 in the pollAndReportStatus method? Or should we handle this outside?

I think just after line 180 should be ok.
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422502868

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment:
> We can periodically send a message (e.g. we can send it after `Thread.sleep(REPORT_DRIVER_STATUS_INTERVAL)`) to `ClientEndpoint` itself to check driver's status.

@Ngone51 Thanks for this suggestion. Just to confirm, are you suggesting to do this at line #180 in the pollAndReportStatus method? Or should we handle this outside? ![image](https://user-images.githubusercontent.com/31816865/81476612-73605500-9230-11ea-83a3-937782cbe00f.png)
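The periodic self-message approach discussed above can be sketched roughly as follows. This is a Spark-free simulation, not the PR's actual code: the `DriverStatusPoller` name is hypothetical, and a plain queue stands in for the RPC endpoint's event loop.

```scala
import scala.collection.mutable

// Driver states, mirroring DriverState's terminal FINISHED/FAILED/KILLED.
object DriverState extends Enumeration {
  val SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value
}

// Instead of blocking the event loop in `while (true)`, each status check
// re-enqueues a CheckStatus message to self, leaving the loop free to
// handle other messages between checks.
class DriverStatusPoller(statuses: Iterator[DriverState.Value]) {
  private val inbox = mutable.Queue[String]("CheckStatus")
  private val terminal =
    Set(DriverState.FINISHED, DriverState.FAILED, DriverState.KILLED)
  var finalState: Option[DriverState.Value] = None

  def run(): Unit = {
    while (inbox.nonEmpty) {
      inbox.dequeue() match {
        case "CheckStatus" =>
          // Stands in for askSync[DriverStatusResponse](RequestDriverStatus(id)).
          val state = statuses.next()
          if (terminal(state)) {
            finalState = Some(state) // terminal state: stop polling, report, exit
          } else {
            // Re-send CheckStatus to self (after REPORT_DRIVER_STATUS_INTERVAL).
            inbox.enqueue("CheckStatus")
          }
      }
    }
  }
}
```

The design point is that termination is driven by the driver reaching a terminal state rather than by a single one-shot status query.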
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180429
[GitHub] [spark] AmplabJenkins commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180429
[GitHub] [spark] SparkQA removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
SparkQA removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136760 **[Test build #122465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122465/testReport)** for PR 28481 at commit [`1150983`](https://github.com/apache/spark/commit/1150983e02e6e5ca390dfcfeaf203070e953ed03).
[GitHub] [spark] SparkQA commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
SparkQA commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180098 **[Test build #122465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122465/testReport)** for PR 28481 at commit [`1150983`](https://github.com/apache/spark/commit/1150983e02e6e5ca390dfcfeaf203070e953ed03).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172999
[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172999
[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172867 **[Test build #122466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626164399 **[Test build #122466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171759 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171894 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171759 Can one of the admins verify this patch?
[GitHub] [spark] iRakson commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
iRakson commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171426 cc @srowen @zsxwing @sarutak @uncleGen Kindly review.
[GitHub] [spark] iRakson opened a new pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
iRakson opened a new pull request #28485: URL: https://github.com/apache/spark/pull/28485

### What changes were proposed in this pull request?
Add pagination support for the Structured Streaming page. Now both tables, `Active Queries` and `Completed Queries`, will have pagination. To implement pagination, the pagination framework from #7399 is used.
* Also, tables will only be shown if there is at least one entry in the table.

### Why are the changes needed?
* This will help users analyse their structured streaming queries in a much better way.
* Other Web UI pages support pagination in their tables, so this makes the Web UI more consistent across pages.
* This can prevent potential OOM errors.

### Does this PR introduce _any_ user-facing change?
Yes. Both tables will support pagination.

### How was this patch tested?
Manually. I will add snapshots soon.
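As a rough illustration of what table pagination does (this is a hypothetical sketch, not the actual Spark pagination framework from #7399; `paginate` and `shouldRender` are made-up helper names):

```scala
// Hypothetical sketch: split a table's rows into fixed-size pages, as a
// paginated UI table would, and return only the requested 1-indexed page.
def paginate[A](rows: Seq[A], pageSize: Int, page: Int): Seq[A] = {
  require(pageSize > 0 && page > 0, "pageSize and page must be positive")
  rows.slice((page - 1) * pageSize, page * pageSize)
}

// Mirrors the PR's behavior of rendering a table only when it is non-empty.
def shouldRender[A](rows: Seq[A]): Boolean = rows.nonEmpty
```

Rendering one bounded page at a time, instead of the whole query history, is what limits memory use and avoids the potential OOM mentioned above.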
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168671
[GitHub] [spark] AmplabJenkins commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168671
[GitHub] [spark] SparkQA removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626126204 **[Test build #122464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122464/testReport)** for PR 28391 at commit [`c3db6cf`](https://github.com/apache/spark/commit/c3db6cfaaa2260e7f11436e84fc2576ad4f8dde1).
[GitHub] [spark] SparkQA commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168470 **[Test build #122464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122464/testReport)** for PR 28391 at commit [`c3db6cf`](https://github.com/apache/spark/commit/c3db6cfaaa2260e7f11436e84fc2576ad4f8dde1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626164399 **[Test build #122466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626163688
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163493
[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626163688
[GitHub] [spark] AmplabJenkins commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163493
[GitHub] [spark] SparkQA removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626121859 **[Test build #122462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122462/testReport)** for PR 28391 at commit [`914188c`](https://github.com/apache/spark/commit/914188c83642341e459ff828a4d3a4ae4c1be224).
[GitHub] [spark] SparkQA commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163191 **[Test build #122462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122462/testReport)** for PR 28391 at commit [`914188c`](https://github.com/apache/spark/commit/914188c83642341e459ff828a4d3a4ae4c1be224).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422469087

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment: @Ngone51 I launched a long-running application with the flag enabled and then disabled, and stopped the Spark master in the middle. In both cases, I see the following in the driver logs; I couldn't find any difference between them.
```
20/05/09 13:42:59 WARN StandaloneAppClient$ClientEndpoint: Connection to Akshats-MacBook-Pro.local:7077 failed; waiting for master to reconnect...
20/05/09 13:42:59 WARN StandaloneSchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
```
The `onDisconnected` method from `StandaloneAppClient.scala` is getting called: ![image](https://user-images.githubusercontent.com/31816865/81468361-bb658480-91fc-11ea-87d5-3d00cbf7619f.png)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136957
[GitHub] [spark] AmplabJenkins commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136957