[GitHub] [spark] SparkQA removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276383 **[Test build #122471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278307
[GitHub] [spark] AmplabJenkins commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278307
[GitHub] [spark] SparkQA commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626278225 **[Test build #122471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422588340
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
Review comment: `ApplicationInfo` seems to be a proper class that can be properly converted into a Java class via `Unidoc` in principle. If it works, let's use `[[...]]`. If it fails to generate the documentation, let's stick to `` `...` ``.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422589241
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
+   * See [SPARK-31632] The ApplicationInfo in KVStore may be accessed before it's prepared
+   */
   def applicationInfo(): v1.ApplicationInfo = {
Review comment: I think capturing the exception outside is actually a good compromise, considering the issue is rather a corner case. Let's keep the scope narrow here since the issue is rather minor, and consider a better fix with a standard approach if a bigger issue is found in the design.
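The compromise described in this review (catch the failure close to the caller and retry a bounded number of times, rather than redesigning the store) can be sketched generically. This is a minimal Python sketch, not Spark's actual `AppStatusStore` code; the function name and retry parameters are hypothetical:

```python
import time

def get_with_retry(read_value, retries=3, delay_s=0.1):
    """Read a value that may not be prepared yet.

    `read_value` is any callable that raises until the value becomes
    available. We retry a few times with a short delay, then re-raise
    the last failure if the value never shows up.
    """
    last_exc = None
    for _ in range(retries):
        try:
            return read_value()
        except Exception as exc:  # narrow the exception type in real code
            last_exc = exc
            time.sleep(delay_s)
    raise last_exc

# Example: a value that only becomes readable on the third attempt.
attempts = {"n": 0}

def flaky_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise KeyError("not prepared yet")
    return "app-info"

print(get_with_retry(flaky_read))  # -> app-info
```

The retry count and delay bound the extra latency, which fits the "corner case" framing above: the common path pays nothing, and only the race during startup pays for the waits.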
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins removed a comment on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276459
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins removed a comment on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276473
[GitHub] [spark] AmplabJenkins commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276473
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on a change in pull request #28444: URL: https://github.com/apache/spark/pull/28444#discussion_r422588491
## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
## @@ -35,8 +36,24 @@ private[spark] class AppStatusStore(
     val store: KVStore,
     val listener: Option[AppStatusListener] = None) {
+  /**
+   * This method contains an automatic retry logic and tries to get a valid [[v1.ApplicationInfo]].
Review comment: We should use `` `...` `` in case the classes are a trait or a type alias, which doesn't generate a canonical Java class.
[GitHub] [spark] AmplabJenkins commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
AmplabJenkins commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276459
[GitHub] [spark] SparkQA commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
SparkQA commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276384 **[Test build #122472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122472/testReport)** for PR 28444 at commit [`0951f3d`](https://github.com/apache/spark/commit/0951f3d3eda712688d7ded8ca8da0db85fde3c4b).
[GitHub] [spark] SparkQA commented on pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
SparkQA commented on pull request #28483: URL: https://github.com/apache/spark/pull/28483#issuecomment-626276383 **[Test build #122471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122471/testReport)** for PR 28483 at commit [`27aca7a`](https://github.com/apache/spark/commit/27aca7afe907a6978b4079f033d61acbdb5575b4).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
AmplabJenkins removed a comment on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-623194443 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28444: [SPARK-31632][CORE][WEBUI] Make the ApplicationInfo always available when accessed
HyukjinKwon commented on pull request #28444: URL: https://github.com/apache/spark/pull/28444#issuecomment-626276260 ok to test
[GitHub] [spark] huaxingao commented on a change in pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
huaxingao commented on a change in pull request #28483: URL: https://github.com/apache/spark/pull/28483#discussion_r422587657
## File path: python/pyspark/ml/stat.py
## @@ -37,8 +37,8 @@ class ChiSquareTest(object):
     """
     @staticmethod
-    @since("2.2.0")
-    def test(dataset, featuresCol, labelCol):
+    @since("3.1.0")
Review comment: @HyukjinKwon Thanks for your comment. I changed back the version and also added a `versionchanged` directive. `versionadded` is for `class ChiSquareTest`, not for method `test`, so I guess I will leave it as is?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272714
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272714
[GitHub] [spark] SparkQA commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
SparkQA commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272623 **[Test build #122470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122470/testReport)** for PR 28488 at commit [`72fc400`](https://github.com/apache/spark/commit/72fc400d40fbdc73a3e8285e06d7ae6b9e547cae).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253365 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
HyukjinKwon commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626272338 ok to test
[GitHub] [spark] HyukjinKwon commented on pull request #28479: [SPARK-31662][SQL] Fix loading of dates before 1582-10-15 from dictionary encoded Parquet columns
HyukjinKwon commented on pull request #28479: URL: https://github.com/apache/spark/pull/28479#issuecomment-626271994 Merged to master and branch-3.0.
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270523
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270523
[GitHub] [spark] SparkQA commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
SparkQA commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270413 **[Test build #122469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122469/testReport)** for PR 28485 at commit [`df61b7c`](https://github.com/apache/spark/commit/df61b7c61dbdbd977bec76cf22a5f269b418035b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270260 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122468/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270256 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262235 **[Test build #122468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2).
[GitHub] [spark] AmplabJenkins commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270256
[GitHub] [spark] SparkQA commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626270160 **[Test build #122468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171894 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
HyukjinKwon commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270010
[GitHub] [spark] HyukjinKwon edited a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
HyukjinKwon edited a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626270022 cc @gengliangwang and @HeartSaVioR as well
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28484: [SPARK-30168] [SQL]Changing deprecated api s used in parquet to minimise warning
HyukjinKwon commented on a change in pull request #28484: URL: https://github.com/apache/spark/pull/28484#discussion_r422580859
## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
## @@ -144,13 +113,16 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
     String sparkRequestedSchemaString =
         configuration.get(ParquetReadSupport$.MODULE$.SPARK_ROW_REQUESTED_SCHEMA());
     this.sparkSchema = StructType$.MODULE$.fromString(sparkRequestedSchemaString);
-    this.reader = new ParquetFileReader(
-        configuration, footer.getFileMetaData(), file, blocks, requestedSchema.getColumns());
+    this.reader = new ParquetFileReader(HadoopInputFile.fromPath(file, configuration),
+        HadoopReadOptions.builder(configuration).build());
Review comment: You can't use this because of the leak issue at https://github.com/apache/parquet-mr/pull/510. This was fixed in Parquet 1.11.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28483: [SPARK-31667][ML][PySpark] Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
HyukjinKwon commented on a change in pull request #28483: URL: https://github.com/apache/spark/pull/28483#discussion_r422580476
## File path: python/pyspark/ml/stat.py
## @@ -37,8 +37,8 @@ class ChiSquareTest(object):
     """
     @staticmethod
-    @since("2.2.0")
-    def test(dataset, featuresCol, labelCol):
+    @since("3.1.0")
Review comment: @huaxingao, let's not change when the version was added. Also, `@since` adds `versionadded` automatically into the docstring, so I think we can remove it from the docstring. In this case, we could add a `versionchanged` directive instead to describe the difference introduced in Spark 3.1.0.
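The directive handling discussed here can be sketched as follows. This is a hypothetical function, not the real `ChiSquareTest.test`; PySpark's `@since` decorator is omitted and the Sphinx directives are written directly into the docstring, to show the intended end state (keep the original `versionadded`, add a `versionchanged` for the 3.1.0 behavior change):

```python
def chisq_test(dataset, featuresCol, labelCol):
    """Perform a chi-squared independence test.

    .. versionadded:: 2.2.0

    .. versionchanged:: 3.1.0
       The result dataframe is flattened on the Python side.
    """

# Both directives live in the docstring, where Sphinx picks them up.
print(".. versionchanged:: 3.1.0" in chisq_test.__doc__)  # -> True
```

This keeps the version history honest: readers see when the API appeared and, separately, when its output format changed.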
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
HyukjinKwon commented on a change in pull request #28486: URL: https://github.com/apache/spark/pull/28486#discussion_r422580243
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
## @@ -172,7 +173,8 @@ object RandomDataGenerator {
       // January 1, 1970, 00:00:00 GMT for "-12-31 23:59:59.99".
       milliseconds = rand.nextLong() % 25340232959L
     }
-    DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+    val date = DateTimeUtils.toJavaDate((milliseconds / MILLIS_PER_DAY).toInt)
+    Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
Review comment: Shall we add a short comment that it adds one day in case the leap year does not match? Also, let's add some words saying that the dates should be valid in both the hybrid Gregorian/Julian calendar and the Proleptic Gregorian calendar.
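The fallback under review (if the generated date is not valid in the target calendar, shift it by one day) can be sketched in Python. The validity check here is a hypothetical stand-in for the `date.toLocalDate` round-trip in the Scala code, which raises when a date representable in one calendar is not representable in the other:

```python
from datetime import date, timedelta

def valid_or_next_day(candidate_check, d):
    """Return d if candidate_check(d) accepts it, else d plus one day.

    candidate_check should raise ValueError when the date is not
    representable in the target calendar (a stand-in for the
    toLocalDate round-trip in the PR under review).
    """
    try:
        candidate_check(d)
        return d
    except ValueError:
        # Mismatched leap-year rules only shift a date by one day at
        # most, so adding a day lands on a date valid in both calendars.
        return d + timedelta(days=1)

# Hypothetical check that rejects Feb 29, to exercise the fallback.
def reject_leap_day(d):
    if d.month == 2 and d.day == 29:
        raise ValueError("not valid in this calendar")

print(valid_or_next_day(reject_leap_day, date(2020, 2, 29)))  # -> 2020-03-01
print(valid_or_next_day(reject_leap_day, date(2021, 3, 1)))   # -> 2021-03-01
```

This is exactly the comment the reviewer asks for: the one-day bump exists because the two calendars disagree on which years are leap years, not for any time-zone reason.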
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins removed a comment on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262293
[GitHub] [spark] AmplabJenkins commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
AmplabJenkins commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262293
[GitHub] [spark] SparkQA commented on pull request #28482: [DONOTMERGE][DEBUG] Debug the test issues in SPARK-20732
SparkQA commented on pull request #28482: URL: https://github.com/apache/spark/pull/28482#issuecomment-626262235 **[Test build #122468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122468/testReport)** for PR 28482 at commit [`8a0eeba`](https://github.com/apache/spark/commit/8a0eeba9cb29548948fcca680bf0170fdaa440f2).
[GitHub] [spark] oleg-smith commented on a change in pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith commented on a change in pull request #28488: URL: https://github.com/apache/spark/pull/28488#discussion_r422572885

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+
+    } else {
+
+      val iterator: Iterator[Array[T]] = prefetchingIterator
+      iterator.hasNext
+      iterator.flatMap(data => data)
+    }
+  }
+
+  private def prefetchingIterator: Iterator[Array[T]] = {
+
+    val partitionIterator = partitions.indices.iterator
+
+    new Iterator[Array[T]] with Serializable {
+
+      private val lock = new ReentrantLock()
+      private val ready = lock.newCondition()
+
+      private var nextResult: Array[T] = _
+      private var fetchInProgress = false
+
+      /**
+       * In addition, it prefetches next element, if it exists
+       */
+      override def hasNext(): Boolean = withLock(() => {
+        if (fetchInProgress) true

Review comment: Do you mean `data`? The iterator content is flattened by flatMap() in the main method.
[GitHub] [spark] holdenk commented on a change in pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
holdenk commented on a change in pull request #28488: URL: https://github.com/apache/spark/pull/28488#discussion_r422572289

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+
+    } else {
+
+      val iterator: Iterator[Array[T]] = prefetchingIterator
+      iterator.hasNext
+      iterator.flatMap(data => data)
+    }
+  }
+
+  private def prefetchingIterator: Iterator[Array[T]] = {
+
+    val partitionIterator = partitions.indices.iterator
+
+    new Iterator[Array[T]] with Serializable {
+
+      private val lock = new ReentrantLock()
+      private val ready = lock.newCondition()
+
+      private var nextResult: Array[T] = _
+      private var fetchInProgress = false
+
+      /**
+       * In addition, it prefetches next element, if it exists
+       */
+      override def hasNext(): Boolean = withLock(() => {
+        if (fetchInProgress) true

Review comment: What about if the next partition is empty?

File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1031,20 +1032,101 @@ abstract class RDD[T: ClassTag](
     Array.concat(results: _*)
   }

+  def toLocalIterator : Iterator[T] = toLocalIterator(false)
+
   /**
    * Return an iterator that contains all of the elements in this RDD.
    *
    * The iterator will consume as much memory as the largest partition in this RDD.
+   * With prefetch it may consume up to the memory of the 2 largest partitions.
+   *
+   * @param prefetchPartitions If Spark should pre-fetch the next partition before it is needed.
    *
    * @note This results in multiple Spark jobs, and if the input RDD is the result
    * of a wide transformation (e.g. join with different partitioners), to avoid
    * recomputing the input RDD should be cached first.
    */
-  def toLocalIterator: Iterator[T] = withScope {
-    def collectPartition(p: Int): Array[T] = {
-      sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+  def toLocalIterator(prefetchPartitions: Boolean = false): Iterator[T] = withScope {
+
+    if (!prefetchPartitions || partitions.indices.isEmpty) {
+      def collectPartition(p: Int): Array[T] = {
+        sc.runJob(this, (iter: Iterator[T]) => iter.toArray, Seq(p)).head
+      }
+      partitions.indices.iterator.flatMap(i => collectPartition(i))
+

Review comment: Minor style: we don't normally leave an empty line here
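The prefetching idea in this PR can be illustrated without Spark: while the caller consumes partition i, a background thread is already fetching partition i+1, so at most two partitions' worth of data are materialized at once. A self-contained Java sketch — `fetchPartition` stands in for `sc.runJob` on a single partition, and this uses a single-thread executor with a `Future` rather than the PR's `ReentrantLock`/`Condition` machinery, so it is an illustration of the bound, not the PR's implementation:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

class PrefetchingIterator<T> implements Iterator<List<T>> {
    private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // prefetch work must not keep the JVM alive
        return t;
    });
    private final IntFunction<List<T>> fetchPartition; // stand-in for sc.runJob on one partition
    private final int numPartitions;
    private int next = 0;
    private Future<List<T>> inFlight;

    PrefetchingIterator(int numPartitions, IntFunction<List<T>> fetchPartition) {
        this.numPartitions = numPartitions;
        this.fetchPartition = fetchPartition;
        if (numPartitions > 0) {
            // Start fetching partition 0 eagerly, like the PR's early hasNext call.
            inFlight = pool.submit(() -> fetchPartition.apply(0));
        } else {
            pool.shutdown();
        }
    }

    @Override
    public boolean hasNext() {
        return next < numPartitions;
    }

    @Override
    public List<T> next() {
        try {
            List<T> result = inFlight.get(); // block until the prefetched partition arrives
            next++;
            if (next < numPartitions) {
                final int p = next;
                inFlight = pool.submit(() -> fetchPartition.apply(p)); // overlap with consumption
            } else {
                pool.shutdown();
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Flattening the per-partition lists reproduces the element order of the plain (non-prefetching) iterator; only the timing of the fetches changes.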
[GitHub] [spark] oleg-smith commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626258151 @BryanCutler @HyukjinKwon @holdenk Could you review please?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins removed a comment on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253290 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253365 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
AmplabJenkins commented on pull request #28488: URL: https://github.com/apache/spark/pull/28488#issuecomment-626253290 Can one of the admins verify this patch?
[GitHub] [spark] oleg-smith opened a new pull request #28488: SPARK-29083 Prefetch elements in rdd.toLocalIterator
oleg-smith opened a new pull request #28488: URL: https://github.com/apache/spark/pull/28488

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] huaxingao commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
huaxingao commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626248052 The change looks good except one minor comment.
[GitHub] [spark] huaxingao commented on a change in pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
huaxingao commented on a change in pull request #28487: URL: https://github.com/apache/spark/pull/28487#discussion_r422558046

File path: mllib/src/test/scala/org/apache/spark/ml/feature/VectorAssemblerSuite.scala

@@ -261,4 +261,14 @@ class VectorAssemblerSuite
     val output = vectorAssembler.transform(dfWithNullsAndNaNs)
     assert(output.select("a").limit(1).collect().head == Row(Vectors.sparse(0, Seq.empty)))
   }
+  test("SPARK-31671: should give explicit error message when can not infer column lengths") {

Review comment: super nit: add a blank line between L263 and L264?
[GitHub] [spark] AmplabJenkins commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245384
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245384
[GitHub] [spark] SparkQA removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216233 **[Test build #122467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).
[GitHub] [spark] SparkQA commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626245241 **[Test build #122467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] huaxingao commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
huaxingao commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-626230444 @srowen this is for 3.0.
[GitHub] [spark] srowen commented on pull request #28368: [SPARK-31575][SQL] Synchronise global JVM security configuration modification
srowen commented on pull request #28368: URL: https://github.com/apache/spark/pull/28368#issuecomment-626229811 Does this cause enough of a real-world problem that it should be in 3.0 or 2.4?
[GitHub] [spark] srowen commented on pull request #28451: [SPARK-31636][SQL][DOCS] Remove HTML syntax in SQL reference
srowen commented on pull request #28451: URL: https://github.com/apache/spark/pull/28451#issuecomment-626229743 If there are no more comments, I'll merge tomorrow. This is for 3.1 only?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins removed a comment on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226016 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226137 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
AmplabJenkins commented on pull request #28487: URL: https://github.com/apache/spark/pull/28487#issuecomment-626226016 Can one of the admins verify this patch?
[GitHub] [spark] fan31415 opened a new pull request #28487: [SPARK-31671][ML] Wrong error message in VectorAssembler
fan31415 opened a new pull request #28487: URL: https://github.com/apache/spark/pull/28487

### What changes were proposed in this pull request?

When input column lengths cannot be inferred and handleInvalid = "keep", VectorAssembler throws a runtime exception. However, the error message attached to this exception is misleading. I changed the content of the error message so that it reports the right columns.

### Why are the changes needed?

This is a bug. Here is a simple example to reproduce it.

```
// create a df without vector size
val df = Seq(
  (Vectors.dense(1.0), Vectors.dense(2.0))
).toDF("n1", "n2")

// only set vector size hint for n1 column
val hintedDf = new VectorSizeHint()
  .setInputCol("n1")
  .setSize(1)
  .transform(df)

// assemble n1, n2
val output = new VectorAssembler()
  .setInputCols(Array("n1", "n2"))
  .setOutputCol("features")
  .setHandleInvalid("keep")
  .transform(hintedDf)

// because only n1 has vector size, the error message should tell us to set vector size for n2 too
output.show()
```

Expected error message:
```
Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n2].
```

Actual error message:
```
Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n1, n2].
```

This introduces difficulties when I try to resolve the exception, because I do not know which column requires a VectorSizeHint. This is especially troublesome when you have a large number of columns to deal with.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a test in VectorAssemblerSuite.
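The fix described above boils down to listing only the columns whose vector sizes are still unknown, instead of every input column. A hypothetical Java sketch of that selection — the message text mirrors the PR description, but the class and method names are illustrative, not Spark's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MissingSizeReport {
    // Build the error message from only those input columns whose vector
    // length could not be inferred (i.e. columns absent from knownSizes).
    static String errorMessage(List<String> inputCols, Map<String, Integer> knownSizes) {
        List<String> missing = new ArrayList<>();
        for (String col : inputCols) {
            if (!knownSizes.containsKey(col)) {
                missing.add(col);
            }
        }
        return "Can not infer column lengths with handleInvalid = \"keep\". "
            + "Consider using VectorSizeHint to add metadata for columns: " + missing + ".";
    }
}
```

With inputs `[n1, n2]` and a known size only for `n1`, this reports `[n2]`, matching the expected message in the PR description.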
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins removed a comment on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216398
[GitHub] [spark] AmplabJenkins commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
AmplabJenkins commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216398
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r422528642

File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

@@ -157,23 +259,23 @@ private[spark] object ThreadUtils {
    */
   def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ThreadPoolExecutor = {
     val threadFactory = namedThreadFactory(prefix)
-    Executors.newFixedThreadPool(nThreads, threadFactory).asInstanceOf[ThreadPoolExecutor]
+    MDCAwareThreadPoolExecutor.newFixedThreadPool(nThreads, threadFactory)

Review comment: I checked, and all uses of that function are from driver code. What do you think about reverting this one and creating a new one called `newMDCAwareDaemonFixedThreadPool` so it can be used later?
[GitHub] [spark] igreenfield commented on a change in pull request #26624: [SPARK-8981][CORE][test-hadoop3.2][test-java11] Add MDC support in Executor
igreenfield commented on a change in pull request #26624: URL: https://github.com/apache/spark/pull/26624#discussion_r422528642

File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

@@ -157,23 +259,23 @@ private[spark] object ThreadUtils {
    */
   def newDaemonFixedThreadPool(nThreads: Int, prefix: String): ThreadPoolExecutor = {
     val threadFactory = namedThreadFactory(prefix)
-    Executors.newFixedThreadPool(nThreads, threadFactory).asInstanceOf[ThreadPoolExecutor]
+    MDCAwareThreadPoolExecutor.newFixedThreadPool(nThreads, threadFactory)

Review comment: @cloud-fan I checked, and all uses of that function are from driver code. What do you think about reverting this one and creating a new one called `newMDCAwareDaemonFixedThreadPool` so it can be used later?
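The capture-and-restore pattern behind an MDC-aware pool is small: snapshot the submitting thread's context map when the task is created, install it on the worker thread, and restore the worker's previous context afterwards. A self-contained Java sketch — a `ThreadLocal` map stands in for `org.slf4j.MDC` so no logging dependency is needed, and the names are illustrative rather than the PR's actual `MDCAwareThreadPoolExecutor`:

```java
import java.util.Map;

class MdcAwareWrapper {
    // Stand-in for SLF4J MDC's per-thread context map.
    static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(() -> Map.<String, String>of());

    // Capture the caller's context now; install it around task.run() on
    // whichever thread eventually executes the task.
    static Runnable withCallerContext(Runnable task) {
        Map<String, String> captured = CONTEXT.get();
        return () -> {
            Map<String, String> previous = CONTEXT.get();
            CONTEXT.set(captured); // log statements in the task now see the caller's context
            try {
                task.run();
            } finally {
                CONTEXT.set(previous); // don't leak the context to later tasks on this thread
            }
        };
    }
}
```

A pool becomes "MDC-aware" by applying this wrapper to every submitted `Runnable`/`Callable`, which is why the review asks whether driver-side pools need it at all.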
[GitHub] [spark] SparkQA commented on pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
SparkQA commented on pull request #28486: URL: https://github.com/apache/spark/pull/28486#issuecomment-626216233 **[Test build #122467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122467/testReport)** for PR 28486 at commit [`448699f`](https://github.com/apache/spark/commit/448699f2ceb4cfaf32c3bb4ee0588b5991704434).
[GitHub] [spark] MaxGekk commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
MaxGekk commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626215784 The PR https://github.com/apache/spark/pull/28486 fixes the build failure https://github.com/apache/spark/pull/28481#issuecomment-626034381
[GitHub] [spark] MaxGekk opened a new pull request #28486: [SPARK-31669][SQL][TESTS] Fix RowEncoderSuite failures on non-existing dates/timestamps
MaxGekk opened a new pull request #28486: URL: https://github.com/apache/spark/pull/28486

### What changes were proposed in this pull request?

Shift non-existing dates in the Proleptic Gregorian calendar by 1 day. The reason for that is that `RowEncoderSuite` generates random dates/timestamps in the hybrid calendar, and some dates/timestamps don't exist in the Proleptic Gregorian calendar, like 1000-02-29, because 1000 is not a leap year in the Proleptic Gregorian calendar.

### Why are the changes needed?

This makes RowEncoderSuite much more stable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

By running RowEncoderSuite and setting a non-existing date manually:
```scala
val date = new java.sql.Date(1000 - 1900, 1, 29)
Try { date.toLocalDate; date }.getOrElse(new Date(date.getTime + MILLIS_PER_DAY))
```
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422511601

File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }

-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {

Review comment: @Ngone51 Yes, not sure about the logs from `StandaloneAppClient$ClientEndpoint`. I will check again. This is the command I am using to submit jobs: `./bin/spark-submit --master spark://127.0.0.1:7077 --conf spark.standalone.submit.waitAppCompletion=true --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar`
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422511601

File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }

-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {

Review comment: @Ngone51 Yes, not sure about the logs. I will check again. This is the command I am using to submit jobs: `./bin/spark-submit --master spark://127.0.0.1:7077 --conf spark.standalone.submit.waitAppCompletion=true --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar`
[GitHub] [spark] Ngone51 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
Ngone51 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422503865

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment:
> In both cases, I see the following in driver logs. I couldn't find any difference in logs.

Hi @akshatb1, those logs are from `StandaloneAppClient$ClientEndpoint` and `StandaloneSchedulerBackend` rather than `org.apache.spark.deploy.ClientEndpoint`. Can you check again?

> Just to confirm, are you suggesting to do this at line #180 in the pollAndReportStatus method? Or should we handle this outside?

I think just after line 180 should be ok.
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422502868

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment:
> We can periodically send a message (e.g. we can send it after `Thread.sleep(REPORT_DRIVER_STATUS_INTERVAL)`) to `ClientEndpoint` itself to check driver's status.

@Ngone51 Thanks for this suggestion. Just to confirm, are you suggesting to do this at line #180 in the pollAndReportStatus method? Or should we handle this outside? ![image](https://user-images.githubusercontent.com/31816865/81476612-73605500-9230-11ea-83a3-937782cbe00f.png)
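The periodic self-message approach discussed above can be sketched roughly as follows. This is a Spark-free simulation, not the PR's actual code: the `DriverStatusPoller` name is hypothetical, and a plain queue stands in for the RPC endpoint's event loop.

```scala
import scala.collection.mutable

// Driver states, mirroring DriverState's terminal FINISHED/FAILED/KILLED.
object DriverState extends Enumeration {
  val SUBMITTED, RUNNING, FINISHED, FAILED, KILLED = Value
}

// Instead of blocking the event loop in `while (true)`, each status check
// re-enqueues a CheckStatus message to self, leaving the loop free to
// handle other messages between checks.
class DriverStatusPoller(statuses: Iterator[DriverState.Value]) {
  private val inbox = mutable.Queue[String]("CheckStatus")
  private val terminal =
    Set(DriverState.FINISHED, DriverState.FAILED, DriverState.KILLED)
  var finalState: Option[DriverState.Value] = None

  def run(): Unit = {
    while (inbox.nonEmpty) {
      inbox.dequeue() match {
        case "CheckStatus" =>
          // Stands in for askSync[DriverStatusResponse](RequestDriverStatus(id)).
          val state = statuses.next()
          if (terminal(state)) {
            finalState = Some(state) // terminal state: stop polling, report, exit
          } else {
            // Re-send CheckStatus to self (after REPORT_DRIVER_STATUS_INTERVAL).
            inbox.enqueue("CheckStatus")
          }
      }
    }
  }
}
```

The design point is that termination is driven by the driver reaching a terminal state rather than by a single one-shot status query.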
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180429
[GitHub] [spark] AmplabJenkins commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180429
[GitHub] [spark] SparkQA removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
SparkQA removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136760 **[Test build #122465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122465/testReport)** for PR 28481 at commit [`1150983`](https://github.com/apache/spark/commit/1150983e02e6e5ca390dfcfeaf203070e953ed03).
[GitHub] [spark] SparkQA commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
SparkQA commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626180098 **[Test build #122465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122465/testReport)** for PR 28481 at commit [`1150983`](https://github.com/apache/spark/commit/1150983e02e6e5ca390dfcfeaf203070e953ed03).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172999
[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172999
[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626172867 **[Test build #122466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626164399 **[Test build #122466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins removed a comment on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171759 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171894 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
AmplabJenkins commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171759 Can one of the admins verify this patch?
[GitHub] [spark] iRakson commented on pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
iRakson commented on pull request #28485: URL: https://github.com/apache/spark/pull/28485#issuecomment-626171426 cc @srowen @zsxwing @sarutak @uncleGen Kindly review.
[GitHub] [spark] iRakson opened a new pull request #28485: [SPARK-31642] Add Pagination Support for Structured Streaming Page
iRakson opened a new pull request #28485: URL: https://github.com/apache/spark/pull/28485

### What changes were proposed in this pull request?
Add pagination support for the Structured Streaming page. Now both tables, `Active Queries` and `Completed Queries`, will have pagination. To implement pagination, the pagination framework from #7399 is used.
* Also, tables will only be shown if there is at least one entry in the table.

### Why are the changes needed?
* This will help users analyse their structured streaming queries in a much better way.
* Other Web UI pages support pagination in their tables, so this makes the Web UI more consistent across pages.
* This can prevent potential OOM errors.

### Does this PR introduce _any_ user-facing change?
Yes. Both tables will support pagination.

### How was this patch tested?
Manually. I will add snapshots soon.
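As a rough illustration of what table pagination does (this is a hypothetical sketch, not the actual Spark pagination framework from #7399; `paginate` and `shouldRender` are made-up helper names):

```scala
// Hypothetical sketch: split a table's rows into fixed-size pages, as a
// paginated UI table would, and return only the requested 1-indexed page.
def paginate[A](rows: Seq[A], pageSize: Int, page: Int): Seq[A] = {
  require(pageSize > 0 && page > 0, "pageSize and page must be positive")
  rows.slice((page - 1) * pageSize, page * pageSize)
}

// Mirrors the PR's behavior of rendering a table only when it is non-empty.
def shouldRender[A](rows: Seq[A]): Boolean = rows.nonEmpty
```

Rendering one bounded page at a time, instead of the whole query history, is what limits memory use and avoids the potential OOM mentioned above.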
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168671
[GitHub] [spark] AmplabJenkins commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168671
[GitHub] [spark] SparkQA removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626126204 **[Test build #122464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122464/testReport)** for PR 28391 at commit [`c3db6cf`](https://github.com/apache/spark/commit/c3db6cfaaa2260e7f11436e84fc2576ad4f8dde1).
[GitHub] [spark] SparkQA commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626168470 **[Test build #122464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122464/testReport)** for PR 28391 at commit [`c3db6cf`](https://github.com/apache/spark/commit/c3db6cfaaa2260e7f11436e84fc2576ad4f8dde1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
SparkQA commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626164399 **[Test build #122466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122466/testReport)** for PR 27473 at commit [`a3a005e`](https://github.com/apache/spark/commit/a3a005e588b52009a674dc5b9ac237b97017cd25).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins removed a comment on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626163688
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163493
[GitHub] [spark] AmplabJenkins commented on pull request #27473: [SPARK-30699][ML][PYSPARK] GMM blockify input vectors
AmplabJenkins commented on pull request #27473: URL: https://github.com/apache/spark/pull/27473#issuecomment-626163688
[GitHub] [spark] AmplabJenkins commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
AmplabJenkins commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163493
[GitHub] [spark] SparkQA removed a comment on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA removed a comment on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626121859 **[Test build #122462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122462/testReport)** for PR 28391 at commit [`914188c`](https://github.com/apache/spark/commit/914188c83642341e459ff828a4d3a4ae4c1be224).
[GitHub] [spark] SparkQA commented on pull request #28391: [SPARK-31593][SS] Remove unnecessary streaming query progress update
SparkQA commented on pull request #28391: URL: https://github.com/apache/spark/pull/28391#issuecomment-626163191 **[Test build #122462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122462/testReport)** for PR 28391 at commit [`914188c`](https://github.com/apache/spark/commit/914188c83642341e459ff828a4d3a4ae4c1be224).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] akshatb1 commented on a change in pull request #28258: [SPARK-31486] [CORE] spark.submit.waitAppCompletion flag to control spark-submit exit in Standalone Cluster Mode
akshatb1 commented on a change in pull request #28258: URL: https://github.com/apache/spark/pull/28258#discussion_r422469087

## File path: core/src/main/scala/org/apache/spark/deploy/Client.scala

Review comment: @Ngone51 I launched a long-running application with the flag enabled and then disabled, and stopped the Spark master in the middle. In both cases, I see the following in the driver logs; I couldn't find any difference between them.
```
20/05/09 13:42:59 WARN StandaloneAppClient$ClientEndpoint: Connection to Akshats-MacBook-Pro.local:7077 failed; waiting for master to reconnect...
20/05/09 13:42:59 WARN StandaloneSchedulerBackend: Disconnected from Spark cluster! Waiting for reconnection...
```
The `onDisconnected` method from `StandaloneAppClient.scala` is getting called: ![image](https://user-images.githubusercontent.com/31816865/81468361-bb658480-91fc-11ea-87d5-3d00cbf7619f.png)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins removed a comment on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136957
[GitHub] [spark] AmplabJenkins commented on pull request #28481: [SPARK-31665][SQL][TESTS] Check parquet dictionary encoding of random dates/timestamps
AmplabJenkins commented on pull request #28481: URL: https://github.com/apache/spark/pull/28481#issuecomment-626136957