[GitHub] [spark-website] gatorsmile commented on issue #202: Spark 2.4.3: Add API doc for pyspark ML
gatorsmile commented on issue #202: Spark 2.4.3: Add API doc for pyspark ML URL: https://github.com/apache/spark-website/pull/202#issuecomment-490752274 cc @cloud-fan @srowen This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] [spark-website] gatorsmile opened a new pull request #202: Spark 2.4.3: Add pyspark apis
gatorsmile opened a new pull request #202: Spark 2.4.3: Add pyspark apis URL: https://github.com/apache/spark-website/pull/202 In the previous upgrade, the ML Python APIs were not generated correctly. This fixes the problem and updates the generated docs.
[spark] branch master updated: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0969d7a  [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…

0969d7a is described below

commit 0969d7aa0ca7c4b32f4753387db8ad67ac6764b0
Author: pgandhi
AuthorDate: Thu May 9 11:12:20 2019 +0800

    [SPARK-27207][SQL]: Ensure aggregate buffers are initialized again for SortBasedAggregate

    Normally, the aggregate operations invoked on an aggregation buffer for User Defined Aggregate Functions (UDAF) follow an order such as initialize(), update(), eval() or initialize(), merge(), eval(). However, once a threshold configurable via spark.sql.objectHashAggregate.sortBased.fallbackThreshold is reached, ObjectHashAggregate falls back to SortBasedAggregator, which invokes the merge or update operation without calling initialize() on the aggregate buffer.

    ## What changes were proposed in this pull request?

    The fix is to initialize the aggregate buffers again when the fallback to the SortBasedAggregate operator happens.

    ## How was this patch tested?

    The patch was tested as part of [SPARK-24935](https://issues.apache.org/jira/browse/SPARK-24935), as documented in PR https://github.com/apache/spark/pull/23778.

    Closes #24149 from pgandhi999/SPARK-27207.

Authored-by: pgandhi
Signed-off-by: Wenchen Fan
---
 .../apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index 56c2ee6..6fc2053 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -158,7 +158,7 @@ case class AggregateExpression(
  * ([[aggBufferAttributes]]) of an aggregation buffer which is used to hold partial aggregate
  * results. At runtime, multiple aggregate functions are evaluated by the same operator using a
  * combined aggregation buffer which concatenates the aggregation buffers of the individual
- * aggregate functions.
+ * aggregate functions. Please note that aggregate functions should be stateless.
  *
  * Code which accepts [[AggregateFunction]] instances should be prepared to handle both types of
  * aggregate functions.
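The lifecycle invariant described above can be illustrated outside Spark. The following is a conceptual plain-Python sketch (hypothetical names such as `SumUDAF` and `sort_based_fallback`, not Spark's actual code) of why running update()/merge() on a buffer that initialize() has not reset leaks stale partial state into the result, and what re-initializing on fallback fixes:

```python
# Conceptual sketch of the SPARK-27207 bug (plain Python, hypothetical names):
# after falling back from hash-based to sort-based aggregation, update() ran
# on a buffer that initialize() had not reset, so stale state leaked through.

class SumUDAF:
    """A trivial UDAF following the initialize/update/eval contract."""
    def initialize(self, buf):
        buf["sum"] = 0
    def update(self, buf, value):
        buf["sum"] += value
    def eval(self, buf):
        return buf["sum"]

def sort_based_fallback(udaf, buf, values, reinitialize):
    # The fix: re-initialize the buffer when the fallback path starts.
    if reinitialize:
        udaf.initialize(buf)
    for v in values:
        udaf.update(buf, v)
    return udaf.eval(buf)

udaf, buf = SumUDAF(), {}
udaf.initialize(buf)
udaf.update(buf, 10)  # stale partial state left over before the fallback
print(sort_based_fallback(udaf, buf, [1, 2, 3], reinitialize=True))   # 6: correct
print(sort_based_fallback(udaf, buf, [1, 2, 3], reinitialize=False))  # 12: stale state leaked
```

This also motivates the doc change in the diff: if an aggregate function kept state outside its buffer, even re-initializing the buffer would not make the fallback safe, hence "aggregate functions should be stateless."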
[GitHub] [spark-website] cloud-fan commented on a change in pull request #201: Update Spark website for 2.4.3 release
cloud-fan commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282317639

File path: js/downloads.js

@@ -22,7 +22,7 @@
 var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, sources];
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);

Review comment: The release process usually takes days, so I picked the date the vote passed. It's also fine to pick the date the release process started.
[GitHub] [spark-website] gatorsmile closed pull request #201: Update Spark website for 2.4.3 release
gatorsmile closed pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201
[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#issuecomment-490696164 Thanks! Merged to master.
[spark] branch master updated (57450ed -> 78a403f)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

     from 57450ed  [MINOR][SS] Rename `secondLatestBatchId` to `secondLatestOffsets`
      add 78a403f  [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-binaryFile.md                    | 16 +
 docs/sql-data-sources-load-save-functions.md           | 21 +++
 .../examples/sql/JavaSQLDataSourceExample.java         |  5 ++
 examples/src/main/python/sql/datasource.py             |  5 ++
 examples/src/main/r/RSparkSQLExample.R                 |  4 ++
 .../partitioned_users.orc/do_not_read_this.txt         |  1 +
 .../users.orc                                          | Bin 0 -> 448 bytes
 .../favorite_color=red/users.orc                       | Bin 0 -> 402 bytes
 .../spark/examples/sql/SQLDataSourceExample.scala      |  5 ++
 .../org/apache/spark/sql/avro/AvroFileFormat.scala     |  4 ++
 .../org/apache/spark/sql/avro/AvroOptions.scala        |  5 +-
 python/pyspark/sql/readwriter.py                       |  6 ++
 python/pyspark/sql/streaming.py                        |  6 ++
 .../org/apache/spark/sql/DataFrameReader.scala         |  9 +++
 .../sql/execution/datasources/DataSource.scala         |  3 +-
 .../datasources/PartitioningAwareFileIndex.scala       | 11 +++-
 .../datasources/binaryfile/BinaryFileFormat.scala      | 70 -
 .../sql/execution/streaming/FileStreamSource.scala     |  4 +-
 .../execution/streaming/MetadataLogFileIndex.scala     |  3 +-
 .../spark/sql/streaming/DataStreamReader.scala         |  9 +++
 .../spark/sql/FileBasedDataSourceSuite.scala           | 32 ++
 .../sql/streaming/FileStreamSourceSuite.scala          | 19 ++
 22 files changed, 173 insertions(+), 65 deletions(-)
 create mode 100644 examples/src/main/resources/partitioned_users.orc/do_not_read_this.txt
 create mode 100644 examples/src/main/resources/partitioned_users.orc/favorite_color=__HIVE_DEFAULT_PARTITION__/users.orc
 create mode 100644 examples/src/main/resources/partitioned_users.orc/favorite_color=red/users.orc
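The filtering rule behind a `pathGlobFilter`-style option can be sketched in plain Python. The sketch below uses the standard `fnmatch` module (it is not Spark's implementation) and applies the rule to the test files this commit adds: only the file *name* is matched against the glob, so the `.txt` sibling is skipped:

```python
# Sketch of a "pathGlobFilter"-style rule: keep only input files whose
# base name matches a glob pattern. Plain Python, not Spark's code.
import fnmatch
import posixpath

def glob_filter(paths, pattern):
    """Return the paths whose file name matches the glob pattern."""
    return [p for p in paths if fnmatch.fnmatch(posixpath.basename(p), pattern)]

# The test files added by this commit; the .txt file should be skipped.
paths = [
    "partitioned_users.orc/favorite_color=red/users.orc",
    "partitioned_users.orc/do_not_read_this.txt",
]
print(glob_filter(paths, "*.orc"))
# keeps only the users.orc entry
```

In Spark itself, per the docs touched by this commit, the option is set on a reader, e.g. `spark.read.format("binaryFile").option("pathGlobFilter", "*.orc").load(dir)`, and it filters file names rather than directory names.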
[spark] branch branch-2.4 updated (b15866c -> 2f16255)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git.

     from b15866c  [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class
      add 2f16255  [SPARK-25139][SPARK-18406][CORE][2.4] Avoid NonFatals to kill the Executor in PythonRunner

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/api/python/PythonRDD.scala    |  2 +-
 .../org/apache/spark/api/python/PythonRunner.scala | 34 --
 2 files changed, 20 insertions(+), 16 deletions(-)
[spark] branch master updated (09422f5 -> 57450ed)
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git.

     from 09422f5  [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class
      add 57450ed  [MINOR][SS] Rename `secondLatestBatchId` to `secondLatestOffsets`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/streaming/MicroBatchExecution.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
[GitHub] [spark-website] dongjoon-hyun commented on issue #201: Update Spark website for 2.4.3 release
dongjoon-hyun commented on issue #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#issuecomment-490583282 Sure, @gatorsmile !
[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#issuecomment-490583085 @dongjoon-hyun Could you help run the command `jekyll serve` and see whether all the links and descriptions are accurate?
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release
dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282174708

File path: js/downloads.js

@@ -22,7 +22,7 @@
 var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, sources];
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);

Review comment: Then, could you document this in the release manager's guide? This seems arguable: previously, the vote result announcement date was used. cc @cloud-fan
[GitHub] [spark-website] gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release
gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282174107

File path: site/committers.html

@@ -162,6 +162,9 @@ Latest News
+ Spark 2.4.3 released
+ (May 08, 2019)

Review comment: This is the news post date. That is today.
[GitHub] [spark-website] gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release
gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282173536

File path: js/downloads.js

@@ -22,7 +22,7 @@
 var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, sources];
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);

Review comment: The vote passed on 05/06, but the release actually happened the next day. See the release date in the Maven repo: https://mvnrepository.com/artifact/org.apache.spark
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release
dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282173201

File path: site/committers.html

@@ -162,6 +162,9 @@ Latest News
+ Spark 2.4.3 released
+ (May 08, 2019)

Review comment: This also doesn't match the download page (`05/07`). Let's fix this to `(May 06, 2019)` for consistency.
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release
dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282171424

File path: js/downloads.js

@@ -22,7 +22,7 @@
 var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, sources];
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);

Review comment: According to the vote result announcement, `05/06` would be correct.
- https://lists.apache.org/thread.html/0efe232724f9696225d62dba0377f6ce39b6317b69493a3de0f9b7a3@%3Cdev.spark.apache.org%3E
[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release
dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#discussion_r282170787

File path: downloads.md

@@ -26,14 +26,14 @@ $(document).ready(function() {
 4. Verify this release using the and [project release KEYS](https://www.apache.org/dist/spark/KEYS).

-Note that, Spark is pre-built with Scala 2.12 since version 2.4.2. Previous versions are pre-built with Scala 2.11.
+Note that, Spark is pre-built with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.11.

Review comment:
```
- 2.4.2, which is pre-built with Scala 2.11.
+ 2.4.2, which is pre-built with Scala 2.12.
```
[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201#issuecomment-490569716 cc @cloud-fan @dongjoon-hyun @marmbrus @rxin @mengxr @srowen
[GitHub] [spark-website] gatorsmile opened a new pull request #201: Update Spark website for 2.4.3 release
gatorsmile opened a new pull request #201: Update Spark website for 2.4.3 release URL: https://github.com/apache/spark-website/pull/201 This PR is to add the release note and news for the Spark 2.4.3 release.
[spark] branch branch-2.3 updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new 564dbf6  [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

564dbf6 is described below

commit 564dbf61b9e3febd623a08bec9506505fd337bc3
Author: Asaf Levy
AuthorDate: Wed May 8 23:45:05 2019 +0900

    [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

    ## What changes were proposed in this pull request?

    When following the example for using `spark.streams().awaitAnyTermination()`, valid pyspark code outputs the following error:

    ```
    Traceback (most recent call last):
      File "pyspark_app.py", line 182, in <module>
        spark.streams().awaitAnyTermination()
    TypeError: 'StreamingQueryManager' object is not callable
    ```

    Docs URL: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

    This changes the documentation line to properly call the method under the StreamingQueryManager Class: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

    ## How was this patch tested?

    After changing the syntax, the error no longer occurs and the pyspark application works. This is a docs-only change.

    Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy
Signed-off-by: HyukjinKwon
(cherry picked from commit 09422f5139cc13abaf506453819c2bb91e174ae3)
Signed-off-by: HyukjinKwon
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 9a83f15..ab229a6 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2310,11 +2310,11 @@ spark.streams().awaitAnyTermination(); // block until any one of them terminat

 {% highlight python %}
 spark = ...  # spark session

-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries

-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id

-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
[spark] branch branch-2.4 updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new b15866c  [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

b15866c is described below

commit b15866cec48a3f61927283ba2afdc5616b702de9
Author: Asaf Levy
AuthorDate: Wed May 8 23:45:05 2019 +0900

    [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

    ## What changes were proposed in this pull request?

    When following the example for using `spark.streams().awaitAnyTermination()`, valid pyspark code outputs the following error:

    ```
    Traceback (most recent call last):
      File "pyspark_app.py", line 182, in <module>
        spark.streams().awaitAnyTermination()
    TypeError: 'StreamingQueryManager' object is not callable
    ```

    Docs URL: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

    This changes the documentation line to properly call the method under the StreamingQueryManager Class: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

    ## How was this patch tested?

    After changing the syntax, the error no longer occurs and the pyspark application works. This is a docs-only change.

    Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy
Signed-off-by: HyukjinKwon
(cherry picked from commit 09422f5139cc13abaf506453819c2bb91e174ae3)
Signed-off-by: HyukjinKwon
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3d91223..f0971ab 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2539,11 +2539,11 @@ spark.streams().awaitAnyTermination(); // block until any one of them terminat

 {% highlight python %}
 spark = ...  # spark session

-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries

-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id

-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
[spark] branch master updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 09422f5  [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

09422f5 is described below

commit 09422f5139cc13abaf506453819c2bb91e174ae3
Author: Asaf Levy
AuthorDate: Wed May 8 23:45:05 2019 +0900

    [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

    ## What changes were proposed in this pull request?

    When following the example for using `spark.streams().awaitAnyTermination()`, valid pyspark code outputs the following error:

    ```
    Traceback (most recent call last):
      File "pyspark_app.py", line 182, in <module>
        spark.streams().awaitAnyTermination()
    TypeError: 'StreamingQueryManager' object is not callable
    ```

    Docs URL: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

    This changes the documentation line to properly call the method under the StreamingQueryManager Class: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

    ## How was this patch tested?

    After changing the syntax, the error no longer occurs and the pyspark application works. This is a docs-only change.

    Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy
Signed-off-by: HyukjinKwon
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 2c4169d..be30f6e 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2554,11 +2554,11 @@ spark.streams().awaitAnyTermination(); // block until any one of them terminat

 {% highlight python %}
 spark = ...  # spark session

-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries

-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id

-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
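The root cause of this doc bug is that in PySpark `SparkSession.streams` is a property, not a method, so appending `()` attempts to call the returned manager object. A minimal self-contained sketch (hypothetical stand-in classes, not PySpark's actual code) reproduces the error from the traceback:

```python
# Minimal sketch of the bug the docs fix addresses: `streams` is a property,
# so `spark.streams()` calls the returned manager object, which is not
# callable. Hypothetical stand-in classes -- not PySpark's actual code.

class StreamingQueryManager:
    @property
    def active(self):
        return []  # stand-in for the list of active streaming queries

class SparkSession:
    @property
    def streams(self):
        # Returns the manager; note there is no () when accessing it.
        return StreamingQueryManager()

spark = SparkSession()
print(spark.streams.active)  # correct access: property, no call
try:
    spark.streams()          # the documented-but-wrong form
except TypeError as err:
    print("TypeError:", err)
```

The same property-vs-method distinction explains why the Scala/Java examples keep the parentheses (`spark.streams()` is a method there) while the Python example must drop them.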
[spark] branch master updated: [SPARK-27642][SS] make v1 offset extends v2 offset
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new bae5baa  [SPARK-27642][SS] make v1 offset extends v2 offset

bae5baa is described below

commit bae5baae5281d01dc8c67077b90592be857329bd
Author: Wenchen Fan
AuthorDate: Tue May 7 23:03:15 2019 -0700

    [SPARK-27642][SS] make v1 offset extends v2 offset

    ## What changes were proposed in this pull request?

    To move DS v2 to the catalyst module, we can't make the v2 offset rely on the v1 offset, as the v1 offset is in sql/core.

    ## How was this patch tested?

    existing tests

    Closes #24538 from cloud-fan/offset.

Authored-by: Wenchen Fan
Signed-off-by: gatorsmile
---
 .../spark/sql/kafka010/KafkaContinuousStream.scala |  2 +-
 .../spark/sql/kafka010/KafkaSourceOffset.scala     |  4 +--
 .../spark/sql/execution/streaming/Offset.java      | 42 +++---
 .../sql/sources/v2/reader/streaming/Offset.java    | 11 ++
 .../spark/sql/execution/streaming/LongOffset.scala | 14 +---
 .../execution/streaming/MicroBatchExecution.scala  | 10 +++---
 .../spark/sql/execution/streaming/OffsetSeq.scala  |  9 ++---
 .../sql/execution/streaming/OffsetSeqLog.scala     |  3 +-
 .../sql/execution/streaming/StreamExecution.scala  |  4 +--
 .../sql/execution/streaming/StreamProgress.scala   | 19 +-
 .../spark/sql/execution/streaming/memory.scala     | 25 +
 .../sources/TextSocketMicroBatchStream.scala       |  5 +--
 .../apache/spark/sql/streaming/StreamTest.scala    |  8 ++---
 13 files changed, 49 insertions(+), 107 deletions(-)

diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
index d60ee1c..92686d2 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
@@ -76,7 +76,7 @@ class KafkaContinuousStream(
 }

 override def planInputPartitions(start: Offset): Array[InputPartition] = {
-    val oldStartPartitionOffsets = KafkaSourceOffset.getPartitionOffsets(start)
+    val oldStartPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
 val currentPartitionSet = offsetReader.fetchEarliestOffsets().keySet
 val newPartitions = currentPartitionSet.diff(oldStartPartitionOffsets.keySet)

diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
index 8d41c0d..90d7043 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
@@ -20,14 +20,14 @@ package org.apache.spark.sql.kafka010

 import org.apache.kafka.common.TopicPartition

 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset

/**
 * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and
 * their offsets.
 */
private[kafka010]
-case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 {
+case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends Offset {

 override val json = JsonUtils.partitionOffsets(partitionToOffsets)
}

diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
index 43ad4b3..7c167dc 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
@@ -18,44 +18,10 @@

 package org.apache.spark.sql.execution.streaming;

/**
- * This is an internal, deprecated interface. New source implementations should use the
- * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be
- * supported in the long term.
+ * This class is an alias of {@link org.apache.spark.sql.sources.v2.reader.streaming.Offset}. It's
+ * internal and deprecated. New streaming data source implementations should use data source v2 API,
+ * which will be supported in the long term.
 *
 * This class will be removed in a future release.
 */
-public abstract class Offset {
-/**
- * A JSON-serialized representation of an Offset that is
- * used for saving offsets to the
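The dependency inversion in this commit can be sketched abstractly: the legacy v1 `Offset` becomes a thin deprecated alias that extends the v2 `Offset`, so code typed against the v2 class accepts both, and the v2 class no longer depends on the v1 module. A conceptual plain-Python sketch (hypothetical names; Spark's real classes are Scala/Java):

```python
# Conceptual sketch of "make v1 offset extend v2 offset" (hypothetical
# Python names, not Spark's code): the legacy class is kept only as a
# deprecated alias/subclass of the new one, inverting the dependency.

class OffsetV2:
    """The v2 offset: the class everything should depend on going forward."""
    def json(self) -> str:
        raise NotImplementedError
    def __eq__(self, other):
        # Offsets compare by their serialized JSON representation.
        return isinstance(other, OffsetV2) and self.json() == other.json()

class Offset(OffsetV2):
    """Deprecated v1 offset: now just an alias of the v2 class."""
    pass

class LongOffset(Offset):
    """A simple numeric offset, analogous in spirit to Spark's LongOffset."""
    def __init__(self, offset: int):
        self.offset = offset
    def json(self) -> str:
        return str(self.offset)

# Code typed against the v2 class now accepts v1 offsets unchanged:
assert isinstance(LongOffset(5), OffsetV2)
assert LongOffset(5) == LongOffset(5)
```

This mirrors the diff above: `KafkaSourceOffset` drops the `Offset => OffsetV2` rename and simply extends `Offset`, which is itself now the v2 type.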