[GitHub] [spark-website] gatorsmile commented on issue #202: Spark 2.4.3: Add API doc for pyspark ML

2019-05-08 Thread GitBox
gatorsmile commented on issue #202: Spark 2.4.3: Add API doc for pyspark ML 
URL: https://github.com/apache/spark-website/pull/202#issuecomment-490752274
 
 
   cc @cloud-fan @srowen 





[GitHub] [spark-website] gatorsmile opened a new pull request #202: Spark 2.4.3: Add pyspark apis

2019-05-08 Thread GitBox
gatorsmile opened a new pull request #202: Spark 2.4.3: Add pyspark apis
URL: https://github.com/apache/spark-website/pull/202
 
 
   In the previous upgrade, the pyspark ML API docs were not generated correctly. 
This PR fixes the problem and updates the docs. 





[spark] branch master updated: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…

2019-05-08 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0969d7a  [SPARK-27207][SQL] : Ensure aggregate buffers are initialized 
again for So…
0969d7a is described below

commit 0969d7aa0ca7c4b32f4753387db8ad67ac6764b0
Author: pgandhi 
AuthorDate: Thu May 9 11:12:20 2019 +0800

[SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for SortBasedAggregate

Normally, the aggregate operations invoked on an aggregation buffer for User 
Defined Aggregate Functions (UDAF) follow the order initialize(), update(), 
eval() or initialize(), merge(), eval(). However, once the threshold configured 
by spark.sql.objectHashAggregate.sortBased.fallbackThreshold is reached, 
ObjectHashAggregate falls back to SortBasedAggregator, which invokes the merge 
or update operation without calling initialize() on the aggregate buffer.

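A minimal, hypothetical sketch of the lifecycle described above, written against 
Spark's public UserDefinedAggregateFunction API (illustrative only; it is not the 
code touched by this patch, which fixes the internal fallback path):

```
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// A running-sum UDAF used only to illustrate the buffer lifecycle.
class LongSum extends UserDefinedAggregateFunction {
  override def inputSchema: StructType = new StructType().add("value", LongType)
  override def bufferSchema: StructType = new StructType().add("sum", LongType)
  override def dataType: DataType = LongType
  override def deterministic: Boolean = true

  // Resets an aggregation buffer before it is (re)used; per the description
  // above, this is the step that was skipped after the sort-based fallback.
  override def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0L
  }

  // Folds one input row into the buffer.
  override def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) buffer(0) = buffer.getLong(0) + input.getLong(0)
  }

  // Combines two partial buffers produced on different partitions.
  override def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
  }

  // Produces the final result from the buffer.
  override def evaluate(buffer: Row): Any = buffer.getLong(0)
}
```
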
## What changes were proposed in this pull request?

The fix is to initialize the aggregate buffers again when the fallback to the 
SortBasedAggregate operator happens.

## How was this patch tested?

The patch was tested as part of 
[SPARK-24935](https://issues.apache.org/jira/browse/SPARK-24935) as documented 
in PR https://github.com/apache/spark/pull/23778.

Closes #24149 from pgandhi999/SPARK-27207.

Authored-by: pgandhi 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
index 56c2ee6..6fc2053 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
@@ -158,7 +158,7 @@ case class AggregateExpression(
  * ([[aggBufferAttributes]]) of an aggregation buffer which is used to hold 
partial aggregate
  * results. At runtime, multiple aggregate functions are evaluated by the same 
operator using a
  * combined aggregation buffer which concatenates the aggregation buffers of 
the individual
- * aggregate functions.
+ * aggregate functions. Please note that aggregate functions should be 
stateless.
  *
  * Code which accepts [[AggregateFunction]] instances should be prepared to 
handle both types of
  * aggregate functions.





[GitHub] [spark-website] cloud-fan commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
cloud-fan commented on a change in pull request #201: Update Spark website for 
2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282317639
 
 

 ##
 File path: js/downloads.js
 ##
 @@ -22,7 +22,7 @@ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, 
sources];
 
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);
 
 Review comment:
   The release process usually takes days, so I picked the date the vote passed. 
It's also fine to pick the date the release process started.





[GitHub] [spark-website] gatorsmile closed pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile closed pull request #201: Update Spark website for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201
 
 
   





[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#issuecomment-490696164
 
 
   Thanks! Merged to master.





[spark] branch master updated (57450ed -> 78a403f)

2019-05-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 57450ed  [MINOR][SS] Rename `secondLatestBatchId` to 
`secondLatestOffsets`
 add 78a403f  [SPARK-27627][SQL] Make option "pathGlobFilter" as a general 
option for all file sources

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-binaryFile.md|  16 +
 docs/sql-data-sources-load-save-functions.md   |  21 +++
 .../examples/sql/JavaSQLDataSourceExample.java |   5 ++
 examples/src/main/python/sql/datasource.py |   5 ++
 examples/src/main/r/RSparkSQLExample.R |   4 ++
 .../partitioned_users.orc/do_not_read_this.txt |   1 +
 .../users.orc  | Bin 0 -> 448 bytes
 .../favorite_color=red/users.orc   | Bin 0 -> 402 bytes
 .../spark/examples/sql/SQLDataSourceExample.scala  |   5 ++
 .../org/apache/spark/sql/avro/AvroFileFormat.scala |   4 ++
 .../org/apache/spark/sql/avro/AvroOptions.scala|   5 +-
 python/pyspark/sql/readwriter.py   |   6 ++
 python/pyspark/sql/streaming.py|   6 ++
 .../org/apache/spark/sql/DataFrameReader.scala |   9 +++
 .../sql/execution/datasources/DataSource.scala |   3 +-
 .../datasources/PartitioningAwareFileIndex.scala   |  11 +++-
 .../datasources/binaryfile/BinaryFileFormat.scala  |  70 -
 .../sql/execution/streaming/FileStreamSource.scala |   4 +-
 .../execution/streaming/MetadataLogFileIndex.scala |   3 +-
 .../spark/sql/streaming/DataStreamReader.scala |   9 +++
 .../spark/sql/FileBasedDataSourceSuite.scala   |  32 ++
 .../sql/streaming/FileStreamSourceSuite.scala  |  19 ++
 22 files changed, 173 insertions(+), 65 deletions(-)
 create mode 100644 
examples/src/main/resources/partitioned_users.orc/do_not_read_this.txt
 create mode 100644 
examples/src/main/resources/partitioned_users.orc/favorite_color=__HIVE_DEFAULT_PARTITION__/users.orc
 create mode 100644 
examples/src/main/resources/partitioned_users.orc/favorite_color=red/users.orc
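
As a rough illustration of what this change enables (a sketch, not code from the 
patch): the "pathGlobFilter" option can now be passed to any file-based source, 
for example to skip the stray do_not_read_this.txt added next to the example ORC 
files above.

```
import org.apache.spark.sql.SparkSession

object PathGlobFilterExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("path-glob-filter-sketch").getOrCreate()

    // Read only *.orc files under the partitioned directory, ignoring the
    // do_not_read_this.txt file that sits alongside them.
    val users = spark.read
      .format("orc")
      .option("pathGlobFilter", "*.orc")
      .load("examples/src/main/resources/partitioned_users.orc")

    users.show()
    spark.stop()
  }
}
```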





[spark] branch branch-2.4 updated (b15866c -> 2f16255)

2019-05-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b15866c  [MINOR][DOCS] Fix invalid documentation for 
StreamingQueryManager Class
 add 2f16255  [SPARK-25139][SPARK-18406][CORE][2.4] Avoid NonFatals to kill 
the Executor in PythonRunner

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/api/python/PythonRDD.scala|  2 +-
 .../org/apache/spark/api/python/PythonRunner.scala | 34 --
 2 files changed, 20 insertions(+), 16 deletions(-)





[spark] branch master updated (09422f5 -> 57450ed)

2019-05-08 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 09422f5  [MINOR][DOCS] Fix invalid documentation for 
StreamingQueryManager Class
 add 57450ed  [MINOR][SS] Rename `secondLatestBatchId` to 
`secondLatestOffsets`

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/streaming/MicroBatchExecution.scala| 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[GitHub] [spark-website] dongjoon-hyun commented on issue #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
dongjoon-hyun commented on issue #201: Update Spark website for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#issuecomment-490583282
 
 
   Sure, @gatorsmile !





[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#issuecomment-490583085
 
 
   @dongjoon-hyun Could you help run the command `jekyll serve` and see whether 
all the links and descriptions are accurate? 





[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
dongjoon-hyun commented on a change in pull request #201: Update Spark website 
for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282174708
 
 

 ##
 File path: js/downloads.js
 ##
 @@ -22,7 +22,7 @@ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, 
sources];
 
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);
 
 Review comment:
   Then, could you document this in the release manager's guide? This seems 
arguable; previously, the vote result announcement date was used.
   cc @cloud-fan 





[GitHub] [spark-website] gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile commented on a change in pull request #201: Update Spark website for 
2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282174107
 
 

 ##
 File path: site/committers.html
 ##
 @@ -162,6 +162,9 @@
   Latest News
   
 
+  Spark 2.4.3 
released
+  (May 08, 2019)
 
 Review comment:
   This is the news post date. That is today. 





[GitHub] [spark-website] gatorsmile commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile commented on a change in pull request #201: Update Spark website for 
2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282173536
 
 

 ##
 File path: js/downloads.js
 ##
 @@ -22,7 +22,7 @@ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, 
sources];
 
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);
 
 Review comment:
   The vote passed on 05/06, but the release actually happened the next day. See 
the release date in the Maven repo: 
https://mvnrepository.com/artifact/org.apache.spark





[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
dongjoon-hyun commented on a change in pull request #201: Update Spark website 
for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282173201
 
 

 ##
 File path: site/committers.html
 ##
 @@ -162,6 +162,9 @@
   Latest News
   
 
+  Spark 2.4.3 
released
+  (May 08, 2019)
 
 Review comment:
   This also doesn't match the download page, which uses `05/07`. Let's fix this 
to `(May 06, 2019)` for consistency.





[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
dongjoon-hyun commented on a change in pull request #201: Update Spark website 
for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282171424
 
 

 ##
 File path: js/downloads.js
 ##
 @@ -22,7 +22,7 @@ var packagesV8 = [hadoop2p7, hadoop2p6, hadoopFree, sources];
 // 2.4.2+
 var packagesV9 = [hadoop2p7, hadoop2p6, hadoopFree, scala2p11_hadoopFree, 
sources];
 
-addRelease("2.4.2", new Date("04/23/2019"), packagesV9, true);
+addRelease("2.4.3", new Date("05/07/2019"), packagesV9, true);
 
 Review comment:
   According to the vote result announcement, `05/06` will be correct.
   - 
https://lists.apache.org/thread.html/0efe232724f9696225d62dba0377f6ce39b6317b69493a3de0f9b7a3@%3Cdev.spark.apache.org%3E





[GitHub] [spark-website] dongjoon-hyun commented on a change in pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
dongjoon-hyun commented on a change in pull request #201: Update Spark website 
for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#discussion_r282170787
 
 

 ##
 File path: downloads.md
 ##
 @@ -26,14 +26,14 @@ $(document).ready(function() {
 
 4. Verify this release using the  and 
[project release KEYS](https://www.apache.org/dist/spark/KEYS).
 
-Note that, Spark is pre-built with Scala 2.12 since version 2.4.2. Previous 
versions are pre-built with Scala 2.11.
+Note that, Spark is pre-built with Scala 2.11 except version 2.4.2, which is 
pre-built with Scala 2.11.
 
 Review comment:
   ```
   - 2.4.2, which is pre-built with Scala 2.11.
   + 2.4.2, which is pre-built with Scala 2.12.
   ```





[GitHub] [spark-website] gatorsmile commented on issue #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile commented on issue #201: Update Spark website for 2.4.3 release
URL: https://github.com/apache/spark-website/pull/201#issuecomment-490569716
 
 
   cc @cloud-fan @dongjoon-hyun @marmbrus @rxin @mengxr @srowen 





[GitHub] [spark-website] gatorsmile opened a new pull request #201: Update Spark website for 2.4.3 release

2019-05-08 Thread GitBox
gatorsmile opened a new pull request #201: Update Spark website for 2.4.3 
release
URL: https://github.com/apache/spark-website/pull/201
 
 
   This PR adds the release notes and news for the Spark 2.4.3 release. 





[spark] branch branch-2.3 updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

2019-05-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 564dbf6  [MINOR][DOCS] Fix invalid documentation for 
StreamingQueryManager Class
564dbf6 is described below

commit 564dbf61b9e3febd623a08bec9506505fd337bc3
Author: Asaf Levy 
AuthorDate: Wed May 8 23:45:05 2019 +0900

[MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

## What changes were proposed in this pull request?

When following the example for using `spark.streams().awaitAnyTermination()`, 
valid pyspark code will output the following error:

```
Traceback (most recent call last):
  File "pyspark_app.py", line 182, in 
spark.streams().awaitAnyTermination()
TypeError: 'StreamingQueryManager' object is not callable
```

Docs URL: 
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

This changes the documentation line to properly call the method under the 
StreamingQueryManager Class

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

## How was this patch tested?

After changing the syntax, the error no longer occurs and the pyspark 
application works.

This is a docs-only change.

Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 09422f5139cc13abaf506453819c2bb91e174ae3)
Signed-off-by: HyukjinKwon 
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 9a83f15..ab229a6 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2310,11 +2310,11 @@ spark.streams().awaitAnyTermination();   // block until 
any one of them terminat
 {% highlight python %}
 spark = ...  # spark session
 
-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries
 
-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id
 
-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
 
 


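For comparison, a minimal Scala sketch (not part of this patch; object and app 
names are made up): in the Scala API `SparkSession.streams` is also accessed 
without parentheses, whereas the Java snippet in the hunk context above uses 
`spark.streams()`, which is likely where the extra parentheses in the Python 
docs came from.

```
import org.apache.spark.sql.SparkSession

object StreamsScalaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streams-sketch").master("local[2]").getOrCreate()

    spark.streams.active                     // currently active streaming queries (empty here)
    // spark.streams.get(id)                 // look up a query object by its unique id
    spark.streams.awaitAnyTermination(1000)  // wait up to 1s for any query to terminate

    spark.stop()
  }
}
```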



[spark] branch branch-2.4 updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

2019-05-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new b15866c  [MINOR][DOCS] Fix invalid documentation for 
StreamingQueryManager Class
b15866c is described below

commit b15866cec48a3f61927283ba2afdc5616b702de9
Author: Asaf Levy 
AuthorDate: Wed May 8 23:45:05 2019 +0900

[MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

## What changes were proposed in this pull request?

When following the example for using `spark.streams().awaitAnyTermination()`, 
valid pyspark code will output the following error:

```
Traceback (most recent call last):
  File "pyspark_app.py", line 182, in 
spark.streams().awaitAnyTermination()
TypeError: 'StreamingQueryManager' object is not callable
```

Docs URL: 
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

This changes the documentation line to properly call the method under the 
StreamingQueryManager Class

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

## How was this patch tested?

After changing the syntax, the error no longer occurs and the pyspark 
application works.

This is a docs-only change.

Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy 
Signed-off-by: HyukjinKwon 
(cherry picked from commit 09422f5139cc13abaf506453819c2bb91e174ae3)
Signed-off-by: HyukjinKwon 
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 3d91223..f0971ab 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2539,11 +2539,11 @@ spark.streams().awaitAnyTermination();   // block until 
any one of them terminat
 {% highlight python %}
 spark = ...  # spark session
 
-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries
 
-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id
 
-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
 
 





[spark] branch master updated: [MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

2019-05-08 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 09422f5  [MINOR][DOCS] Fix invalid documentation for 
StreamingQueryManager Class
09422f5 is described below

commit 09422f5139cc13abaf506453819c2bb91e174ae3
Author: Asaf Levy 
AuthorDate: Wed May 8 23:45:05 2019 +0900

[MINOR][DOCS] Fix invalid documentation for StreamingQueryManager Class

## What changes were proposed in this pull request?

When following the example for using `spark.streams().awaitAnyTermination()`, 
valid pyspark code will output the following error:

```
Traceback (most recent call last):
  File "pyspark_app.py", line 182, in 
spark.streams().awaitAnyTermination()
TypeError: 'StreamingQueryManager' object is not callable
```

Docs URL: 
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#managing-streaming-queries

This changes the documentation line to properly call the method under the 
StreamingQueryManager Class

https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager

## How was this patch tested?

After changing the syntax, the error no longer occurs and the pyspark 
application works.

This is a docs-only change.

Closes #24547 from asaf400/patch-1.

Authored-by: Asaf Levy 
Signed-off-by: HyukjinKwon 
---
 docs/structured-streaming-programming-guide.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 2c4169d..be30f6e 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2554,11 +2554,11 @@ spark.streams().awaitAnyTermination();   // block until 
any one of them terminat
 {% highlight python %}
 spark = ...  # spark session
 
-spark.streams().active  # get the list of currently active streaming queries
+spark.streams.active  # get the list of currently active streaming queries
 
-spark.streams().get(id)  # get a query object by its unique id
+spark.streams.get(id)  # get a query object by its unique id
 
-spark.streams().awaitAnyTermination()  # block until any one of them terminates
+spark.streams.awaitAnyTermination()  # block until any one of them terminates
 {% endhighlight %}
 
 





[spark] branch master updated: [SPARK-27642][SS] make v1 offset extends v2 offset

2019-05-08 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bae5baa  [SPARK-27642][SS] make v1 offset extends v2 offset
bae5baa is described below

commit bae5baae5281d01dc8c67077b90592be857329bd
Author: Wenchen Fan 
AuthorDate: Tue May 7 23:03:15 2019 -0700

[SPARK-27642][SS] make v1 offset extends v2 offset

## What changes were proposed in this pull request?

To move DS v2 to the catalyst module, we can't make v2 offset rely on v1 
offset, as v1 offset is in sql/core.

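As a simplified sketch of the resulting hierarchy (Scala stand-ins for the Java 
classes in the diff below; class names, version string, and the concrete offset 
are illustrative): the v2 offset becomes the standalone base type and the 
deprecated v1 offset merely extends it, so existing sources keep working while 
DS v2 can move to the catalyst module.

```
// Hypothetical, condensed illustration of the inverted dependency.
object OffsetHierarchySketch {
  // Stand-in for org.apache.spark.sql.sources.v2.reader.streaming.Offset (v2):
  // self-contained, with no reference back to the v1 class.
  abstract class OffsetV2 {
    def json: String
  }

  // Stand-in for org.apache.spark.sql.execution.streaming.Offset (v1):
  // now just a deprecated alias that extends the v2 type.
  @deprecated("Use the v2 Offset instead", "3.0.0")
  abstract class Offset extends OffsetV2

  // A concrete source offset can keep extending the v1 alias and still be
  // accepted wherever a v2 Offset is expected.
  case class LongOffsetSketch(value: Long) extends Offset {
    override def json: String = value.toString
  }

  def main(args: Array[String]): Unit = {
    val o: OffsetV2 = LongOffsetSketch(42L)
    println(o.json)
  }
}
```
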
## How was this patch tested?

existing tests

Closes #24538 from cloud-fan/offset.

Authored-by: Wenchen Fan 
Signed-off-by: gatorsmile 
---
 .../spark/sql/kafka010/KafkaContinuousStream.scala |  2 +-
 .../spark/sql/kafka010/KafkaSourceOffset.scala |  4 +--
 .../spark/sql/execution/streaming/Offset.java  | 42 +++---
 .../sql/sources/v2/reader/streaming/Offset.java| 11 ++
 .../spark/sql/execution/streaming/LongOffset.scala | 14 +---
 .../execution/streaming/MicroBatchExecution.scala  | 10 +++---
 .../spark/sql/execution/streaming/OffsetSeq.scala  |  9 ++---
 .../sql/execution/streaming/OffsetSeqLog.scala |  3 +-
 .../sql/execution/streaming/StreamExecution.scala  |  4 +--
 .../sql/execution/streaming/StreamProgress.scala   | 19 +-
 .../spark/sql/execution/streaming/memory.scala | 25 +
 .../sources/TextSocketMicroBatchStream.scala   |  5 +--
 .../apache/spark/sql/streaming/StreamTest.scala|  8 ++---
 13 files changed, 49 insertions(+), 107 deletions(-)

diff --git 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
index d60ee1c..92686d2 100644
--- 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
+++ 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala
@@ -76,7 +76,7 @@ class KafkaContinuousStream(
   }
 
   override def planInputPartitions(start: Offset): Array[InputPartition] = {
-val oldStartPartitionOffsets = KafkaSourceOffset.getPartitionOffsets(start)
+val oldStartPartitionOffsets = 
start.asInstanceOf[KafkaSourceOffset].partitionToOffsets
 
 val currentPartitionSet = offsetReader.fetchEarliestOffsets().keySet
 val newPartitions = 
currentPartitionSet.diff(oldStartPartitionOffsets.keySet)
diff --git 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
index 8d41c0d..90d7043 100644
--- 
a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
+++ 
b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala
@@ -20,14 +20,14 @@ package org.apache.spark.sql.kafka010
 import org.apache.kafka.common.TopicPartition
 
 import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset}
-import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, 
PartitionOffset}
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
 
 /**
  * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of 
subscribed topics and
  * their offsets.
  */
 private[kafka010]
-case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) 
extends OffsetV2 {
+case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) 
extends Offset {
 
   override val json = JsonUtils.partitionOffsets(partitionToOffsets)
 }
diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java 
b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
index 43ad4b3..7c167dc 100644
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
+++ 
b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java
@@ -18,44 +18,10 @@
 package org.apache.spark.sql.execution.streaming;
 
 /**
- * This is an internal, deprecated interface. New source implementations 
should use the
- * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the 
one that will be
- * supported in the long term.
+ * This class is an alias of {@link 
org.apache.spark.sql.sources.v2.reader.streaming.Offset}. It's
+ * internal and deprecated. New streaming data source implementations should 
use data source v2 API,
+ * which will be supported in the long term.
  *
  * This class will be removed in a future release.
  */
-public abstract class Offset {
-/**
- * A JSON-serialized representation of an Offset that is
- * used for saving offsets to the