Repository: spark
Updated Branches:
  refs/heads/branch-2.3 911a4dbe7 -> a23c07ecb
[SPARK-21293][SPARKR][DOCS] structured streaming doc update

## What changes were proposed in this pull request?

doc update

Author: Felix Cheung <felixcheun...@hotmail.com>

Closes #20197 from felixcheung/rwadoc.

(cherry picked from commit 02214b094390e913f52e71d55c9bb8a81c9e7ef9)
Signed-off-by: Felix Cheung <felixche...@apache.org>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a23c07ec
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a23c07ec
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a23c07ec

Branch: refs/heads/branch-2.3
Commit: a23c07ecb1dba0dbd52a0d2362d8d21e9cdd8b5a
Parents: 911a4db
Author: Felix Cheung <felixcheun...@hotmail.com>
Authored: Mon Jan 8 22:08:19 2018 -0800
Committer: Felix Cheung <felixche...@apache.org>
Committed: Mon Jan 8 22:08:34 2018 -0800

----------------------------------------------------------------------
 R/pkg/vignettes/sparkr-vignettes.Rmd           |  2 +-
 docs/sparkr.md                                 |  2 +-
 docs/structured-streaming-programming-guide.md | 32 +++++++++++++++++++--
 3 files changed, 32 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a23c07ec/R/pkg/vignettes/sparkr-vignettes.Rmd
----------------------------------------------------------------------
diff --git a/R/pkg/vignettes/sparkr-vignettes.Rmd b/R/pkg/vignettes/sparkr-vignettes.Rmd
index 2e66242..feca617 100644
--- a/R/pkg/vignettes/sparkr-vignettes.Rmd
+++ b/R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -1042,7 +1042,7 @@ unlink(modelPath)
 
 ## Structured Streaming
 
-SparkR supports the Structured Streaming API (experimental).
+SparkR supports the Structured Streaming API.
 
 You can check the Structured Streaming Programming Guide for [an introduction](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#programming-model) to its programming model and basic concepts.
http://git-wip-us.apache.org/repos/asf/spark/blob/a23c07ec/docs/sparkr.md
----------------------------------------------------------------------
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 997ea60..6685b58 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -596,7 +596,7 @@ The following example shows how to save/load a MLlib model by SparkR.
 
 # Structured Streaming
 
-SparkR supports the Structured Streaming API (experimental). Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. For more information see the R API on the [Structured Streaming Programming Guide](structured-streaming-programming-guide.html)
+SparkR supports the Structured Streaming API. Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. For more information see the R API on the [Structured Streaming Programming Guide](structured-streaming-programming-guide.html)
 
 # R Function Name Conflicts


http://git-wip-us.apache.org/repos/asf/spark/blob/a23c07ec/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 31fcfab..de13e28 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -827,8 +827,8 @@ df.isStreaming()
 {% endhighlight %}
 </div>
 <div data-lang="r" markdown="1">
-{% highlight bash %}
-Not available.
+{% highlight r %}
+isStreaming(df)
 {% endhighlight %}
 </div>
 </div>
@@ -886,6 +886,19 @@ windowedCounts = words.groupBy(
 {% endhighlight %}
 </div>
 
+<div data-lang="r" markdown="1">
+{% highlight r %}
+words <- ...  # streaming DataFrame of schema { timestamp: Timestamp, word: String }
+
+# Group the data by window and word and compute the count of each group
+windowedCounts <- count(
+  groupBy(
+    words,
+    window(words$timestamp, "10 minutes", "5 minutes"),
+    words$word))
+{% endhighlight %}
+
+</div>
 </div>
@@ -960,6 +973,21 @@ windowedCounts = words \
 {% endhighlight %}
 </div>
 
+<div data-lang="r" markdown="1">
+{% highlight r %}
+words <- ...  # streaming DataFrame of schema { timestamp: Timestamp, word: String }
+
+# Group the data by window and word and compute the count of each group
+
+words <- withWatermark(words, "timestamp", "10 minutes")
+windowedCounts <- count(
+  groupBy(
+    words,
+    window(words$timestamp, "10 minutes", "5 minutes"),
+    words$word))
+{% endhighlight %}
+
+</div>
 </div>
 
 In this example, we are defining the watermark of the query on the value of the column "timestamp",
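
---

Editor's note: the R snippets added by this diff leave `words` elided (`words <- ...`). A minimal end-to-end sketch of how such a streaming DataFrame might be built and consumed in SparkR follows. This is not part of the commit; the socket source, host, port, and output sink are assumptions for illustration, and running it requires a local Spark installation with the SparkR package attached.

```r
library(SparkR)

# Sketch only: assumes SPARK_HOME points at a local Spark installation.
sparkR.session(master = "local[2]")

# Hypothetical input: a socket source on localhost:9999 that attaches an
# event timestamp to each incoming line (host and port are assumptions).
lines <- read.stream("socket", host = "localhost", port = 9999,
                     includeTimestamp = "true")

# Split each line into words, keeping the timestamp column.
words <- selectExpr(lines, "explode(split(value, ' ')) AS word", "timestamp")

# Watermark plus windowed counts, matching the snippets in the diff above.
words <- withWatermark(words, "timestamp", "10 minutes")
windowedCounts <- count(
  groupBy(
    words,
    window(words$timestamp, "10 minutes", "5 minutes"),
    words$word))

# Stream the running aggregates to the console; stop with stopQuery(query).
query <- write.stream(windowedCounts, "console", outputMode = "complete")
```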