Repository: spark
Updated Branches:
  refs/heads/branch-2.0 7de183d97 -> 86adc5cfb
[SPARK-16114][SQL] updated structured streaming guide

## What changes were proposed in this pull request?

Updated structured streaming programming guide with new windowed example.

## How was this patch tested?

Docs

Author: James Thomas <jamesjoetho...@gmail.com>

Closes #14183 from jjthomas/ss_docs_update.

(cherry picked from commit 51a6706b1339bb761602e33276a469f71be2cd90)
Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/86adc5cf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/86adc5cf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/86adc5cf

Branch: refs/heads/branch-2.0
Commit: 86adc5cfbe286eb4d6071ec9ee09b6d0960a8509
Parents: 7de183d
Author: James Thomas <jamesjoetho...@gmail.com>
Authored: Wed Jul 13 13:26:23 2016 -0700
Committer: Tathagata Das <tathagata.das1...@gmail.com>
Committed: Wed Jul 13 13:26:34 2016 -0700

----------------------------------------------------------------------
 docs/structured-streaming-programming-guide.md | 49 ++++++++++-----------
 1 file changed, 23 insertions(+), 26 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/86adc5cf/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 7949396..3ef39e4 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -626,52 +626,49 @@ The result tables would look something like the following.
 
 ![Window Operations](img/structured-streaming-window.png)
 
-Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations.
+Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
+[Scala]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
+[Java]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
+[Python]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
 
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
 {% highlight scala %}
-// Number of events in every 1 minute time windows
-df.groupBy(window(df.col("time"), "1 minute"))
-  .count()
+import spark.implicits._
+val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
 
-// Average number of events for each device type in every 1 minute time windows
-df.groupBy(
-  df.col("type"),
-  window(df.col("time"), "1 minute"))
-  .avg("signal")
+// Group the data by window and word and compute the count of each group
+val windowedCounts = words.groupBy(
+  window($"timestamp", "10 minutes", "5 minutes"),
+  $"word"
+).count()
 {% endhighlight %}
 
 </div>
 <div data-lang="java" markdown="1">
 
 {% highlight java %}
-import static org.apache.spark.sql.functions.window;
-
-// Number of events in every 1 minute time windows
-df.groupBy(window(df.col("time"), "1 minute"))
-  .count();
-
-// Average number of events for each device type in every 1 minute time windows
-df.groupBy(
-  df.col("type"),
-  window(df.col("time"), "1 minute"))
-  .avg("signal");
+Dataset<Row> words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }
+// Group the data by window and word and compute the count of each group
+Dataset<Row> windowedCounts = words.groupBy(
+  functions.window(words.col("timestamp"), "10 minutes", "5 minutes"),
+  words.col("word")
+).count();
 {% endhighlight %}
 
 </div>
 <div data-lang="python" markdown="1">
 
 {% highlight python %}
-from pyspark.sql.functions import window
-
-# Number of events in every 1 minute time windows
-df.groupBy(window("time", "1 minute")).count()
+words = ... # streaming DataFrame of schema { timestamp: Timestamp, word: String }
 
-# Average number of events for each device type in every 1 minute time windows
-df.groupBy("type", window("time", "1 minute")).avg("signal")
+# Group the data by window and word and compute the count of each group
+windowedCounts = words.groupBy(
+    window(words.timestamp, '10 minutes', '5 minutes'),
+    words.word
+).count()
 {% endhighlight %}
 
 </div>
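For readers of this commit: the new examples all call `window(timestamp, "10 minutes", "5 minutes")`, a 10-minute window sliding every 5 minutes, so each event is counted in two overlapping windows (window length / slide interval). A rough, Spark-free sketch of that window-assignment semantics in plain Python (the helper names `sliding_windows` and `windowed_counts` are illustrative, not Spark APIs, and epoch alignment is an assumption):

```python
from collections import Counter
from datetime import datetime, timedelta

def sliding_windows(ts, length_min=10, slide_min=5):
    """Return the (start, end) windows that an event at `ts` falls into,
    approximating window(col, "10 minutes", "5 minutes") semantics.
    Window starts are assumed to be aligned to the Unix epoch."""
    length = timedelta(minutes=length_min)
    slide = timedelta(minutes=slide_min)
    epoch = datetime(1970, 1, 1)
    # Latest window start at or before ts, aligned to the slide interval
    slots = (ts - epoch) // slide
    start = epoch + slots * slide
    windows = []
    # Walk backwards while the window still covers ts
    while start > ts - length:
        windows.append((start, start + length))
        start -= slide
    return sorted(windows)

def windowed_counts(events, length_min=10, slide_min=5):
    """events: iterable of (timestamp, word) pairs.
    Returns a Counter keyed by ((window_start, window_end), word),
    like the groupBy(window(...), word).count() in the examples."""
    counts = Counter()
    for ts, word in events:
        for win in sliding_windows(ts, length_min, slide_min):
            counts[(win, word)] += 1
    return counts
```

For example, an event at 12:07 lands in both the 12:00-12:10 and 12:05-12:15 windows, which is why a word can contribute to more than one row of the result table.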