Repository: spark
Updated Branches:
  refs/heads/master dd85eb544 -> 623fc7fc6


[MINOR][DOC] Remove spaces following slashes

## What changes were proposed in this pull request?

This PR merges the multi-line enumerations of items in order to remove the redundant spaces that follow slashes in the [Structured Streaming Programming Guide in 2.0.2-rc1](http://people.apache.org/~pwendell/spark-releases/spark-2.0.2-rc1-docs/structured-streaming-programming-guide.html).
- Before: `Scala/ Java/ Python`
- After: `Scala/Java/Python`

## How was this patch tested?

Manually, by running the following, since this is a documentation-only update.

```
cd docs
SKIP_API=1 jekyll build
```

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #15686 from dongjoon-hyun/minor_doc_space.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/623fc7fc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/623fc7fc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/623fc7fc

Branch: refs/heads/master
Commit: 623fc7fc67735cfafdb7f527bd3df210987943c6
Parents: dd85eb5
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Tue Nov 1 13:08:49 2016 +0000
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue Nov 1 13:08:49 2016 +0000

----------------------------------------------------------------------
 docs/structured-streaming-programming-guide.md | 44 ++++++++++-----------
 1 file changed, 20 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/623fc7fc/docs/structured-streaming-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 173fd6e..d838ed3 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -14,10 +14,8 @@ Structured Streaming is a scalable and fault-tolerant stream processing engine b
 
 # Quick Example
 Let’s say you want to maintain a running word count of text data received from a data server listening on a TCP socket. Let’s see how you can express this using Structured Streaming. You can see the full code in 
-[Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCount.scala)/
-[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java)/
-[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount.py). And if you 
-[download Spark](http://spark.apache.org/downloads.html), you can directly run the example. In any case, let’s walk through the example step-by-step and understand how it works. First, we have to import the necessary classes and create a local SparkSession, the starting point of all functionalities related to Spark.
+[Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCount.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCount.java)/[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount.py).
+And if you [download Spark](http://spark.apache.org/downloads.html), you can directly run the example. In any case, let’s walk through the example step-by-step and understand how it works. First, we have to import the necessary classes and create a local SparkSession, the starting point of all functionalities related to Spark.
 
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
@@ -409,16 +407,15 @@ Delivering end-to-end exactly-once semantics was one of key goals behind the des
 to track the read position in the stream. The engine uses checkpointing and write ahead logs to record the offset range of the data being processed in each trigger. The streaming sinks are designed to be idempotent for handling reprocessing. Together, using replayable sources and idempotent sinks, Structured Streaming can ensure **end-to-end exactly-once semantics** under any failure.
 
 # API using Datasets and DataFrames
-Since Spark 2.0, DataFrames and Datasets can represent static, bounded data, as well as streaming, unbounded data. Similar to static Datasets/DataFrames, you can use the common entry point `SparkSession` ([Scala](api/scala/index.html#org.apache.spark.sql.SparkSession)/
-[Java](api/java/org/apache/spark/sql/SparkSession.html)/
-[Python](api/python/pyspark.sql.html#pyspark.sql.SparkSession) docs) to create streaming DataFrames/Datasets from streaming sources, and apply the same operations on them as static DataFrames/Datasets. If you are not familiar with Datasets/DataFrames, you are strongly advised to familiarize yourself with them using the 
+Since Spark 2.0, DataFrames and Datasets can represent static, bounded data, as well as streaming, unbounded data. Similar to static Datasets/DataFrames, you can use the common entry point `SparkSession`
+([Scala](api/scala/index.html#org.apache.spark.sql.SparkSession)/[Java](api/java/org/apache/spark/sql/SparkSession.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.SparkSession) docs)
+to create streaming DataFrames/Datasets from streaming sources, and apply the same operations on them as static DataFrames/Datasets. If you are not familiar with Datasets/DataFrames, you are strongly advised to familiarize yourself with them using the
 [DataFrame/Dataset Programming Guide](sql-programming-guide.html).
 
 ## Creating streaming DataFrames and streaming Datasets
 Streaming DataFrames can be created through the `DataStreamReader` interface 
-([Scala](api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader)/
-[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/
-[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader) docs) returned by `SparkSession.readStream()`. Similar to the read interface for creating static DataFrame, you can specify the details of the source – data format, schema, options, etc.
+([Scala](api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader) docs)
+returned by `SparkSession.readStream()`. Similar to the read interface for creating static DataFrame, you can specify the details of the source – data format, schema, options, etc.
 
 #### Data Sources
 In Spark 2.0, there are a few built-in sources.
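
[Editor's note, not part of the patch: as a reader aid for the hunk above, a small Scala sketch of `SparkSession.readStream` returning a `DataStreamReader` on which format, schema and options are set; the schema fields and input directory are made up for illustration.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder.appName("StreamingReaderSketch").getOrCreate()

// File sources need the schema up front (the fields here are hypothetical)
val userSchema = new StructType()
  .add("name", "string")
  .add("age", "integer")

// readStream returns a DataStreamReader: specify format, schema and options, then load
val csvDF = spark.readStream
  .format("csv")
  .schema(userSchema)
  .option("sep", ";")
  .load("/path/to/input/directory")   // illustrative path to a monitored directory
```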
@@ -628,9 +625,7 @@ The result tables would look something like the following.
 ![Window Operations](img/structured-streaming-window.png)
 
 Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
-[Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/
-[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/
-[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
+[Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java)/[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py).
 
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
@@ -753,10 +748,9 @@ In addition, there are some Dataset methods that will not work on streaming Data
 If you try any of these operations, you will see an AnalysisException like "operation XYZ is not supported with streaming DataFrames/Datasets".
 
 ## Starting Streaming Queries
-Once you have defined the final result DataFrame/Dataset, all that is left is for you start the streaming computation. To do that, you have to use the 
-`DataStreamWriter` ([Scala](api/scala/index.html#org.apache.spark.sql.streaming.DataStreamWriter)/
-[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/
-[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamWriter) docs) returned through `Dataset.writeStream()`. You will have to specify one or more of the following in this interface.
+Once you have defined the final result DataFrame/Dataset, all that is left is for you start the streaming computation. To do that, you have to use the `DataStreamWriter`
+([Scala](api/scala/index.html#org.apache.spark.sql.streaming.DataStreamWriter)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamWriter) docs)
+returned through `Dataset.writeStream()`. You will have to specify one or more of the following in this interface.
 
 - *Details of the output sink:* Data format, location, etc. 
 
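[Editor's note, not part of the patch: a sketch of what the `DataStreamWriter` paragraph above describes, assuming a streaming DataFrame `wordCounts` like the one in the quick-example sketch earlier; the sink, output mode and query name are illustrative.]

```scala
// wordCounts is assumed to be a streaming DataFrame (see the earlier sketch)
val query = wordCounts.writeStream
  .outputMode("complete")   // how the result table is written after each trigger
  .format("console")        // output sink: console, in this illustrative case
  .queryName("wordCounts")  // optional, lets you identify the query later
  .start()                  // actually starts the streaming computation

query.awaitTermination()    // block until the query stops
```
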
@@ -953,8 +947,9 @@ spark.sql("select * from aggregates").show()   # interactively query in-memory t
 </div>
 
 #### Using Foreach
-The `foreach` operation allows arbitrary operations to be computed on the output data. As of Spark 2.0, this is available only for Scala and Java. To use this, you will have to implement the interface `ForeachWriter` ([Scala](api/scala/index.html#org.apache.spark.sql.ForeachWriter)/
-[Java](api/java/org/apache/spark/sql/ForeachWriter.html) docs), which has methods that get called whenever there is a sequence of rows generated as output after a trigger. Note the following important points.
+The `foreach` operation allows arbitrary operations to be computed on the output data. As of Spark 2.0, this is available only for Scala and Java. To use this, you will have to implement the interface `ForeachWriter`
+([Scala](api/scala/index.html#org.apache.spark.sql.ForeachWriter)/[Java](api/java/org/apache/spark/sql/ForeachWriter.html) docs),
+which has methods that get called whenever there is a sequence of rows generated as output after a trigger. Note the following important points.
 
 - The writer must be serializable, as it will be serialized and sent to the executors for execution.
 
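[Editor's note, not part of the patch: to make the `ForeachWriter` hunk above concrete, a bare-bones Scala sketch of the three methods the interface requires; a writer that just prints rows is purely illustrative.]

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

// Minimal, serializable writer: open/process/close are the required methods
class PrintlnForeachWriter extends ForeachWriter[Row] {
  // Called when a partition of output is about to be processed; return false to skip it
  override def open(partitionId: Long, version: Long): Boolean = true

  // Called once per output row produced after a trigger
  override def process(row: Row): Unit = println(row)

  // Called when the partition is done (errorOrNull is non-null if processing failed)
  override def close(errorOrNull: Throwable): Unit = ()
}

// Illustrative usage, assuming a streaming DataFrame wordCounts as in the earlier sketches:
// val query = wordCounts.writeStream.foreach(new PrintlnForeachWriter).start()
```
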
@@ -1046,9 +1041,9 @@ query.sinkStatus()   # progress information about data written to the output sin
 </div>
 </div>
 
-You can start any number of queries in a single SparkSession. They will all be running concurrently sharing the cluster resources. You can use `sparkSession.streams()` to get the `StreamingQueryManager` ([Scala](api/scala/index.html#org.apache.spark.sql.streaming.StreamingQueryManager)/
-[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryManager.html)/
-[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager) docs) that can be used to manage the currently active queries.
+You can start any number of queries in a single SparkSession. They will all be running concurrently sharing the cluster resources. You can use `sparkSession.streams()` to get the `StreamingQueryManager`
+([Scala](api/scala/index.html#org.apache.spark.sql.streaming.StreamingQueryManager)/[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryManager.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager) docs)
+that can be used to manage the currently active queries.
 
 <div class="codetabs">
 <div data-lang="scala"  markdown="1">
@@ -1092,8 +1087,9 @@ spark.streams().awaitAnyTermination()  # block until any one of them terminates
 </div>
 </div>
 
-Finally, for asynchronous monitoring of streaming queries, you can create and attach a `StreamingQueryListener` ([Scala](api/scala/index.html#org.apache.spark.sql.streaming.StreamingQueryListener)/
-[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html) docs), which will give you regular callback-based updates when queries are started and terminated.
+Finally, for asynchronous monitoring of streaming queries, you can create and attach a `StreamingQueryListener`
+([Scala](api/scala/index.html#org.apache.spark.sql.streaming.StreamingQueryListener)/[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryListener.html) docs),
+which will give you regular callback-based updates when queries are started and terminated.
 
 ## Recovering from Failures with Checkpointing 
 In case of a failure or intentional shutdown, you can recover the previous progress and state of a previous query, and continue where it left off. This is done using checkpointing and write ahead logs. You can configure a query with a checkpoint location, and the query will save all the progress information (i.e. range of offsets processed in each trigger) and the running aggregates (e.g. word counts in the [quick example](#quick-example)) to the checkpoint location. As of Spark 2.0, this checkpoint location has to be a path in an HDFS compatible file system, and can be set as an option in the DataStreamWriter when [starting a query](#starting-streaming-queries). 
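
[Editor's note, not part of the patch: the checkpointing paragraph above corresponds to setting the `checkpointLocation` option on the DataStreamWriter before `start()`; `aggDF`, the query name and the HDFS path in this sketch are placeholders.]

```scala
// aggDF is assumed to be a streaming DataFrame with a running aggregation
val query = aggDF.writeStream
  .outputMode("complete")
  .option("checkpointLocation", "path/to/HDFS/dir")  // placeholder HDFS-compatible path
  .format("memory")
  .queryName("aggregates")
  .start()
```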

