Repository: kafka
Updated Branches:
  refs/heads/trunk 491774bd5 -> f28fc1100


MINOR: Add Processing Guarantees to Streams docs

Author: Guozhang Wang <wangg...@gmail.com>

Reviewers: Apurva Mehta <apu...@confluent.io>, Matthias J. Sax 
<matth...@confluent.io>, Jason Gustafson <ja...@confluent.io>

Closes #3345 from guozhangwang/KMinor-streams-eos-docs


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/f28fc110
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/f28fc110
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/f28fc110

Branch: refs/heads/trunk
Commit: f28fc1100b5f6e79baeb6b2f2363e86e5cd12306
Parents: 491774b
Author: Guozhang Wang <wangg...@gmail.com>
Authored: Tue Jun 20 16:29:03 2017 -0700
Committer: Jason Gustafson <ja...@confluent.io>
Committed: Tue Jun 20 16:29:08 2017 -0700

----------------------------------------------------------------------
 docs/documentation.html |  6 +++---
 docs/js/templateData.js |  4 ++--
 docs/streams.html       | 41 ++++++++++++++++++++++++++++++++++++-----
 3 files changed, 41 insertions(+), 10 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/f28fc110/docs/documentation.html
----------------------------------------------------------------------
diff --git a/docs/documentation.html b/docs/documentation.html
index 5c70e18..7f297cc 100644
--- a/docs/documentation.html
+++ b/docs/documentation.html
@@ -69,15 +69,15 @@
     <h2><a id="connect" href="#connect">8. Kafka Connect</a></h2>
     <!--#include virtual="connect.html" -->
 
-    <h2><a id="streams" href="/0102/documentation/streams">9. Kafka 
Streams</a></h2>
+    <h2><a id="streams" href="/0110/documentation/streams">9. Kafka 
Streams</a></h2>
     <p>
-        Kafka Streams is a client library for processing and analyzing data 
stored in Kafka and either write the resulting data back to Kafka or send the 
final output to an external system. It builds upon important stream processing 
concepts such as properly distinguishing between event time and processing 
time, windowing support, and simple yet efficient management of application 
state.
+        Kafka Streams is a client library for processing and analyzing data 
stored in Kafka. It builds upon important stream processing concepts such as 
properly distinguishing between event time and processing time, windowing 
support, exactly-once processing semantics and simple yet efficient management 
of application state.
     </p>
     <p>
         Kafka Streams has a <b>low barrier to entry</b>: You can quickly write 
and run a small-scale proof-of-concept on a single machine; and you only need 
to run additional instances of your application on multiple machines to scale 
up to high-volume production workloads. Kafka Streams transparently handles the 
load balancing of multiple instances of the same application by leveraging 
Kafka's parallelism model.
     </p>
 
-    <p>Learn More about Kafka Streams read <a 
href="/0102/documentation/streams">this</a> Section.</p>
+    <p>To learn more about Kafka Streams, read <a href="/0110/documentation/streams">this</a> section.</p>
 
 <!--#include virtual="../includes/_footer.htm" -->
 <!--#include virtual="../includes/_docs_footer.htm" -->

http://git-wip-us.apache.org/repos/asf/kafka/blob/f28fc110/docs/js/templateData.js
----------------------------------------------------------------------
diff --git a/docs/js/templateData.js b/docs/js/templateData.js
index b4aedf5..2f32444 100644
--- a/docs/js/templateData.js
+++ b/docs/js/templateData.js
@@ -17,6 +17,6 @@ limitations under the License.
 
 // Define variables for doc templates
 var context={
-    "version": "0102",
-    "dotVersion": "0.10.2"
+    "version": "0110",
+    "dotVersion": "0.11.0"
 };
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kafka/blob/f28fc110/docs/streams.html
----------------------------------------------------------------------
diff --git a/docs/streams.html b/docs/streams.html
index e59c386..1f1adb1 100644
--- a/docs/streams.html
+++ b/docs/streams.html
@@ -46,19 +46,22 @@
         <h2><a id="streams_overview" href="#streams_overview">Overview</a></h2>
 
         <p>
-        Kafka Streams is a client library for processing and analyzing data 
stored in Kafka and either write the resulting data back to Kafka or send the 
final output to an external system. It builds upon important stream processing 
concepts such as properly distinguishing between event time and processing 
time, windowing support, and simple yet efficient management of application 
state.
+            Kafka Streams is a client library for processing and analyzing 
data stored in Kafka.
+            It builds upon important stream processing concepts such as 
properly distinguishing between event time and processing time, windowing 
support, and simple yet efficient management of application state.
         </p>
         <p>
-        Kafka Streams has a <b>low barrier to entry</b>: You can quickly write 
and run a small-scale proof-of-concept on a single machine; and you only need 
to run additional instances of your application on multiple machines to scale 
up to high-volume production workloads. Kafka Streams transparently handles the 
load balancing of multiple instances of the same application by leveraging 
Kafka's parallelism model.
+            Kafka Streams has a <b>low barrier to entry</b>: You can quickly 
write and run a small-scale proof-of-concept on a single machine; and you only 
need to run additional instances of your application on multiple machines to 
scale up to high-volume production workloads.
+            Kafka Streams transparently handles the load balancing of multiple 
instances of the same application by leveraging Kafka's parallelism model.
         </p>
         <p>
-        Some highlights of Kafka Streams:
+            Some highlights of Kafka Streams:
         </p>
 
         <ul>
             <li>Designed as a <b>simple and lightweight client library</b>, 
which can be easily embedded in any Java application and integrated with any 
existing packaging, deployment and operational tools that users have for their 
streaming applications.</li>
             <li>Has <b>no external dependencies on systems other than Apache 
Kafka itself</b> as the internal messaging layer; notably, it uses Kafka's 
partitioning model to horizontally scale processing while maintaining strong 
ordering guarantees.</li>
             <li>Supports <b>fault-tolerant local state</b>, which enables very 
fast and efficient stateful operations like windowed joins and 
aggregations.</li>
+            <li>Supports <b>exactly-once</b> processing semantics to guarantee that each record is processed once and only once, even if a failure occurs on either the Streams clients or the Kafka brokers in the middle of processing.</li>
             <li>Employs <b>one-record-at-a-time processing</b> to achieve 
millisecond processing latency, and supports <b>event-time based windowing 
operations</b> with late arrival of records.</li>
             <li>Offers necessary stream processing primitives, along with a 
<b>high-level Streams DSL</b> and a <b>low-level Processor API</b>.</li>
 
@@ -86,6 +89,8 @@
             <li><b>Sink Processor</b>: A sink processor is a special type of 
stream processor that does not have down-stream processors. It sends any 
received records from its up-stream processors to a specified Kafka topic.</li>
         </ul>
 
+        Note that normal processor nodes can also access other remote systems while processing the current record; the processed results can therefore either be streamed back into Kafka or written to an external system.
+
         <img class="centered" 
src="/{{version}}/images/streams-architecture-topology.jpg" style="width:400px">
 
         <p>
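
For readers who want to see how source processors, normal processor nodes, and sink processors fit together in code, below is a minimal illustrative sketch using the low-level Processor API of the 0.11.0 Java client; the class name UpperCaseProcessor and the topic names "input-topic" and "output-topic" are placeholders invented for this example.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.TopologyBuilder;

public class TopologyExample {

    // A normal processor node: it could also call out to a remote system here
    // (e.g. a lookup service) before forwarding its result downstream.
    static class UpperCaseProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(final String key, final String value) {
            context().forward(key, value == null ? null : value.toUpperCase());
        }
    }

    public static void main(final String[] args) {
        final TopologyBuilder builder = new TopologyBuilder();
        builder.addSource("Source", "input-topic")                          // source processor
               .addProcessor("Process", UpperCaseProcessor::new, "Source")  // normal processor node
               .addSink("Sink", "output-topic", "Process");                 // sink processor

        final Properties settings = new Properties();
        settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-example");       // placeholder id
        settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");  // placeholder broker
        settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder, settings).start();
    }
}
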
@@ -158,6 +163,27 @@
         </p>
         <br>
 
+        <h2><a id="streams_processing_guarantee" 
href="#streams_processing_guarantee">Processing Guarantees</a></h2>
+
+        <p>
+            In stream processing, one of the most frequently asked question is 
"does my stream processing system guarantee that each record is processed once 
and only once, even if some failures are encountered in the middle of 
processing?"
+            Failing to guarantee exactly-once stream processing is a 
deal-breaker for many applications that cannot tolerate any data-loss or data 
duplicates, and in that case a batch-oriented framework is usually used in 
addition
+            to the stream processing pipeline, known as the <a 
href="http://lambda-architecture.net/";>Lambda Architecture</a>.
+            Prior to 0.11.0.0, Kafka only provides at-least-once delivery 
guarantees and hence any stream processing systems that leverage it as the 
backend storage could not guarantee end-to-end exactly-once semantics.
+            In fact, even for those stream processing systems that claim to 
support exactly-once processing, as long as they are reading from / writing to 
Kafka as the source / sink, their applications cannot actually guarantee that
+            no duplicates will be generated throughout the pipeline.
+
+            Since the 0.11.0.0 release, Kafka has added support to allow its 
producers to send messages to different topic partitions in a <a 
href="https://kafka.apache.org/documentation/#semantics";>transactional and 
idempotent manner</a>,
+            and Kafka Streams has hence added the end-to-end exactly-once 
processing semantics by leveraging these features.
+            More specifically, it guarantees that for any record read from the 
source Kafka topics, its processing results will be reflected exactly once in 
the output Kafka topic as well as in the state stores for stateful operations.
+            Note the key difference between Kafka Streams end-to-end 
exactly-once guarantee with other stream processing frameworks' claimed 
guarantees is that Kafka Streams tightly integrates with the underlying Kafka 
storage system and ensure that
+            commits on the input topic offsets, updates on the state stores, 
and writes to the output topics will be completed atomically instead of 
treating Kafka as an external system that may have side-effects.
+            To read more details on how this is done inside Kafka Streams, 
readers are recommended to read <a 
href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics";>KIP-129</a>.
+
+            In order to achieve exactly-once semantics when running Kafka 
Streams applications, users can simply set the 
<code>processing.guarantee</code> config value to <b>exactly_once</b> (default 
value is <b>at_least_once</b>).
+            More details can be found in the <a href="#streamsconfigs">Kafka 
Streams Configs</a> section.
+        </p>
+
         <h2><a id="streams_architecture" 
href="#streams_architecture">Architecture</a></h2>
 
         Kafka Streams simplifies application development by building on the 
Kafka producer and consumer libraries and leveraging the native capabilities of
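
For readers who want to try the new guarantee right away, here is a minimal sketch showing only the relevant configuration; the application id and broker address are placeholders, and it mirrors the full configuration example further below.

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties settings = new Properties();
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-exactly-once-app");    // placeholder application id
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");  // placeholder broker address

// Switch from the default at_least_once guarantee to exactly_once;
// note that this requires brokers running 0.11.0.0 or newer.
settings.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
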
@@ -685,7 +711,11 @@ Properties settings = new Properties();
 // Set a few key parameters
 settings.put(StreamsConfig.APPLICATION_ID_CONFIG, 
"my-first-streams-application");
 settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
-settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "zookeeper1:2181");
+
+// Set a few user customized parameters
+settings.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, 
StreamsConfig.EXACTLY_ONCE);
+settings.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, 
MyTimestampExtractor.class);
+
 // Any further settings
 settings.put(... , ...);
 
@@ -718,7 +748,7 @@ settings.put("consumer." + 
ConsumerConfig.RECEIVE_BUFFER_CONFIG, 1024 * 1024);
 settings.put("producer." + ProducerConfig.RECEIVE_BUFFER_CONFIG, 64 * 1024);
 // Alternatively, you can use
 
settings.put(StreamsConfig.consumerPrefix(ConsumerConfig.RECEIVE_BUFFER_CONFIG),
 1024 * 1024);
-settings.put(StremasConfig.producerConfig(ProducerConfig.RECEIVE_BUFFER_CONFIG),
 64 * 1024);
+settings.put(StreamsConfig.producerPrefix(ProducerConfig.RECEIVE_BUFFER_CONFIG), 64 * 1024);
 </pre>
 
         <h4><a id="streams_broker_config" href="#streams_broker_config">Broker 
Configuration</a></h4>
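
The configuration example above references a MyTimestampExtractor class; a minimal sketch of what such a custom extractor might look like is shown below, assuming the 0.11.0 TimestampExtractor interface (the fallback strategy is invented for this example).

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical extractor: use the record's own timestamp when it is valid,
// otherwise fall back to the timestamp of the previously processed record.
public class MyTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
        final long timestamp = record.timestamp();
        return timestamp >= 0 ? timestamp : previousTimestamp;
    }
}
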
@@ -848,6 +878,7 @@ $ java -cp path-to-app-fatjar.jar com.example.MyStreamsApp
 
         <p> Updates in <code>StreamsConfig</code>: </p>
         <ul>
+          <li> new configuration parameter <code>processing.guarantee</code> was added </li>
           <li> configuration parameter <code>key.serde</code> was deprecated 
and replaced by <code>default.key.serde</code> </li>
           <li> configuration parameter <code>value.serde</code> was deprecated 
and replaced by <code>default.value.serde</code> </li>
           <li> configuration parameter <code>timestamp.extractor</code> was 
deprecated and replaced by <code>default.timestamp.extractor</code> </li>
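
As an illustration of the renamed parameters, a short sketch follows; it assumes the settings object and the MyTimestampExtractor class from the configuration examples above.

// Deprecated names (still accepted in 0.11.0.0)
settings.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyTimestampExtractor.class);

// Replacement names introduced in 0.11.0.0
settings.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
settings.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyTimestampExtractor.class);
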
