Modified: samza/site/learn/documentation/latest/connectors/kinesis.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/connectors/kinesis.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/connectors/kinesis.html (original)
+++ samza/site/learn/documentation/latest/connectors/kinesis.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/connectors/kinesis">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/connectors/kinesis">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/connectors/kinesis">1.6.0</a></li>
 
               
@@ -639,15 +653,14 @@
    limitations under the License.
 -->
 
-<h3 id="kinesis-i-o-quickstart">Kinesis I/O: Quickstart</h3>
+<h3 id="kinesis-io-quickstart">Kinesis I/O: Quickstart</h3>
 
 <p>The Samza Kinesis connector allows you to interact with <a 
href="https://aws.amazon.com/kinesis/data-streams";>Amazon Kinesis Data 
Streams</a>,
-Amazon’s data streaming service. The <code>hello-samza</code> project 
includes an example of processing Kinesis streams using Samza. Here is the 
complete <a 
href="https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/kinesis/KinesisHelloSamza.java";>source
 code</a> and <a 
href="https://github.com/apache/samza-hello-samza/blob/master/src/main/config/kinesis-hello-samza.properties";>configs</a>.
+Amazon’s data streaming service. The <code class="language-plaintext 
highlighter-rouge">hello-samza</code> project includes an example of processing 
Kinesis streams using Samza. Here is the complete <a 
href="https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/kinesis/KinesisHelloSamza.java";>source
 code</a> and <a 
href="https://github.com/apache/samza-hello-samza/blob/master/src/main/config/kinesis-hello-samza.properties";>configs</a>.
 You can build and run this example using this <a 
href="https://github.com/apache/samza-hello-samza#hello-samza";>tutorial</a>.</p>
 
-<h3 id="data-format">Data Format</h3>
-
-<p>Like a Kafka topic, a Kinesis stream can have multiple shards with 
producers and consumers.
+<h3 id="data-format">Data Format</h3>
+
+<p>Like a Kafka topic, a Kinesis stream can have multiple shards with producers and consumers.
 Each message consumed from the stream is an instance of a Kinesis <a 
href="http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record";>Record</a>.
 Samza’s <a 
href="https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisSystemConsumer.java";>KinesisSystemConsumer</a>
 wraps the Record into a <a 
href="https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java";>KinesisIncomingMessageEnvelope</a>.</p>
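The record payload inside that envelope arrives as raw bytes (the examples on this page configure a NoOpSerde&lt;byte[]&gt;), so the application decodes it itself. A minimal, hypothetical sketch of that decode step in plain Java, assuming UTF-8 text payloads (the class and method names are illustrative, not part of the Samza API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of Samza: decode a Kinesis record's
// payload bytes, assuming the producer wrote UTF-8 text.
public class PayloadDecode {
    static String decodePayload(ByteBuffer data) {
        // Copy the readable bytes out of the buffer without assuming
        // a backing array, then decode them as UTF-8.
        byte[] bytes = new byte[data.remaining()];
        data.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap("hello-kinesis".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodePayload(payload)); // prints "hello-kinesis"
    }
}
```

Kinesis itself treats payloads as opaque blobs, so whether UTF-8, JSON, or a binary format is the right interpretation depends entirely on the producer.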
@@ -656,121 +669,118 @@ wraps the Record into a <a href="https:/
 
 <h4 id="basic-configuration">Basic Configuration</h4>
 
-<p>Here is the required configuration for consuming messages from Kinesis, 
through <code>KinesisSystemDescriptor</code> and 
<code>KinesisInputDescriptor</code>. </p>
+<p>Here is the required configuration for consuming messages from Kinesis, 
through <code class="language-plaintext 
highlighter-rouge">KinesisSystemDescriptor</code> and <code 
class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="n">KinesisSystemDescriptor</span> 
<span class="n">ksd</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="s">&quot;kinesis&quot;</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisSystemDescriptor</span> <span 
class="n">ksd</span> <span class="o">=</span> <span class="k">new</span> <span 
class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="s">"kinesis"</span><span class="o">);</span>
     
-<span class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> 
-    <span class="n">ksd</span><span class="o">.</span><span 
class="na">getInputDescriptor</span><span class="o">(</span><span 
class="s">&quot;STREAM-NAME&quot;</span><span class="o">,</span> <span 
class="k">new</span> <span class="n">NoOpSerde</span><span 
class="o">&lt;</span><span class="kt">byte</span><span 
class="o">[]&gt;())</span>
-          <span class="o">.</span><span class="na">withRegion</span><span 
class="o">(</span><span class="s">&quot;STREAM-REGION&quot;</span><span 
class="o">)</span>
-          <span class="o">.</span><span class="na">withAccessKey</span><span 
class="o">(</span><span class="s">&quot;YOUR-ACCESS_KEY&quot;</span><span 
class="o">)</span>
-          <span class="o">.</span><span class="na">withSecretKey</span><span 
class="o">(</span><span class="s">&quot;YOUR-SECRET-KEY&quot;</span><span 
class="o">);</span></code></pre></figure>
-
-<h4 id="coordination">Coordination</h4>
+<span class="nc">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> 
+    <span class="n">ksd</span><span class="o">.</span><span 
class="na">getInputDescriptor</span><span class="o">(</span><span 
class="s">"STREAM-NAME"</span><span class="o">,</span> <span 
class="k">new</span> <span class="nc">NoOpSerde</span><span 
class="o">&lt;</span><span class="kt">byte</span><span 
class="o">[]&gt;())</span>
+          <span class="o">.</span><span class="na">withRegion</span><span 
class="o">(</span><span class="s">"STREAM-REGION"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withAccessKey</span><span 
class="o">(</span><span class="s">"YOUR-ACCESS_KEY"</span><span 
class="o">)</span>
+          <span class="o">.</span><span class="na">withSecretKey</span><span 
class="o">(</span><span class="s">"YOUR-SECRET-KEY"</span><span 
class="o">);</span></code></pre></figure>
 
-<p>The Kinesis system consumer does not rely on Samza&rsquo;s coordination 
mechanism. Instead, it uses the Kinesis client library (KCL) for coordination 
and distributing available shards among available instances. Hence, you should
-set your <code>grouper</code> configuration to 
<code>AllSspToSingleTaskGrouperFactory</code>.</p>
+<h4 id="coordination">Coordination</h4>
+
+<p>The Kinesis system consumer does not rely on Samza’s coordination mechanism. Instead, it uses the Kinesis Client Library (KCL) to coordinate and distribute the available shards among the available instances. Hence, you should
+set your <code class="language-plaintext highlighter-rouge">grouper</code> configuration to <code class="language-plaintext highlighter-rouge">AllSspToSingleTaskGrouperFactory</code>.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span 
class="na">job.systemstreampartition.grouper.factory</span><span 
class="o">=</span><span 
class="s">org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties">job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory</code></pre></figure>
 
-<h4 id="security">Security</h4>
+<h4 id="security">Security</h4>
 
-<p>Each Kinesis stream in a given AWS <a 
href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html";>region</a>
 can be accessed by providing an <a 
href="https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys";>access
 key</a>. An Access key consists of two parts: an access key ID (for example, 
<code>AKIAIOSFODNN7EXAMPLE</code>) and a secret access key (for example, 
<code>wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</code>) which you can use to 
send programmatic requests to AWS. </p>
+<p>Each Kinesis stream in a given AWS <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html";>region</a> can be accessed by providing an <a href="https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys";>access key</a>. An access key consists of two parts: an access key ID (for example, <code class="language-plaintext highlighter-rouge">AKIAIOSFODNN7EXAMPLE</code>) and a secret access key (for example, <code class="language-plaintext highlighter-rouge">wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</code>), which you can use to send programmatic requests to AWS.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> 
-    <span class="n">ksd</span><span class="o">.</span><span 
class="na">getInputDescriptor</span><span class="o">(</span><span 
class="s">&quot;STREAM-NAME&quot;</span><span class="o">,</span> <span 
class="k">new</span> <span class="n">NoOpSerde</span><span 
class="o">&lt;</span><span class="kt">byte</span><span 
class="o">[]&gt;())</span>
-          <span class="o">.</span><span class="na">withRegion</span><span 
class="o">(</span><span class="s">&quot;STREAM-REGION&quot;</span><span 
class="o">)</span>
-          <span class="o">.</span><span class="na">withAccessKey</span><span 
class="o">(</span><span class="s">&quot;YOUR-ACCESS_KEY&quot;</span><span 
class="o">)</span>
-          <span class="o">.</span><span class="na">withSecretKey</span><span 
class="o">(</span><span class="s">&quot;YOUR-SECRET-KEY&quot;</span><span 
class="o">);</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisInputDescriptor</span><span 
class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> 
+    <span class="n">ksd</span><span class="o">.</span><span 
class="na">getInputDescriptor</span><span class="o">(</span><span 
class="s">"STREAM-NAME"</span><span class="o">,</span> <span 
class="k">new</span> <span class="nc">NoOpSerde</span><span 
class="o">&lt;</span><span class="kt">byte</span><span 
class="o">[]&gt;())</span>
+          <span class="o">.</span><span class="na">withRegion</span><span 
class="o">(</span><span class="s">"STREAM-REGION"</span><span class="o">)</span>
+          <span class="o">.</span><span class="na">withAccessKey</span><span 
class="o">(</span><span class="s">"YOUR-ACCESS_KEY"</span><span 
class="o">)</span>
+          <span class="o">.</span><span class="na">withSecretKey</span><span 
class="o">(</span><span class="s">"YOUR-SECRET-KEY"</span><span 
class="o">);</span></code></pre></figure>
 
 <h3 id="advanced-configuration">Advanced Configuration</h3>
 
 <h4 id="kinesis-client-library-configs">Kinesis Client Library Configs</h4>
-
 <p>Samza Kinesis Connector uses the <a 
href="https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html#kinesis-record-processor-overview-kcl";>Kinesis
 Client Library</a>
 (KCL) to access the Kinesis data streams. You can set any <a 
href="https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java";>KCL
 Configuration</a>
-for a stream by configuring it through <code>KinesisInputDescriptor</code>.</p>
+for a stream by configuring it through <code class="language-plaintext 
highlighter-rouge">KinesisInputDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisInputDescriptor</span><span 
class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="n">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">&quot;CONFIG-PARAM&quot;</span><span class="o">,</span> <span 
class="s">&quot;CONFIG-VALUE&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="nc">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="nc">HashMap</span><span class="o">&lt;&gt;;</span>
+<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">"CONFIG-PARAM"</span><span class="o">,</span> <span 
class="s">"CONFIG-VALUE"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span 
class="na">withKCLConfig</span><span class="o">(</span><span 
class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>As an example, the below configuration is equivalent to invoking 
<code>kclClient#WithTableName(myTable)</code> on the KCL instance.</p>
+<p>As an example, the following configuration is equivalent to invoking <code class="language-plaintext highlighter-rouge">kclClient#WithTableName(myTable)</code> on the KCL instance.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisInputDescriptor</span><span 
class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="n">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">&quot;TableName&quot;</span><span class="o">,</span> <span 
class="s">&quot;myTable&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="nc">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="nc">HashMap</span><span class="o">&lt;&gt;;</span>
+<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">"TableName"</span><span class="o">,</span> <span 
class="s">"myTable"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span 
class="na">withKCLConfig</span><span class="o">(</span><span 
class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
 <h4 id="aws-client-configs">AWS Client configs</h4>
-
 <p>Samza allows you to specify any <a 
href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html";>AWS
 client configs</a> to connect to your Kinesis instance.
-You can configure any <a 
href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html";>AWS
 client configuration</a> through <code>KinesisSystemDescriptor</code>.</p>
+You can configure any <a 
href="http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html";>AWS
 client configuration</a> through <code class="language-plaintext 
highlighter-rouge">KinesisSystemDescriptor</code>.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="n">Map</span><span 
class="o">&lt;</span><span class="n">String</span><span class="o">,</span> 
<span class="n">String</span><span class="o">&gt;</span> <span 
class="n">awsConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">awsConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">&quot;CONFIG-PARAM&quot;</span><span class="o">,</span> <span 
class="s">&quot;CONFIG-VALUE&quot;</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="nc">Map</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">String</span><span class="o">&gt;</span> <span class="n">awsConfig</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">HashMap</span><span class="o">&lt;&gt;();</span>
+<span class="n">awsConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">"CONFIG-PARAM"</span><span class="o">,</span> <span 
class="s">"CONFIG-VALUE"</span><span class="o">);</span>
 
-<span class="n">KinesisSystemDescriptor</span> <span class="n">sd</span> <span 
class="o">=</span> <span class="k">new</span> <span 
class="n">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="n">systemName</span><span class="o">)</span>
+<span class="nc">KinesisSystemDescriptor</span> <span class="n">sd</span> 
<span class="o">=</span> <span class="k">new</span> <span 
class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="n">systemName</span><span class="o">)</span>
                                           <span class="o">.</span><span 
class="na">withAWSConfig</span><span class="o">(</span><span 
class="n">awsConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>Through <code>KinesisSystemDescriptor</code> you can also set the <em>proxy 
host</em> and <em>proxy port</em> to be used by the Kinesis Client:</p>
+<p>Through <code class="language-plaintext 
highlighter-rouge">KinesisSystemDescriptor</code> you can also set the 
<em>proxy host</em> and <em>proxy port</em> to be used by the Kinesis 
Client:</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="n">KinesisSystemDescriptor</span> 
<span class="n">sd</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="n">systemName</span><span class="o">)</span>
-                                          <span class="o">.</span><span 
class="na">withProxyHost</span><span class="o">(</span><span 
class="s">&quot;YOUR-PROXY-HOST&quot;</span><span class="o">)</span>
-                                          <span class="o">.</span><span 
class="na">withProxyPort</span><span class="o">(</span><span 
class="n">YOUR</span><span class="o">-</span><span class="n">PROXY</span><span 
class="o">-</span><span class="n">PORT</span><span 
class="o">);</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisSystemDescriptor</span> <span 
class="n">sd</span> <span class="o">=</span> <span class="k">new</span> <span 
class="nc">KinesisSystemDescriptor</span><span class="o">(</span><span 
class="n">systemName</span><span class="o">)</span>
+                                          <span class="o">.</span><span 
class="na">withProxyHost</span><span class="o">(</span><span 
class="s">"YOUR-PROXY-HOST"</span><span class="o">)</span>
+                                          <span class="o">.</span><span 
class="na">withProxyPort</span><span class="o">(</span><span 
class="no">YOUR</span><span class="o">-</span><span 
class="no">PROXY</span><span class="o">-</span><span 
class="no">PORT</span><span class="o">);</span></code></pre></figure>
 
 <h3 id="resetting-offsets">Resetting Offsets</h3>
 
 <p>Unlike other connectors where Samza stores and manages checkpointed 
offsets, Kinesis checkpoints are stored in a <a 
href="https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html";>DynamoDB</a>
 table.
-These checkpoints are stored and managed by the KCL library internally. You 
can reset the checkpoints by configuring a different name for the DynamoDB 
table. </p>
+These checkpoints are stored and managed internally by the KCL. You can reset the checkpoints by configuring a different name for the DynamoDB table.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c">// change the TableName to 
a unique name to reset checkpoints.</span>
-<span 
class="na">systems.kinesis-system.streams.STREAM-NAME.aws.kcl.TableName</span><span
 class="o">=</span><span 
class="s">my-app-table-name</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># change the TableName to a unique name to reset checkpoints.
+systems.kinesis-system.streams.STREAM-NAME.aws.kcl.TableName=my-app-table-name</code></pre></figure>
 
-<p>Or through <code>KinesisInputDescriptor</code></p>
+<p>Or through <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code>:</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisInputDescriptor</span><span 
class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="n">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">&quot;TableName&quot;</span><span class="o">,</span> <span 
class="s">&quot;my-new-app-table-name&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="nc">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="nc">HashMap</span><span class="o">&lt;&gt;;</span>
+<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">"TableName"</span><span class="o">,</span> <span 
class="s">"my-new-app-table-name"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span 
class="na">withKCLConfig</span><span class="o">(</span><span 
class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
-<p>When you reset checkpoints, you can configure your job to start consuming 
from either the earliest or latest offset in the stream.  </p>
+<p>When you reset checkpoints, you can configure your job to start consuming 
from either the earliest or latest offset in the stream.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c">// set the starting 
position to either TRIM_HORIZON (oldest) or LATEST (latest)</span>
-<span 
class="na">systems.kinesis-system.streams.STREAM-NAME.aws.kcl.InitialPositionInStream</span><span
 class="o">=</span><span class="s">LATEST</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># set the starting position to either TRIM_HORIZON (oldest) or LATEST (latest)
+systems.kinesis-system.streams.STREAM-NAME.aws.kcl.InitialPositionInStream=LATEST</code></pre></figure>
 
-<p>Or through <code>KinesisInputDescriptor</code></p>
+<p>Or through <code class="language-plaintext highlighter-rouge">KinesisInputDescriptor</code>:</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="n">KinesisInputDescriptor</span><span class="o">&lt;</span><span 
class="n">KV</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="nc">KinesisInputDescriptor</span><span 
class="o">&lt;</span><span class="no">KV</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="kt">byte</span><span class="o">[]&gt;&gt;</span> <span 
class="n">kid</span> <span class="o">=</span> <span class="o">...</span>
 
-<span class="n">Map</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span> <span 
class="n">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="n">HashMap</span><span class="o">&lt;&gt;;</span>
-<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">&quot;InitialPositionInStream&quot;</span><span class="o">,</span> 
<span class="s">&quot;LATEST&quot;</span><span class="o">);</span>
+<span class="nc">Map</span><span class="o">&lt;</span><span 
class="nc">String</span><span class="o">,</span> <span 
class="nc">String</span><span class="o">&gt;</span> <span 
class="n">kclConfig</span> <span class="o">=</span> <span class="k">new</span> 
<span class="nc">HashMap</span><span class="o">&lt;&gt;;</span>
+<span class="n">kclConfig</span><span class="o">.</span><span 
class="na">put</span><span class="o">(</span><span 
class="s">"InitialPositionInStream"</span><span class="o">,</span> <span 
class="s">"LATEST"</span><span class="o">);</span>
 
 <span class="n">kid</span><span class="o">.</span><span 
class="na">withKCLConfig</span><span class="o">(</span><span 
class="n">kclConfig</span><span class="o">);</span></code></pre></figure>
 
 <p>Alternatively, if you want to start from a particular offset in the Kinesis stream, you can log in to the <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ConsoleDynamoDB.html";>AWS console</a> and edit the offsets in your DynamoDB table.
-By default, the table-name has the following format: &ldquo;&lt;job 
name&gt;-&lt;job id&gt;-&lt;kinesis stream&gt;&rdquo;.</p>
+By default, the table-name has the following format: “&lt;job 
name&gt;-&lt;job id&gt;-&lt;kinesis stream&gt;”.</p>
 
 <h3 id="known-limitations">Known Limitations</h3>
 
 <p>The following limitations apply to Samza jobs consuming from Kinesis streams:</p>
 
 <ul>
-<li>Stateful processing (eg: windows or joins) is not supported on Kinesis 
streams. However, you can accomplish this by
+  <li>Stateful processing (e.g., windows or joins) is not supported on Kinesis streams. However, you can accomplish this by
 chaining two Samza jobs where the first job reads from Kinesis and sends to 
Kafka while the second job processes the
 data from Kafka.</li>
-<li>Kinesis streams cannot be configured as <a 
href="https://samza.apache.org/learn/documentation/latest/container/streams.html";>bootstrap</a>
+  <li>Kinesis streams cannot be configured as <a 
href="https://samza.apache.org/learn/documentation/latest/container/streams.html";>bootstrap</a>
 or <a 
href="https://samza.apache.org/learn/documentation/latest/container/samza-container.html";>broadcast</a>
 streams.</li>
-<li>Kinesis streams must be used only with the <a 
href="https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java";>AllSspToSingleTaskGrouperFactory</a>
+  <li>Kinesis streams must be used only with the <a 
href="https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java";>AllSspToSingleTaskGrouperFactory</a>
 as the Kinesis consumer does the partition management by itself. No other 
grouper is currently supported.</li>
-<li>A Samza job that consumes from Kinesis cannot consume from any other input 
source. However, you can send your results
+  <li>A Samza job that consumes from Kinesis cannot consume from any other 
input source. However, you can send your results
 to any destination (e.g., Kafka, EventHubs), and have another Samza job consume 
them.</li>
 </ul>
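Taken together, the first and last limitations above suggest a two-job pipeline. A rough sketch of the first (Kinesis-reading) job's properties under these constraints (a hedged illustration: `job.name`, `task.inputs`, and the grouper key are standard Samza configuration, but the system name, stream name, and job name here are placeholders):

```jproperties
# Hypothetical first-stage job: consumes one Kinesis stream, forwards results to Kafka.
job.name=kinesis-to-kafka
# The job's only input: a single Kinesis stream (no other sources allowed).
task.inputs=kinesis-system.STREAM-NAME
# Mandatory for Kinesis, as noted above: the consumer manages shards itself.
job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory
```

A second, ordinary Samza job would then consume the Kafka topic and do the stateful processing.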
 
@@ -778,6 +788,7 @@ to any destination (eg: Kafka, EventHubs
 
 <p>The KinesisSystemProducer for Samza is not yet implemented.</p>
 
+
            
         </div>
       </div>

Modified: samza/site/learn/documentation/latest/connectors/overview.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/connectors/overview.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/connectors/overview.html (original)
+++ samza/site/learn/documentation/latest/connectors/overview.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/connectors/overview">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/connectors/overview">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/connectors/overview">1.6.0</a></li>
 
               
@@ -641,26 +655,42 @@
 
 <p>Stream processing applications often read data from external sources like 
Kafka or HDFS. Likewise, they require processed
results to be written to external systems or data stores. Samza is pluggable 
and designed to support a variety of <a 
href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemProducer.html">producers</a>
 and <a 
href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemConsumer.html">consumers</a>
 for your data. You can 
-integrate Samza with any streaming system by implementing the <a 
href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemFactory.html">SystemFactory</a>
 interface. </p>
+integrate Samza with any streaming system by implementing the <a 
href="/learn/documentation/latest/api/javadocs/org/apache/samza/system/SystemFactory.html">SystemFactory</a>
 interface.</p>
 
 <p>The following integrations are supported out-of-the-box:</p>
 
 <p>Consumers:</p>
 
 <ul>
-<li><p><a href="kafka">Apache Kafka</a> </p></li>
-<li><p><a href="eventhubs">Microsoft Azure Eventhubs</a> </p></li>
-<li><p><a href="kinesis">Amazon AWS Kinesis Streams</a> </p></li>
-<li><p><a href="hdfs">Hadoop Filesystem</a> </p></li>
+  <li>
+    <p><a href="kafka">Apache Kafka</a></p>
+  </li>
+  <li>
+    <p><a href="eventhubs">Microsoft Azure Eventhubs</a></p>
+  </li>
+  <li>
+    <p><a href="kinesis">Amazon AWS Kinesis Streams</a></p>
+  </li>
+  <li>
+    <p><a href="hdfs">Hadoop Filesystem</a></p>
+  </li>
 </ul>
 
 <p>Producers:</p>
 
 <ul>
-<li><p><a href="kafka">Apache Kafka</a> </p></li>
-<li><p><a href="eventhubs">Microsoft Azure Eventhubs</a> </p></li>
-<li><p><a href="hdfs">Hadoop Filesystem</a> </p></li>
-<li><p><a 
href="https://github.com/apache/samza/blob/master/samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java";>Elasticsearch</a></p></li>
+  <li>
+    <p><a href="kafka">Apache Kafka</a></p>
+  </li>
+  <li>
+    <p><a href="eventhubs">Microsoft Azure Eventhubs</a></p>
+  </li>
+  <li>
+    <p><a href="hdfs">Hadoop Filesystem</a></p>
+  </li>
+  <li>
+    <p><a 
href="https://github.com/apache/samza/blob/master/samza-elasticsearch/src/main/java/org/apache/samza/system/elasticsearch/ElasticsearchSystemProducer.java";>Elasticsearch</a></p>
+  </li>
 </ul>
 
            

Modified: samza/site/learn/documentation/latest/container/checkpointing.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/checkpointing.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/checkpointing.html 
(original)
+++ samza/site/learn/documentation/latest/container/checkpointing.html Wed Jan 
18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/checkpointing">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/checkpointing">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/checkpointing">1.6.0</a></li>
 
               
@@ -639,12 +653,12 @@
    limitations under the License.
 -->
 
-<p>Samza provides fault-tolerant processing of streams: Samza guarantees that 
messages won&rsquo;t be lost, even if your job crashes, if a machine dies, if 
there is a network fault, or something else goes wrong. In order to provide 
this guarantee, Samza expects the <a href="streams.html">input system</a> to 
meet the following requirements:</p>
+<p>Samza provides fault-tolerant processing of streams: Samza guarantees that 
messages won’t be lost, even if your job crashes, if a machine dies, if there 
is a network fault, or something else goes wrong. In order to provide this 
guarantee, Samza expects the <a href="streams.html">input system</a> to meet 
the following requirements:</p>
 
 <ul>
-<li>The stream may be sharded into one or more <em>partitions</em>. Each 
partition is independent from the others, and is replicated across multiple 
machines (the stream continues to be available, even if a machine fails).</li>
-<li>Each partition consists of a sequence of messages in a fixed order. Each 
message has an <em>offset</em>, which indicates its position in that sequence. 
Messages are always consumed sequentially within each partition.</li>
-<li>A Samza job can start consuming the sequence of messages from any starting 
offset.</li>
+  <li>The stream may be sharded into one or more <em>partitions</em>. Each 
partition is independent from the others, and is replicated across multiple 
machines (the stream continues to be available, even if a machine fails).</li>
+  <li>Each partition consists of a sequence of messages in a fixed order. Each 
message has an <em>offset</em>, which indicates its position in that sequence. 
Messages are always consumed sequentially within each partition.</li>
+  <li>A Samza job can start consuming the sequence of messages from any 
starting offset.</li>
 </ul>
 
 <p>Kafka meets these requirements, but they can also be implemented with other 
message broker systems.</p>
@@ -653,36 +667,36 @@
 
 <p>If a Samza container fails, it needs to be restarted (potentially on 
another machine) and resume processing where the failed container left off. In 
order to enable this, a container periodically checkpoints the current offset 
for each task instance.</p>
 
-<p><img src="/img/latest/learn/documentation/container/checkpointing.svg" 
alt="Illustration of checkpointing" class="diagram-large"></p>
+<p><img src="/img/latest/learn/documentation/container/checkpointing.svg" 
alt="Illustration of checkpointing" class="diagram-large" /></p>
 
-<p>When a Samza container starts up, it looks for the most recent checkpoint 
and starts consuming messages from the checkpointed offsets. If the previous 
container failed unexpectedly, the most recent checkpoint may be slightly 
behind the current offsets (i.e. the job may have consumed some more messages 
since the last checkpoint was written), but we can&rsquo;t know for sure. In 
that case, the job may process a few messages again.</p>
+<p>When a Samza container starts up, it looks for the most recent checkpoint 
and starts consuming messages from the checkpointed offsets. If the previous 
container failed unexpectedly, the most recent checkpoint may be slightly 
behind the current offsets (i.e. the job may have consumed some more messages 
since the last checkpoint was written), but we can’t know for sure. In that 
case, the job may process a few messages again.</p>
 
-<p>This guarantee is called <em>at-least-once processing</em>: Samza ensures 
that your job doesn&rsquo;t miss any messages, even if containers need to be 
restarted. However, it is possible for your job to see the same message more 
than once when a container is restarted. We are planning to address this in a 
future version of Samza, but for now it is just something to be aware of: for 
example, if you are counting page views, a forcefully killed container could 
cause events to be slightly over-counted. You can reduce duplication by 
checkpointing more frequently, at a slight performance cost.</p>
+<p>This guarantee is called <em>at-least-once processing</em>: Samza ensures 
that your job doesn’t miss any messages, even if containers need to be 
restarted. However, it is possible for your job to see the same message more 
than once when a container is restarted. We are planning to address this in a 
future version of Samza, but for now it is just something to be aware of: for 
example, if you are counting page views, a forcefully killed container could 
cause events to be slightly over-counted. You can reduce duplication by 
checkpointing more frequently, at a slight performance cost.</p>
 
-<p>For checkpoints to be effective, they need to be written somewhere where 
they will survive faults. Samza allows you to write checkpoints to the file 
system (using FileSystemCheckpointManager), but that doesn&rsquo;t help if the 
machine fails and the container needs to be restarted on another machine. The 
most common configuration is to use Kafka for checkpointing. You can enable 
this with the following job configuration:</p>
+<p>For checkpoints to be effective, they need to be written somewhere where 
they will survive faults. Samza allows you to write checkpoints to the file 
system (using FileSystemCheckpointManager), but that doesn’t help if the 
machine fails and the container needs to be restarted on another machine. The 
most common configuration is to use Kafka for checkpointing. You can enable 
this with the following job configuration:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c"># The name of your job 
determines the name under which checkpoints will be stored</span>
-<span class="na">job.name</span><span class="o">=</span><span 
class="s">example-job</span>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"># The name of your job determines the name under which 
checkpoints will be stored
+job.name=example-job
 
-<span class="c"># Define a system called &quot;kafka&quot; for consuming and 
producing to a Kafka cluster</span>
-<span class="na">systems.kafka.samza.factory</span><span 
class="o">=</span><span 
class="s">org.apache.samza.system.kafka.KafkaSystemFactory</span>
+# Define a system called "kafka" for consuming and producing to a Kafka cluster
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
 
-<span class="c"># Declare that we want our job&#39;s checkpoints to be written 
to Kafka</span>
-<span class="na">task.checkpoint.factory</span><span class="o">=</span><span 
class="s">org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory</span>
-<span class="na">task.checkpoint.system</span><span class="o">=</span><span 
class="s">kafka</span>
+# Declare that we want our job's checkpoints to be written to Kafka
+task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
+task.checkpoint.system=kafka
 
-<span class="c"># By default, a checkpoint is written every 60 seconds. You 
can change this if you like.</span>
-<span class="na">task.commit.ms</span><span class="o">=</span><span 
class="s">60000</span></code></pre></figure>
+# By default, a checkpoint is written every 60 seconds. You can change this if 
you like.
+task.commit.ms=60000</code></pre></figure>
 
 <p>In this configuration, Samza writes checkpoints to a separate Kafka topic 
called __samza_checkpoint_&lt;job-name&gt;_&lt;job-id&gt; (in the example 
configuration above, the topic would be called 
__samza_checkpoint_example-job_1). Once per minute, Samza automatically sends a 
message to this topic, in which the current offsets of the input streams are 
encoded. When a Samza container starts up, it looks for the most recent offset 
message in this topic, and loads that checkpoint.</p>
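As a quick illustration of the naming scheme just described, the checkpoint topic name can be produced with a simple format string. This is only a sketch; the class and method names below are hypothetical and not part of Samza's API.

```java
public class CheckpointTopicName {
    // Illustrative sketch of the checkpoint topic naming scheme described
    // above: __samza_checkpoint_<job-name>_<job-id>. Hypothetical helper,
    // not the actual Samza implementation.
    static String checkpointTopic(String jobName, String jobId) {
        return String.format("__samza_checkpoint_%s_%s", jobName, jobId);
    }

    public static void main(String[] args) {
        // Matches the example configuration in the text:
        // job.name=example-job, job id 1
        System.out.println(checkpointTopic("example-job", "1"));
        // prints __samza_checkpoint_example-job_1
    }
}
```

Running it reproduces the topic name `__samza_checkpoint_example-job_1` used in the example configuration above.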
 
 <p>Sometimes it can be useful to use checkpoints only for some input streams, 
but not for others. In this case, you can tell Samza to ignore any checkpointed 
offsets for a particular stream name:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c"># Ignore any checkpoints 
for the topic &quot;my-special-topic&quot;</span>
-<span 
class="na">systems.kafka.streams.my-special-topic.samza.reset.offset</span><span
 class="o">=</span><span class="s">true</span>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"># Ignore any checkpoints for the topic 
"my-special-topic"
+systems.kafka.streams.my-special-topic.samza.reset.offset=true
 
-<span class="c"># Always start consuming &quot;my-special-topic&quot; at the 
oldest available offset</span>
-<span 
class="na">systems.kafka.streams.my-special-topic.samza.offset.default</span><span
 class="o">=</span><span class="s">oldest</span></code></pre></figure>
+# Always start consuming "my-special-topic" at the oldest available offset
+systems.kafka.streams.my-special-topic.samza.offset.default=oldest</code></pre></figure>
 
 <p>The following table explains the meaning of these configuration 
parameters:</p>
 
@@ -696,7 +710,7 @@
   </thead>
   <tbody>
     <tr>
-      <td rowspan="2" 
class="nowrap">systems.&lt;system&gt;.<br>streams.&lt;stream&gt;.<br>samza.reset.offset</td>
+      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br 
/>streams.&lt;stream&gt;.<br />samza.reset.offset</td>
       <td>false (default)</td>
       <td>When container starts up, resume processing from last checkpoint</td>
     </tr>
@@ -705,7 +719,7 @@
       <td>Ignore checkpoint (pretend that no checkpoint is present)</td>
     </tr>
     <tr>
-      <td rowspan="2" 
class="nowrap">systems.&lt;system&gt;.<br>streams.&lt;stream&gt;.<br>samza.offset.default</td>
+      <td rowspan="2" class="nowrap">systems.&lt;system&gt;.<br 
/>streams.&lt;stream&gt;.<br />samza.offset.default</td>
       <td>upcoming (default)</td>
       <td>When container starts and there is no checkpoint (or the checkpoint 
is ignored), only process messages that are published after the job is started, 
but no old messages</td>
     </tr>
@@ -720,43 +734,42 @@
 
 <h3 id="manipulating-checkpoints-manually">Manipulating Checkpoints 
Manually</h3>
 
-<p>If you want to make a one-off change to a job&rsquo;s consumer offsets, for 
example to force old messages to be <a 
href="../jobs/reprocessing.html">processed again</a> with a new version of your 
code, you can use CheckpointTool to inspect and manipulate the job&rsquo;s 
checkpoint. The tool is included in Samza&rsquo;s <a 
href="/contribute/code.html">source repository</a>.</p>
+<p>If you want to make a one-off change to a job’s consumer offsets, for 
example to force old messages to be <a 
href="../jobs/reprocessing.html">processed again</a> with a new version of your 
code, you can use CheckpointTool to inspect and manipulate the job’s 
checkpoint. The tool is included in Samza’s <a 
href="/contribute/code.html">source repository</a>.</p>
 
-<p>To inspect a job&rsquo;s latest checkpoint, you need to specify your 
job&rsquo;s config file, so that the tool knows which job it is dealing 
with:</p>
+<p>To inspect a job’s latest checkpoint, you need to specify your job’s 
config file, so that the tool knows which job it is dealing with:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
-  --config-path<span 
class="o">=</span>/path/to/job/config.properties</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
+  <span class="nt">--config-path</span><span 
class="o">=</span>/path/to/job/config.properties</code></pre></figure>
 
 <p>This command prints out the latest checkpoint in a properties file format. 
You can save the output to a file, and edit it as you wish. For example, to 
jump back to the oldest possible point in time, you can set all the offsets to 
0. Then you can feed that properties file back into checkpoint-tool.sh and save 
the modified checkpoint:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
-  --config-path<span class="o">=</span>/path/to/job/config.properties <span 
class="se">\</span>
-  --new-offsets<span 
class="o">=</span>/path/to/new/offsets.properties</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
+  <span class="nt">--config-path</span><span 
class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
+  <span class="nt">--new-offsets</span><span 
class="o">=</span>/path/to/new/offsets.properties</code></pre></figure>
 
 <p>Note that Samza only reads checkpoints on container startup. In order for 
your checkpoint change to take effect, you need to first stop the job, then 
save the modified offsets, and then start the job again. If you write a 
checkpoint while the job is running, it will most likely have no effect.</p>
 
 <h3 id="checkpoint-callbacks">Checkpoint Callbacks</h3>
-
 <p>Currently, Samza takes care of checkpointing for all the systems, but there 
are some use cases in which we may need to inform the consumer about each 
checkpoint we make. Here are a few examples:</p>
 
 <ul>
-<li>Samza cannot do checkpointing correctly or efficiently. One such case is 
when Samza is not doing the partitioning. In this case the container doesn’t 
know which SSPs it is responsible for, and thus cannot checkpoint them. An 
actual example could be a system which relies on an auto-balanced High Level 
Kafka Consumer for partitioning.</li>
-<li>Systems in which the consumer itself needs to control the checkpointed 
offset. Some systems do not support seek() operation (are not replayable), but 
they rely on ACKs for the delivered messages. Example could be a Kinesis 
consumer. Kinesis library provides a checkpoint callback in the* process() 
*call (push system). This callback needs to be invoked after the records are 
processed. This can only be done by the consumer itself.</li>
-<li>Systems that use checkpoint/offset information for some maintenance 
actions. This information may be used to implement a smart retention policy 
(deleting all the data after it has been consumed).</li>
+  <li>Samza cannot do checkpointing correctly or efficiently. One such case is 
when Samza is not doing the partitioning. In this case the container doesn’t 
know which SSPs it is responsible for, and thus cannot checkpoint them. An 
actual example could be a system which relies on an auto-balanced High Level 
Kafka Consumer for partitioning.</li>
+  <li>Systems in which the consumer itself needs to control the checkpointed 
offset. Some systems do not support the seek() operation (they are not 
replayable), but rely on ACKs for the delivered messages. An example is a 
Kinesis consumer: the Kinesis library provides a checkpoint callback in the 
<em>process()</em> call (push system). This callback needs to be invoked after 
the records are processed, which can only be done by the consumer itself.</li>
+  <li>Systems that use checkpoint/offset information for some maintenance 
actions. This information may be used to implement a smart retention policy 
(deleting all the data after it has been consumed).</li>
 </ul>
 
 <p>In order to use the checkpoint callback a SystemConsumer needs to implement 
the CheckpointListener interface:</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="kd">public</span> <span 
class="kd">interface</span> <span class="nc">CheckpointListener</span> <span 
class="o">{</span>
-  <span class="kt">void</span> <span class="nf">onCheckpoint</span><span 
class="o">(</span><span class="n">Map</span><span class="o">&lt;</span><span 
class="n">SystemStreamPartition</span><span class="o">,</span> <span 
class="n">String</span><span class="o">&gt;</span> <span 
class="n">offsets</span><span class="o">);</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="kd">public</span> <span 
class="kd">interface</span> <span class="nc">CheckpointListener</span> <span 
class="o">{</span>
+  <span class="kt">void</span> <span class="nf">onCheckpoint</span><span 
class="o">(</span><span class="nc">Map</span><span class="o">&lt;</span><span 
class="nc">SystemStreamPartition</span><span class="o">,</span> <span 
class="nc">String</span><span class="o">&gt;</span> <span 
class="n">offsets</span><span class="o">);</span>
 <span class="o">}</span></code></pre></figure>
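As a self-contained sketch of how a consumer might use this callback: the interface below mirrors the one shown above, but SystemStreamPartition is stubbed as a plain String key so the example runs standalone, and the consumer class and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Mirrors the CheckpointListener interface shown above, with the
// SystemStreamPartition key simplified to a String for this sketch.
interface CheckpointListener {
    void onCheckpoint(Map<String, String> offsets);
}

// Hypothetical consumer that records the last checkpointed offsets; a real
// ACK-based consumer (e.g. for Kinesis) would acknowledge delivered messages
// up to these offsets inside onCheckpoint().
class AckingConsumer implements CheckpointListener {
    private final Map<String, String> lastCheckpointed = new HashMap<>();

    // Invoked after the OffsetManager writes a checkpoint for a task.
    @Override
    public void onCheckpoint(Map<String, String> offsets) {
        lastCheckpointed.putAll(offsets);
    }

    Map<String, String> lastCheckpointed() {
        return lastCheckpointed;
    }
}

public class CheckpointListenerSketch {
    public static void main(String[] args) {
        AckingConsumer consumer = new AckingConsumer();
        Map<String, String> offsets = new HashMap<>();
        offsets.put("kafka.my-topic.0", "42");
        consumer.onCheckpoint(offsets);
        System.out.println(consumer.lastCheckpointed().get("kafka.my-topic.0"));
        // prints 42
    }
}
```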
 
-<p>For the SystemConsumers which implement this interface Samza will invoke 
onCheckpoint() callback every time OffsetManager checkpoints. Checkpoints are 
done per task, and &lsquo;offsets&rsquo; are all the offsets Samza checkpoints 
for a task,
+<p>For SystemConsumers which implement this interface, Samza will invoke the 
onCheckpoint() callback every time the OffsetManager checkpoints. Checkpoints 
are done per task, and ‘offsets’ are all the offsets Samza checkpoints for a 
task,
 and these are the offsets which will be passed to the consumer on restart.
 Note that the callback will happen after the checkpoint and is 
<strong>not</strong> atomic.</p>
 
-<h2 id="state-management"><a href="state-management.html">State Management 
&raquo;</a></h2>
+<h2 id="state-management-"><a href="state-management.html">State Management 
»</a></h2>
 
            
         </div>

Modified: 
samza/site/learn/documentation/latest/container/coordinator-stream.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/coordinator-stream.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/coordinator-stream.html 
(original)
+++ samza/site/learn/documentation/latest/container/coordinator-stream.html Wed 
Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/coordinator-stream">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/coordinator-stream">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/coordinator-stream">1.6.0</a></li>
 
               
@@ -638,48 +652,50 @@
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
-
 <p>A Samza job is completely driven by its configuration, so job 
configurations tend to be quite large. In order to easily serialize such large 
configs and persist them between job executions, Samza writes all 
configurations to a durable stream called the <em>Coordinator Stream</em> when 
a job is submitted.</p>
 
 <p>A Coordinator Stream is a single-partition stream to which the 
configurations are written. It shares the same characteristics as any input 
stream that can be configured in Samza: ordered, replayable, and 
fault-tolerant. The stream contains three major types of messages:</p>
 
 <ol>
-<li>Job configuration messages</li>
-<li>Task changelog partition assignment messages</li>
-<li>Container locality message</li>
+  <li>Job configuration messages</li>
+  <li>Task changelog partition assignment messages</li>
+  <li>Container locality message</li>
 </ol>
 
 <h3 id="coordinator-stream-naming">Coordinator Stream Naming</h3>
 
-<p>The naming convention is very similar to that of the checkpoint topic that 
get&rsquo;s created.</p>
-<div class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span 
class="s">&quot;__samza_coordinator_%s_%s&quot;</span> <span 
class="n">format</span> <span class="o">(</span><span 
class="n">jobName</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">&quot;_&quot;</span><span class="o">,</span> <span 
class="s">&quot;-&quot;</span><span class="o">),</span> <span 
class="n">jobId</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">&quot;_&quot;</span><span class="o">,</span> <span 
class="s">&quot;-&quot;</span><span class="o">))</span>
-</code></pre></div>
-<h3 id="coordinator-stream-message-model">Coordinator Stream Message Model</h3>
+<p>The naming convention is very similar to that of the checkpoint topic that 
gets created.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code><span class="s">"__samza_coordinator_%s_%s"</span> 
<span class="n">format</span> <span class="o">(</span><span 
class="n">jobName</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span 
class="o">),</span> <span class="n">jobId</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span 
class="o">))</span>
+</code></pre></div></div>
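The same substitution rule can be sketched in plain Java (illustrative only; Samza's own code performs the formatting shown above, and the class and method names here are hypothetical):

```java
public class CoordinatorStreamName {
    // Mirrors the format string above: underscores in the job name and job id
    // are replaced with hyphens before substitution.
    static String coordinatorStream(String jobName, String jobId) {
        return String.format("__samza_coordinator_%s_%s",
                jobName.replaceAll("_", "-"), jobId.replaceAll("_", "-"));
    }

    public static void main(String[] args) {
        // A hypothetical job named "my_job" with id "1"
        System.out.println(coordinatorStream("my_job", "1"));
        // prints __samza_coordinator_my-job_1
    }
}
```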
 
+<h3 id="coordinator-stream-message-model">Coordinator Stream Message Model</h3>
 <p>Coordinator stream messages are modeled as key/value pairs. The key is a 
list of well-defined fields: <em>version</em>, <em>type</em>, and <em>key</em>. 
The value is a <em>map</em>. There are some pre-defined fields (such as 
timestamp, host, etc.) for the value map, which are common to all messages.</p>
 
 <p>The full structure for a CoordinatorStreamMessage is:</p>
-<div class="highlight"><pre><code class="language-json" 
data-lang="json"><span></span><span class="err">key</span> <span 
class="err">=&gt;</span> <span class="p">[</span><span 
class="s2">&quot;&lt;version-number&gt;&quot;</span><span class="p">,</span> 
<span class="s2">&quot;&lt;message-type&gt;&quot;</span><span 
class="p">,</span> <span class="s2">&quot;&lt;key&gt;&quot;</span><span 
class="p">]</span>
 
-<span class="err">message</span> <span class="err">=&gt;</span> <span 
class="p">{</span>
-    <span class="nt">&quot;host&quot;</span><span class="p">:</span> <span 
class="s2">&quot;&lt;hostname&gt;&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;username&quot;</span><span class="p">:</span> <span 
class="s2">&quot;&lt;username&gt;&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;source&quot;</span><span class="p">:</span> <span 
class="s2">&quot;&lt;source-for-this-message&gt;&quot;</span><span 
class="p">,</span>
-    <span class="nt">&quot;timestamp&quot;</span><span class="p">:</span> 
<span class="err">&lt;timestamp-of-the-message&gt;</span><span 
class="p">,</span>
-    <span class="nt">&quot;values&quot;</span><span class="p">:</span> <span 
class="p">{</span> <span class="p">}</span>
-<span class="p">}</span>
-</code></pre></div>
+<div class="language-json highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code><span class="err">key</span><span class="w"> 
</span><span class="err">=&gt;</span><span class="w"> </span><span 
class="p">[</span><span class="s2">"&lt;version-number&gt;"</span><span 
class="p">,</span><span class="w"> </span><span 
class="s2">"&lt;message-type&gt;"</span><span class="p">,</span><span 
class="w"> </span><span class="s2">"&lt;key&gt;"</span><span 
class="p">]</span><span class="w">
+
+</span><span class="err">message</span><span class="w"> </span><span 
class="err">=&gt;</span><span class="w"> </span><span class="p">{</span><span 
class="w">
+    </span><span class="nl">"host"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"&lt;hostname&gt;"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"username"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"&lt;username&gt;"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"source"</span><span class="p">:</span><span 
class="w"> </span><span 
class="s2">"&lt;source-for-this-message&gt;"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"timestamp"</span><span class="p">:</span><span 
class="w"> </span><span 
class="err">&lt;timestamp-of-the-message&gt;</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"values"</span><span class="p">:</span><span 
class="w"> </span><span class="p">{</span><span class="w"> </span><span 
class="p">}</span><span class="w">
+</span><span class="p">}</span><span class="w">
+</span></code></pre></div></div>
+
 <p>The messages are serialized and transmitted over the wire as JSON blobs. 
Hence, for serialization to work correctly, it is important not to include any 
unnecessary whitespace. The whitespace in the above JSON blob is shown for 
legibility only.</p>
 
 <p>The most important fields are type, key, and values:</p>
 
 <ul>
-<li>type - defines the kind of message</li>
-<li>key - defines a key to associate with the values</li>
-<li>values map - defined on a per-message-type basis, and defines a set of 
values associated with the type</li>
+  <li>type - defines the kind of message</li>
+  <li>key - defines a key to associate with the values</li>
+  <li>values map - defined on a per-message-type basis, and defines a set of 
values associated with the type</li>
 </ul>
 
-<p>The coordinator stream messages that are currently supported are listed 
below:
+<p>The coordinator stream messages that are currently supported are listed 
below:</p>
 <style>
             table th, table td {
                 text-align: left;
@@ -689,16 +705,17 @@
                 border-top: 1px solid #ccc;
                 border-left: 0;
                 border-right: 0;
-            }</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>        table td.property, table td.default {
-            white-space: nowrap;
-        }
-
-        table th {
-            background-color: #eee;
-        }
-</code></pre></div>
-<p></style>
+            }
+
+            table td.property, table td.default {
+                white-space: nowrap;
+            }
+
+            table th {
+                background-color: #eee;
+            }
+</style>
+
 <table>
     <tr>
         <th>Message</th>
@@ -709,36 +726,35 @@
     <tr>
         <td> Configuration Message <br />
             (Applies to all configuration <br />
-             options listed in <a 
href="../jobs/configuration-table.html">Configuration</a>) </td>
+             options listed in <a 
href="../jobs/configuration-table.html">Configuration</a>) </td>
         <td> set-config </td>
         <td> &lt;config-name&gt; </td>
-        <td> &lsquo;value&rsquo; =&gt; &lt;config-value&gt; </td>
+        <td> 'value' =&gt; &lt;config-value&gt; </td>
     </tr>
     <tr>
         <td> Task-ChangelogPartition Assignment Message </td>
         <td> set-changelog </td>
-        <td> &lt;<a 
href="../api/org/apache/samza/container/TaskName.java">TaskName</a>&gt; </td>
-        <td> &lsquo;partition&rsquo; =&gt; &lt;Changelog-Partition-Id&gt;
+        <td> &lt;<a 
href="../api/org/apache/samza/container/TaskName.java">TaskName</a>&gt; </td>
+        <td> 'partition' =&gt; &lt;Changelog-Partition-Id&gt;
         </td>
     </tr>
     <tr>
         <td> Container Locality Message </td>
         <td> set-container-host-assignment </td>
         <td> &lt;Container-Id&gt; </td>
-        <td> &lsquo;hostname&rsquo; =&gt; &lt;HostName&gt;
+        <td> 'hostname' =&gt; &lt;HostName&gt;
         </td>
     </tr>
-</table></p>
+</table>
 
 <h3 id="coordinator-stream-writer">Coordinator Stream Writer</h3>
-
 <p>Samza provides a command line tool to write Job Configuration messages to 
the coordinator stream. The tool can be used as follows:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>samza-example/target/bin/run-coordinator-stream-writer.sh
 <span class="se">\</span>
-  --config-path<span class="o">=</span>/path/to/job/config.properties <span 
class="se">\</span>
-  --type set-config <span class="se">\</span>
-  --key job.container.count <span class="se">\</span>
-  --value <span class="m">8</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/run-coordinator-stream-writer.sh 
<span class="se">\</span>
+  <span class="nt">--config-path</span><span 
class="o">=</span>/path/to/job/config.properties <span class="se">\</span>
+  <span class="nt">--type</span> set-config <span class="se">\</span>
+  <span class="nt">--key</span> job.container.count <span class="se">\</span>
+  <span class="nt">--value</span> 8</code></pre></figure>
 
 <h2 id="job-coordinator"><a name="JobCoordinator"></a>Job Coordinator</h2>
 
@@ -746,7 +762,7 @@
 
 <p>Job Model is the data model used to represent a Samza job, which also 
incorporates the Job configuration. The hierarchy of a Samza job - job has 
containers, and each of the containers has tasks - is encapsulated in the Job 
Model, along with relevant information such as container id, task names, 
partition information, etc.</p>
 
-<p>The Job Coordinator exposes the Job Model and Job Configuration via an HTTP 
service. The URL for the Job Coordinator&rsquo;s HTTP service is passed as an 
environment variable to the Samza Containers when the containers are launched. 
Containers may write meta-information, such as locality - the hostname of the 
machine on which the container is running. However, they will read the Job 
Model and Configuration by querying the Job Coordinator via the HTTP 
service.</p>
+<p>The Job Coordinator exposes the Job Model and Job Configuration via an HTTP 
service. The URL for the Job Coordinator’s HTTP service is passed as an 
environment variable to the Samza Containers when the containers are launched. 
Containers may write meta-information, such as locality - the hostname of the 
machine on which the container is running. However, they will read the Job 
Model and Configuration by querying the Job Coordinator via the HTTP 
service.</p>
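As a rough sketch of how a container might consume the hierarchy described above, the helper below walks a Job Model document (job has containers, containers have tasks). The JSON layout here is an assumption for illustration only; the real Job Model served over the Job Coordinator's HTTP service carries more fields (partition information, configuration, etc.):

```python
import json

def container_task_summary(job_model):
    """Summarize the job -> containers -> tasks hierarchy of a Job Model.

    The JSON layout is a simplified assumption for illustration; the
    real Job Model served by the Job Coordinator's HTTP service has
    more fields (partition info, configuration, and so on).
    """
    return {
        container_id: sorted(container["tasks"].keys())
        for container_id, container in job_model["containers"].items()
    }

# A container would GET this from the Job Coordinator URL passed to it
# as an environment variable; here we use a canned response instead.
sample = json.loads(
    '{"containers": {"0": {"tasks": {"Partition 0": {"changelog-partition": 0},'
    ' "Partition 1": {"changelog-partition": 1}}}}}'
)
print(container_task_summary(sample))
```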
 
 <p>Thus, the Job Coordinator is the single component that has the latest view 
of the entire job status. This is very useful, as it allows us to extend the 
functionality of the Job Coordinator in the future to manage the lifecycle of 
the job (such as starting/stopping containers, modifying task assignments, 
etc.).</p>
 
@@ -755,21 +771,20 @@
 <p>The Job Coordinator resides in the same container as the Samza Application 
Master. Thus, the availability of the Job Coordinator is tied to the 
availability of the Application Master (AM) in the Yarn cluster. The Samza 
containers are started only after initializing the Job Coordinator from the 
Coordinator Stream. In stable condition, when the Samza container comes up, it 
should be able to read the JobModel from the Job Coordinator without timing 
out.</p>
 
 <h2 id="benefits-of-coordinator-stream-model">Benefits of Coordinator Stream 
Model</h2>
-
 <p>Writing the configuration to a durable stream opens the door for Samza to 
do several things:</p>
 
 <ol>
-<li>Removes the size-bound on the Job configuration</li>
-<li>Exposes job-related configuration and metadata to the containers using a 
standard data model and communication interface (See <a 
href="#JobCoordinator">Job Coordinator</a> for details)</li>
-<li>Certain configurations should only be set one time. Changing them in 
future deployment amounts to resetting the entire state of the job because it 
may re-shuffle input partitions to the containers. For example, changing <a 
href="../api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.java">SystemStreamPartitionGrouper</a>
 on a stateful Samza job would inter-mingle state from different StreamTasks in 
a single changelog partition. Without persistent configuration, there is no 
easy way to check whether a job&rsquo;s current configuration is valid or 
not.</li>
-<li>Job configuration can be dynamically changed by writing to the Coorinator 
Stream. This can enable features that require the job to be reactive to 
configuration change (eg. host-affinity, auto-scaling, dynamic reconfiguration 
etc).</li>
-<li>Provides a unified view of the job state, enabling Samza with more 
powerful ways of controlling container controls (See <a 
href="#JobCoordinator">Job Coordinator</a> for details)</li>
-<li>Enables future design of Job Coordinator fail-over since it serves as a 
single source of truth of the current job state</li>
+  <li>Removes the size-bound on the Job configuration</li>
+  <li>Exposes job-related configuration and metadata to the containers using a 
standard data model and communication interface (See <a 
href="#JobCoordinator">Job Coordinator</a> for details)</li>
+  <li>Certain configurations should only be set once. Changing them in a 
future deployment amounts to resetting the entire state of the job, because it 
may re-shuffle input partitions to the containers. For example, changing <a 
href="../api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.java">SystemStreamPartitionGrouper</a>
 on a stateful Samza job would inter-mingle state from different StreamTasks in 
a single changelog partition. Without persistent configuration, there is no 
easy way to check whether a job’s current configuration is valid.</li>
+  <li>Job configuration can be dynamically changed by writing to the 
Coordinator Stream. This can enable features that require the job to react to 
configuration changes (e.g. host-affinity, auto-scaling, dynamic 
reconfiguration, etc.).</li>
+  <li>Provides a unified view of the job state, giving Samza more powerful 
ways of controlling containers (See <a href="#JobCoordinator">Job 
Coordinator</a> for details)</li>
+  <li>Enables future design of Job Coordinator fail-over since it serves as a 
single source of truth of the current job state</li>
 </ol>
 
 <p>For other interesting features that can leverage this model, please refer 
the <a 
href="https://issues.apache.org/jira/secure/attachment/12670650/DESIGN-SAMZA-348-1.pdf";>design
 document</a>.</p>
 
-<h2 id="event-loop"><a href="event-loop.html">Event Loop &raquo;</a></h2>
+<h2 id="event-loop-"><a href="event-loop.html">Event Loop »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/event-loop.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/event-loop.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/event-loop.html (original)
+++ samza/site/learn/documentation/latest/container/event-loop.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/event-loop">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/event-loop">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/event-loop">1.6.0</a></li>
 
               
@@ -639,13 +653,13 @@
    limitations under the License.
 -->
 
-<p>The event loop orchestrates <a href="streams.html">reading and processing 
messages</a>, <a href="checkpointing.html">checkpointing</a>, <a 
href="windowing.html">windowing</a> and <a href="metrics.html">flushing 
metrics</a> among tasks. </p>
+<p>The event loop orchestrates <a href="streams.html">reading and processing 
messages</a>, <a href="checkpointing.html">checkpointing</a>, <a 
href="windowing.html">windowing</a> and <a href="metrics.html">flushing 
metrics</a> among tasks.</p>
 
 <p>By default, Samza uses a single thread in each <a 
href="samza-container.html">container</a> to run the tasks. This fits CPU-bound 
jobs well; to use more processors, simply add more containers. Single-threaded 
execution also simplifies sharing task state and resource management.</p>
 
 <p>For IO-bound jobs, Samza supports finer-grained parallelism for both 
synchronous and asynchronous tasks. For synchronous tasks (<a 
href="../api/javadocs/org/apache/samza/task/StreamTask.html">StreamTask</a> and 
<a 
href="../api/javadocs/org/apache/samza/task/WindowableTask.html">WindowableTask</a>),
 you can schedule them to run in parallel by configuring the built-in thread 
pool <a 
href="../jobs/configuration-table.html">job.container.thread.pool.size</a>. 
This fits the blocking-IO task scenario. For asynchronous tasks (<a 
href="../api/javadocs/org/apache/samza/task/AsyncStreamTask.html">AsyncStreamTask</a>),
 you can make async IO calls and trigger callbacks upon completion. The finest 
degree of parallelism Samza provides is within a task, and is configured by <a 
href="../jobs/configuration-table.html">task.max.concurrency</a>.</p>
 
-<p>The latest version of Samza is thread-safe. You can safely access your 
job’s state in <a href="state-management.html">key-value store</a>, write 
messages and checkpoint offset in the task threads. If you have other data 
shared among tasks, such as global variables or static data, it is not thread 
safe if the data can be accessed concurrently by multiple threads, e.g. 
StreamTask running in the configured thread pool with more than one threads. 
For states within a task, such as member variables, Samza guarantees the mutual 
exclusiveness of process, window and commit so there will be no concurrent 
modifications among these operations and any state change from one operation 
will be fully visible to the others.     </p>
+<p>The latest version of Samza is thread-safe. You can safely access your 
job’s state in the <a href="state-management.html">key-value store</a>, write 
messages and checkpoint offsets in the task threads. If you have other data 
shared among tasks, such as global variables or static data, it is not thread 
safe if the data can be accessed concurrently by multiple threads, e.g. a 
StreamTask running in the configured thread pool with more than one thread. 
For state within a task, such as member variables, Samza guarantees the mutual 
exclusiveness of process, window and commit, so there will be no concurrent 
modifications among these operations, and any state change from one operation 
will be fully visible to the others.</p>
 
 <h3 id="event-loop-internals">Event Loop Internals</h3>
 
@@ -654,23 +668,23 @@
 <p>The event loop works as follows:</p>
 
 <ol>
-<li>Choose a message from the incoming message queue;</li>
-<li>Schedule the appropriate <a href="samza-container.html">task instance</a> 
to process the message;</li>
-<li>Schedule window() on the task instance to run if it implements 
WindowableTask, and the window timer has been triggered;</li>
-<li>Send any output from the process() and window() calls to the appropriate 
<a 
href="../api/javadocs/org/apache/samza/system/SystemProducer.html">SystemProducers</a>;</li>
-<li>Write checkpoints and flush the state stores for any tasks whose <a 
href="checkpointing.html">commit interval</a> has elapsed.</li>
-<li>Block if all task instances are busy with processing outstanding messages, 
windowing or checkpointing.</li>
+  <li>Choose a message from the incoming message queue;</li>
+  <li>Schedule the appropriate <a href="samza-container.html">task 
instance</a> to process the message;</li>
+  <li>Schedule window() on the task instance to run if it implements 
WindowableTask, and the window timer has been triggered;</li>
+  <li>Send any output from the process() and window() calls to the appropriate 
<a 
href="../api/javadocs/org/apache/samza/system/SystemProducer.html">SystemProducers</a>;</li>
+  <li>Write checkpoints and flush the state stores for any tasks whose <a 
href="checkpointing.html">commit interval</a> has elapsed.</li>
+  <li>Block if all task instances are busy with processing outstanding 
messages, windowing or checkpointing.</li>
 </ol>
 
 <p>The container does this, in a loop, until it is shut down.</p>
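The numbered steps above can be sketched as a simplified single-threaded loop. This is a language-neutral sketch (the real container is Java, and adds thread-pool scheduling, async completion, and shutdown handling); the Task class is a hypothetical stand-in:

```python
class Task:
    """Minimal stand-in for a Samza task instance (illustration only)."""
    def __init__(self):
        self.processed, self.commits = [], 0
    def process(self, envelope):
        # In Samza, outputs produced here go to SystemProducers (step 4).
        self.processed.append(envelope["message"])
    def window_due(self):
        return False  # no WindowableTask in this sketch
    def window(self):
        pass
    def commit(self):
        self.commits += 1  # write checkpoint, flush state stores

def run_event_loop(queue, tasks):
    """Simplified, single-threaded sketch of the container event loop."""
    while queue:                                # real loop runs until shutdown
        envelope = queue.pop(0)                 # 1. choose a message
        task = tasks[envelope["partition"]]     # 2. schedule the task instance
        task.process(envelope)
        for t in tasks.values():                # 3. window() if the timer fired
            if t.window_due():
                t.window()
    for t in tasks.values():                    # 5. commit (here: once, at end)
        t.commit()
    # 6. a real container blocks when all task instances are busy

tasks = {0: Task(), 1: Task()}
queue = [{"partition": 0, "message": "a"}, {"partition": 1, "message": "b"},
         {"partition": 0, "message": "c"}]
run_event_loop(queue, tasks)
print(tasks[0].processed)  # ['a', 'c']
```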
 
-<h3 id="semantics-for-synchronous-tasks-v-s-asynchronous-tasks">Semantics for 
Synchronous Tasks v.s. Asynchronous Tasks</h3>
+<h3 id="semantics-for-synchronous-tasks-vs-asynchronous-tasks">Semantics for 
Synchronous Tasks vs. Asynchronous Tasks</h3>
 
 <p>The semantics of the event loop differs when running synchronous tasks and 
asynchronous tasks:</p>
 
 <ul>
-<li>For synchronous tasks (StreamTask and WindowableTask), process() and 
window() will run on the single main thread by default. You can configure 
job.container.thread.pool.size to be greater than 1, and event loop will 
schedule the process() and window() to run in the thread pool.<br></li>
-<li>For Asynchronous tasks (AsyncStreamTask), processAsync() will always be 
invoked in a single thread, while callbacks can be triggered from a different 
user thread. </li>
+  <li>For synchronous tasks (StreamTask and WindowableTask), process() and 
window() will run on the single main thread by default. You can configure 
job.container.thread.pool.size to be greater than 1, and the event loop will 
schedule process() and window() to run in the thread pool.</li>
+  <li>For asynchronous tasks (AsyncStreamTask), processAsync() will always be 
invoked in a single thread, while callbacks can be triggered from a different 
user thread.</li>
 </ul>
 
 <p>In both cases, the default concurrency within a task is 1, meaning at most 
one outstanding message in processing per task. This guarantees in-order 
message processing in a topic partition. You can further increase it by 
configuring task.max.concurrency to be greater than 1. This allows multiple 
outstanding messages to be processed in parallel by a task. This option 
increases the parallelism within a task, but may result in out-of-order 
processing and completion.</p>
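A toy simulation can illustrate why task.max.concurrency = 1 preserves arrival order while a higher value may not. This is a sketch under simplifying assumptions (fixed per-message latencies, earliest-finisher-completes-first), not Samza's actual scheduler:

```python
def completion_order(latencies, max_concurrency):
    """Sketch of how task.max.concurrency affects completion order.

    Each entry in `latencies` is the processing time of one message, in
    arrival order. With max_concurrency = 1, messages complete in arrival
    order; with more, a slow message can be overtaken by later ones.
    (Illustration only; Samza's real scheduling is more involved.)
    """
    in_flight, done, clock = [], [], 0.0
    pending = list(enumerate(latencies))
    while pending or in_flight:
        while pending and len(in_flight) < max_concurrency:
            idx, cost = pending.pop(0)
            in_flight.append((clock + cost, idx))   # (finish time, msg index)
        in_flight.sort()
        finish, idx = in_flight.pop(0)              # earliest finisher completes
        clock = finish
        done.append(idx)
    return done

print(completion_order([3.0, 1.0, 1.0], max_concurrency=1))  # [0, 1, 2]
print(completion_order([3.0, 1.0, 1.0], max_concurrency=2))  # [1, 2, 0]
```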
@@ -678,20 +692,20 @@
 <p>The following semantics are guaranteed in any of the above cases (for 
happens-before semantics, see <a 
href="https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html";>here</a>):</p>
 
 <ul>
-<li>If task.max.concurrency = 1, each message process completion in a task is 
guaranteed to happen-before the next invocation of process()/processAsync() of 
the same task. If task.max.concurrency &gt; 1, there is no such happens-before 
constraint and user should synchronize access to any shared/global variables in 
the Task..</li>
-<li>WindowableTask.window() is called when no invocations to 
process()/processAsync() are pending and no new process()/processAsync() 
invocations can be scheduled until it completes. Therefore, a guarantee that 
all previous process()/processAsync() invocations happen before an invocation 
of WindowableTask.window(). An invocation to WindowableTask.window() is 
guaranteed to happen-before any subsequent process()/processAsync() 
invocations. The Samza engine is responsible for ensuring that window is 
invoked in a timely manner.</li>
-<li>Checkpointing is guaranteed to only cover events that are fully processed. 
It happens only when there are no pending process()/processAsync() or 
WindowableTask.window() invocations. All preceding invocations happen-before 
checkpointing and checkpointing happens-before all subsequent invocations.</li>
+  <li>If task.max.concurrency = 1, each message process completion in a task 
is guaranteed to happen-before the next invocation of process()/processAsync() 
of the same task. If task.max.concurrency &gt; 1, there is no such 
happens-before constraint, and users should synchronize access to any 
shared/global variables in the Task.</li>
+  <li>WindowableTask.window() is called when no invocations to 
process()/processAsync() are pending, and no new process()/processAsync() 
invocations can be scheduled until it completes. Therefore, all previous 
process()/processAsync() invocations are guaranteed to happen before an 
invocation of WindowableTask.window(). An invocation to WindowableTask.window() 
is guaranteed to happen-before any subsequent process()/processAsync() 
invocations. The Samza engine is responsible for ensuring that window is 
invoked in a timely manner.</li>
+  <li>Checkpointing is guaranteed to only cover events that are fully 
processed. It happens only when there are no pending process()/processAsync() 
or WindowableTask.window() invocations. All preceding invocations happen-before 
checkpointing and checkpointing happens-before all subsequent invocations.</li>
 </ul>
 
 <p>More details and examples can be found in <a 
href="../../../tutorials/latest/samza-async-user-guide.html">Samza Async API 
and Multithreading User Guide</a>.</p>
 
 <h3 id="lifecycle">Lifecycle</h3>
 
-<p>The only way in which a developer can hook into a SamzaContainer&rsquo;s 
lifecycle is through the standard InitableTask, ClosableTask, 
StreamTask/AsyncStreamTask, and WindowableTask. In cases where pluggable logic 
needs to be added to wrap a StreamTask, the StreamTask can be wrapped by 
another StreamTask implementation that handles the custom logic before calling 
into the wrapped StreamTask.</p>
+<p>The only way in which a developer can hook into a SamzaContainer’s 
lifecycle is through the standard InitableTask, ClosableTask, 
StreamTask/AsyncStreamTask, and WindowableTask. In cases where pluggable logic 
needs to be added to wrap a StreamTask, the StreamTask can be wrapped by 
another StreamTask implementation that handles the custom logic before calling 
into the wrapped StreamTask.</p>
 
 <p>A concrete example is a set of StreamTasks that all want to share the same 
try/catch logic in their process() method. A StreamTask can be implemented that 
wraps the original StreamTasks, and surrounds the original process() call with 
the appropriate try/catch logic. For more details, see <a 
href="https://issues.apache.org/jira/browse/SAMZA-437";>this discussion</a>.</p>
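A language-neutral sketch of that wrapping pattern follows (Samza tasks are Java classes; the names here are illustrative, not Samza APIs):

```python
class CatchingTask:
    """Wraps another task and surrounds its process() call with shared
    try/except logic, as in the wrapping pattern described above.
    (Illustrative sketch; a real Samza StreamTask is a Java class.)"""
    def __init__(self, wrapped, on_error):
        self.wrapped = wrapped
        self.on_error = on_error

    def process(self, envelope):
        try:
            self.wrapped.process(envelope)
        except Exception as exc:        # shared error handling for all tasks
            self.on_error(envelope, exc)

class FailingTask:
    """Stand-in for a task whose process() throws."""
    def process(self, envelope):
        raise ValueError("bad message: %r" % (envelope,))

errors = []
task = CatchingTask(FailingTask(),
                    lambda env, exc: errors.append((env, str(exc))))
task.process({"key": "k1"})             # error is captured, not raised
print(errors[0][1])
```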
 
-<h2 id="metrics"><a href="metrics.html">Metrics &raquo;</a></h2>
+<h2 id="metrics-"><a href="metrics.html">Metrics »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/jmx.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/jmx.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/jmx.html (original)
+++ samza/site/learn/documentation/latest/container/jmx.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/jmx">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/jmx">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/jmx">1.6.0</a></li>
 
               
@@ -639,22 +653,24 @@
    limitations under the License.
 -->
 
-<p>Samza&rsquo;s containers and YARN ApplicationMaster enable <a 
href="http://docs.oracle.com/javase/tutorial/jmx/";>JMX</a> by default. JMX can 
be used for managing the JVM; for example, you can connect to it using <a 
href="http://docs.oracle.com/javase/7/docs/technotes/guides/management/jconsole.html";>jconsole</a>,
 which is included in the JDK.</p>
+<p>Samza’s containers and YARN ApplicationMaster enable <a 
href="http://docs.oracle.com/javase/tutorial/jmx/";>JMX</a> by default. JMX can 
be used for managing the JVM; for example, you can connect to it using <a 
href="http://docs.oracle.com/javase/7/docs/technotes/guides/management/jconsole.html";>jconsole</a>,
 which is included in the JDK.</p>
 
 <p>You can tell Samza to publish its internal <a 
href="metrics.html">metrics</a>, and any custom metrics you define, as JMX 
MBeans. To enable this, set the following properties in your job 
configuration:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c"># Define a Samza metrics 
reporter called &quot;jmx&quot;, which publishes to JMX</span>
-<span class="na">metrics.reporter.jmx.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.metrics.reporter.JmxReporterFactory</span>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"># Define a Samza metrics reporter called "jmx", which 
publishes to JMX
+metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
+
+# Use it (if you have multiple reporters defined, separate them with commas)
+metrics.reporters=jmx</code></pre></figure>
 
-<span class="c"># Use it (if you have multiple reporters defined, separate 
them with commas)</span>
-<span class="na">metrics.reporters</span><span class="o">=</span><span 
class="s">jmx</span></code></pre></figure>
+<p>JMX needs to be configured to use a specific port, but in a distributed 
environment, there is no way of knowing in advance which ports are available on 
the machines running your containers. Therefore Samza chooses the JMX port 
randomly. If you need to connect to it, you can find the port by looking in the 
container’s logs, which report the JMX server details as follows:</p>
 
-<p>JMX needs to be configured to use a specific port, but in a distributed 
environment, there is no way of knowing in advance which ports are available on 
the machines running your containers. Therefore Samza chooses the JMX port 
randomly. If you need to connect to it, you can find the port by looking in the 
container&rsquo;s logs, which report the JMX server details as follows:</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>2014-06-02 21:50:17 JmxServer [INFO] According to 
InetAddress.getLocalHost.getHostName we are samza-grid-1234.example.com
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>2014-06-02 21:50:17 JmxServer [INFO] According to 
InetAddress.getLocalHost.getHostName we are samza-grid-1234.example.com
 2014-06-02 21:50:17 JmxServer [INFO] Started JmxServer registry port=50214 
server port=50215 
url=service:jmx:rmi://localhost:50215/jndi/rmi://localhost:50214/jmxrmi
 2014-06-02 21:50:17 JmxServer [INFO] If you are tunneling, you might want to 
try JmxServer registry port=50214 server port=50215 
url=service:jmx:rmi://samza-grid-1234.example.com:50215/jndi/rmi://samza-grid-1234.example.com:50214/jmxrmi
-</code></pre></div>
-<h2 id="jobrunner"><a href="../jobs/job-runner.html">JobRunner &raquo;</a></h2>
+</code></pre></div></div>
+
+<h2 id="jobrunner-"><a href="../jobs/job-runner.html">JobRunner »</a></h2>
 
            
         </div>

