Modified: samza/site/learn/documentation/latest/container/metrics-table.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/metrics-table.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/metrics-table.html 
(original)
+++ samza/site/learn/documentation/latest/container/metrics-table.html Wed Jan 
18 19:33:25 2023
@@ -216,6 +216,10 @@
         <td>Current work factor in use</td>
     </tr>
     <tr>
+        <td>total-process-cpu-usage</td>
+        <td>The CPU usage percentage (in the [0, 100] interval) of the Samza container process and all its child processes</td>
+    </tr>
+    <tr>
         <td>physical-memory-mb</td>
         <td>The physical memory used by the Samza container process (native + 
on heap) (in megabytes)</td>
     </tr>
@@ -356,6 +360,10 @@
         <td>Current CPU usage of the JVM process as a percentage from 0 to 
100. The percentage represents the proportion of executed ticks by the JVM 
process to the total ticks across all CPUs. A negative number indicates the 
value was not available from the operating system. For more detail, see the 
JavaDoc for com.sun.management.OperatingSystemMXBean.</td>
     </tr>
     <tr>
+        <td>process-cpu-usage-processors</td>
+        <td>Number of processors currently in use by the JVM process, 
calculated by multiplying the usage percentage by the total number of 
processors. A negative number indicates that there was not enough information 
available to calculate this value. For more detail, see the JavaDoc for 
com.sun.management.OperatingSystemMXBean.</td>
+    </tr>
+    <tr>
         <td>system-cpu-usage</td>
        <td>Current CPU usage of all processes in the whole system as a percentage from 0 to 100. The percentage represents the proportion of executed ticks by all processes to the total ticks across all CPUs. A negative number indicates the value was not available from the operating system. For more detail, see the JavaDoc for com.sun.management.OperatingSystemMXBean.</td>
     </tr>
@@ -984,7 +992,7 @@
         <td><a href="#average-time">Average time</a> taken for all the 
processors to get the latest version of the job model after single processor 
change (without the occurence of a barrier timeout)</td>
     </tr>
     <tr>
-        <th colspan="2" class="section" 
id="job-coordinator-metadata-manager-metrics">org.apache.samza.coordinator.JobCoordinatorMetadataManager.JobCoordinatorMetadataManagerMetrics<br><span
 style="font-weight: normal;margin-left:40px;"><b>Note</b>: The following 
metrics are applicable when Application Master High Availability is 
enabled</span></th>
+        <th colspan="2" class="section" 
id="job-coordinator-metadata-manager-metrics">org.apache.samza.job.metadata.JobCoordinatorMetadataManager.JobCoordinatorMetadataManagerMetrics<br><span
 style="font-weight: normal;margin-left:40px;"><b>Note</b>: The following 
metrics are applicable when Application Master High Availability is 
enabled</span></th>
     </tr>
     <tr>
         <td>application-attempt-count</td>

Modified: samza/site/learn/documentation/latest/container/metrics.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/metrics.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/metrics.html (original)
+++ samza/site/learn/documentation/latest/container/metrics.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/metrics">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/metrics">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/metrics">1.6.0</a></li>
 
               
@@ -639,72 +653,72 @@
    limitations under the License.
 -->
 
-<p>When you&rsquo;re running a stream process in production, it&rsquo;s 
important that you have good metrics to track the health of your job. In order 
to make this easy, Samza includes a metrics library. It is used by Samza itself 
to generate some standard metrics such as message throughput, but you can also 
use it in your task code to emit custom metrics.</p>
+<p>When you’re running a stream process in production, it’s important that 
you have good metrics to track the health of your job. In order to make this 
easy, Samza includes a metrics library. It is used by Samza itself to generate 
some standard metrics such as message throughput, but you can also use it in 
your task code to emit custom metrics.</p>
 
-<p>Metrics can be reported in various ways. You can expose them via <a 
href="jmx.html">JMX</a>, which is useful in development. In production, a 
common setup is for each Samza container to periodically publish its metrics to 
a &ldquo;metrics&rdquo; Kafka topic, in which the metrics from all Samza jobs 
are aggregated. You can then consume this stream in another Samza job, and send 
the metrics to your favorite graphing system such as <a 
href="http://graphite.wikidot.com/";>Graphite</a>.</p>
+<p>Metrics can be reported in various ways. You can expose them via <a 
href="jmx.html">JMX</a>, which is useful in development. In production, a 
common setup is for each Samza container to periodically publish its metrics to 
a “metrics” Kafka topic, in which the metrics from all Samza jobs are 
aggregated. You can then consume this stream in another Samza job, and send the 
metrics to your favorite graphing system such as <a 
href="http://graphite.wikidot.com/";>Graphite</a>.</p>
 
 <p>To set up your job to publish metrics to Kafka, you can use the following 
configuration:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c"># Define a metrics 
reporter called &quot;snapshot&quot;, which publishes metrics</span>
-<span class="c"># every 60 seconds.</span>
-<span class="na">metrics.reporters</span><span class="o">=</span><span 
class="s">snapshot</span>
-<span class="na">metrics.reporter.snapshot.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory</span>
-
-<span class="c"># Tell the snapshot reporter to publish to a topic called 
&quot;metrics&quot;</span>
-<span class="c"># in the &quot;kafka&quot; system.</span>
-<span class="na">metrics.reporter.snapshot.stream</span><span 
class="o">=</span><span class="s">kafka.metrics</span>
-
-<span class="c"># Encode metrics data as JSON.</span>
-<span class="na">serializers.registry.metrics.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.serializers.MetricsSnapshotSerdeFactory</span>
-<span class="na">systems.kafka.streams.metrics.samza.msg.serde</span><span 
class="o">=</span><span class="s">metrics</span></code></pre></figure>
-
-<p>With this configuration, the job automatically sends several JSON-encoded 
messages to the &ldquo;metrics&rdquo; topic in Kafka every 60 seconds. The 
messages look something like this:</p>
-
-<figure class="highlight"><pre><code class="language-json" 
data-lang="json"><span></span><span class="p">{</span>
-  <span class="nt">&quot;header&quot;</span><span class="p">:</span> <span 
class="p">{</span>
-    <span class="nt">&quot;container-name&quot;</span><span class="p">:</span> 
<span class="s2">&quot;samza-container-0&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;host&quot;</span><span class="p">:</span> <span 
class="s2">&quot;samza-grid-1234.example.com&quot;</span><span 
class="p">,</span>
-    <span class="nt">&quot;job-id&quot;</span><span class="p">:</span> <span 
class="s2">&quot;1&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;job-name&quot;</span><span class="p">:</span> <span 
class="s2">&quot;my-samza-job&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;reset-time&quot;</span><span class="p">:</span> 
<span class="mi">1401729000347</span><span class="p">,</span>
-    <span class="nt">&quot;samza-version&quot;</span><span class="p">:</span> 
<span class="s2">&quot;0.0.1&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;source&quot;</span><span class="p">:</span> <span 
class="s2">&quot;Partition-2&quot;</span><span class="p">,</span>
-    <span class="nt">&quot;time&quot;</span><span class="p">:</span> <span 
class="mi">1401729420566</span><span class="p">,</span>
-    <span class="nt">&quot;version&quot;</span><span class="p">:</span> <span 
class="s2">&quot;0.0.1&quot;</span>
-  <span class="p">},</span>
-  <span class="nt">&quot;metrics&quot;</span><span class="p">:</span> <span 
class="p">{</span>
-    <span 
class="nt">&quot;org.apache.samza.container.TaskInstanceMetrics&quot;</span><span
 class="p">:</span> <span class="p">{</span>
-      <span class="nt">&quot;commit-calls&quot;</span><span class="p">:</span> 
<span class="mi">7</span><span class="p">,</span>
-      <span class="nt">&quot;commit-skipped&quot;</span><span 
class="p">:</span> <span class="mi">77948</span><span class="p">,</span>
-      <span class="nt">&quot;kafka-input-topic-offset&quot;</span><span 
class="p">:</span> <span class="s2">&quot;1606&quot;</span><span 
class="p">,</span>
-      <span class="nt">&quot;messages-sent&quot;</span><span 
class="p">:</span> <span class="mi">985</span><span class="p">,</span>
-      <span class="nt">&quot;process-calls&quot;</span><span 
class="p">:</span> <span class="mi">1093</span><span class="p">,</span>
-      <span class="nt">&quot;send-calls&quot;</span><span class="p">:</span> 
<span class="mi">985</span><span class="p">,</span>
-      <span class="nt">&quot;send-skipped&quot;</span><span class="p">:</span> 
<span class="mi">76970</span><span class="p">,</span>
-      <span class="nt">&quot;window-calls&quot;</span><span class="p">:</span> 
<span class="mi">0</span><span class="p">,</span>
-      <span class="nt">&quot;window-skipped&quot;</span><span 
class="p">:</span> <span class="mi">77955</span>
-    <span class="p">}</span>
-  <span class="p">}</span>
-<span class="p">}</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"># Define a metrics reporter called "snapshot", which 
publishes metrics
+# every 60 seconds.
+metrics.reporters=snapshot
+metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
+
+# Tell the snapshot reporter to publish to a topic called "metrics"
+# in the "kafka" system.
+metrics.reporter.snapshot.stream=kafka.metrics
+
+# Encode metrics data as JSON.
+serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
+systems.kafka.streams.metrics.samza.msg.serde=metrics</code></pre></figure>
+
+<p>With this configuration, the job automatically sends several JSON-encoded 
messages to the “metrics” topic in Kafka every 60 seconds. The messages 
look something like this:</p>
+
+<figure class="highlight"><pre><code class="language-json" 
data-lang="json"><span class="p">{</span><span class="w">
+  </span><span class="nl">"header"</span><span class="p">:</span><span 
class="w"> </span><span class="p">{</span><span class="w">
+    </span><span class="nl">"container-name"</span><span 
class="p">:</span><span class="w"> </span><span 
class="s2">"samza-container-0"</span><span class="p">,</span><span class="w">
+    </span><span class="nl">"host"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"samza-grid-1234.example.com"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"job-id"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"1"</span><span class="p">,</span><span 
class="w">
+    </span><span class="nl">"job-name"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"my-samza-job"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"reset-time"</span><span class="p">:</span><span 
class="w"> </span><span class="mi">1401729000347</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"samza-version"</span><span 
class="p">:</span><span class="w"> </span><span class="s2">"0.0.1"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"source"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"Partition-2"</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"time"</span><span class="p">:</span><span 
class="w"> </span><span class="mi">1401729420566</span><span 
class="p">,</span><span class="w">
+    </span><span class="nl">"version"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"0.0.1"</span><span class="w">
+  </span><span class="p">},</span><span class="w">
+  </span><span class="nl">"metrics"</span><span class="p">:</span><span 
class="w"> </span><span class="p">{</span><span class="w">
+    </span><span 
class="nl">"org.apache.samza.container.TaskInstanceMetrics"</span><span 
class="p">:</span><span class="w"> </span><span class="p">{</span><span 
class="w">
+      </span><span class="nl">"commit-calls"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">7</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"commit-skipped"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">77948</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"kafka-input-topic-offset"</span><span 
class="p">:</span><span class="w"> </span><span class="s2">"1606"</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"messages-sent"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">985</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"process-calls"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">1093</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"send-calls"</span><span class="p">:</span><span 
class="w"> </span><span class="mi">985</span><span class="p">,</span><span 
class="w">
+      </span><span class="nl">"send-skipped"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">76970</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"window-calls"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">0</span><span 
class="p">,</span><span class="w">
+      </span><span class="nl">"window-skipped"</span><span 
class="p">:</span><span class="w"> </span><span class="mi">77955</span><span 
class="w">
+    </span><span class="p">}</span><span class="w">
+  </span><span class="p">}</span><span class="w">
+</span><span class="p">}</span></code></pre></figure>
 
 <p>There is a separate message for each task instance, and the header tells 
you the job name, job ID and partition of the task. The metrics allow you to 
see how many messages have been processed and sent, the current offset in the 
input stream partition, and other details. There are additional messages which 
give you metrics about the JVM (heap size, garbage collection information, 
threads etc.), internal metrics of the Kafka producers and consumers, and more. 
The list of all metrics emitted by Samza is shown <a 
href="metrics-table.html">here</a>.</p>
 
-<p>It&rsquo;s easy to generate custom metrics in your job, if there&rsquo;s 
some value you want to keep an eye on. You can use Samza&rsquo;s built-in 
metrics framework, which is similar in design to Coda Hale&rsquo;s <a 
href="http://metrics.dropwizard.io/";>metrics</a> library.</p>
+<p>It’s easy to generate custom metrics in your job, if there’s some value 
you want to keep an eye on. You can use Samza’s built-in metrics framework, 
which is similar in design to Coda Hale’s <a 
href="http://metrics.dropwizard.io/";>metrics</a> library.</p>
 
 <p>You can register your custom metrics through a <a 
href="../api/javadocs/org/apache/samza/metrics/MetricsRegistry.html">MetricsRegistry</a>.
 Your stream task needs to implement <a 
href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a>,
 so that you can get the metrics registry from the <a 
href="../api/javadocs/org/apache/samza/task/TaskContext.html">TaskContext</a>. 
This simple example shows how to count the number of messages processed by your 
task:</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="kd">public</span> <span 
class="kd">class</span> <span class="nc">MyJavaStreamTask</span> <span 
class="kd">implements</span> <span class="n">StreamTask</span><span 
class="o">,</span> <span class="n">InitableTask</span> <span class="o">{</span>
-  <span class="kd">private</span> <span class="n">Counter</span> <span 
class="n">messageCount</span><span class="o">;</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="kd">public</span> <span class="kd">class</span> 
<span class="nc">MyJavaStreamTask</span> <span class="kd">implements</span> 
<span class="nc">StreamTask</span><span class="o">,</span> <span 
class="nc">InitableTask</span> <span class="o">{</span>
+  <span class="kd">private</span> <span class="nc">Counter</span> <span 
class="n">messageCount</span><span class="o">;</span>
 
-  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">init</span><span class="o">(</span><span class="n">Config</span> 
<span class="n">config</span><span class="o">,</span> <span 
class="n">TaskContext</span> <span class="n">context</span><span 
class="o">)</span> <span class="o">{</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">init</span><span class="o">(</span><span class="nc">Config</span> 
<span class="n">config</span><span class="o">,</span> <span 
class="nc">TaskContext</span> <span class="n">context</span><span 
class="o">)</span> <span class="o">{</span>
     <span class="k">this</span><span class="o">.</span><span 
class="na">messageCount</span> <span class="o">=</span> <span 
class="n">context</span>
       <span class="o">.</span><span class="na">getMetricsRegistry</span><span 
class="o">()</span>
-      <span class="o">.</span><span class="na">newCounter</span><span 
class="o">(</span><span class="n">getClass</span><span 
class="o">().</span><span class="na">getName</span><span class="o">(),</span> 
<span class="s">&quot;message-count&quot;</span><span class="o">);</span>
+      <span class="o">.</span><span class="na">newCounter</span><span 
class="o">(</span><span class="n">getClass</span><span 
class="o">().</span><span class="na">getName</span><span class="o">(),</span> 
<span class="s">"message-count"</span><span class="o">);</span>
   <span class="o">}</span>
 
-  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">process</span><span class="o">(</span><span 
class="n">IncomingMessageEnvelope</span> <span class="n">envelope</span><span 
class="o">,</span>
-                      <span class="n">MessageCollector</span> <span 
class="n">collector</span><span class="o">,</span>
-                      <span class="n">TaskCoordinator</span> <span 
class="n">coordinator</span><span class="o">)</span> <span class="o">{</span>
+  <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">process</span><span class="o">(</span><span 
class="nc">IncomingMessageEnvelope</span> <span class="n">envelope</span><span 
class="o">,</span>
+                      <span class="nc">MessageCollector</span> <span 
class="n">collector</span><span class="o">,</span>
+                      <span class="nc">TaskCoordinator</span> <span 
class="n">coordinator</span><span class="o">)</span> <span class="o">{</span>
     <span class="n">messageCount</span><span class="o">.</span><span 
class="na">inc</span><span class="o">();</span>
   <span class="o">}</span>
 <span class="o">}</span></code></pre></figure>
@@ -713,7 +727,7 @@
 
 <p>If you want to report metrics in some other way, e.g. directly to a 
graphing system (without going via Kafka), you can implement a <a 
href="../api/javadocs/org/apache/samza/metrics/MetricsReporterFactory.html">MetricsReporterFactory</a>
 and reference it in your job configuration.</p>
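<p>For example (an illustrative sketch only: the reporter name "custom" and the factory class com.example.MyMetricsReporterFactory are hypothetical placeholders for your own implementation), a custom reporter is wired in through the same configuration keys used for the snapshot reporter above:</p>

<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># Register a custom reporter; the name and factory class are hypothetical.
metrics.reporters=custom
metrics.reporter.custom.class=com.example.MyMetricsReporterFactory

# metrics.reporters takes a comma-separated list, so a custom reporter can run
# alongside the snapshot reporter, e.g. metrics.reporters=snapshot,custom</code></pre></figure>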
 
-<h2 id="jmx"><a href="jmx.html">JMX &raquo;</a></h2>
+<h2 id="jmx-"><a href="jmx.html">JMX »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/samza-container.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/samza-container.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/samza-container.html 
(original)
+++ samza/site/learn/documentation/latest/container/samza-container.html Wed 
Jan 18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/samza-container">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/samza-container">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/samza-container">1.6.0</a></li>
 
               
@@ -644,32 +658,32 @@
 <p>When a SamzaContainer starts up, it does the following:</p>
 
 <ol>
-<li>Get last checkpointed offset for each input stream partition that it 
consumes</li>
-<li>Create a &ldquo;reader&rdquo; thread for every input stream partition that 
it consumes</li>
-<li>Start metrics reporters to report metrics</li>
-<li>Start a checkpoint timer to save your task&rsquo;s input stream offsets 
every so often</li>
-<li>Start a window timer to trigger your task&rsquo;s <a 
href="../api/javadocs/org/apache/samza/task/WindowableTask.html">window 
method</a>, if it is defined</li>
-<li>Instantiate and initialize your StreamTask once for each input stream 
partition</li>
-<li>Start an event loop that takes messages from the input stream reader 
threads, and gives them to your StreamTasks</li>
-<li>Notify lifecycle listeners during each one of these steps</li>
+  <li>Get last checkpointed offset for each input stream partition that it 
consumes</li>
+  <li>Create a “reader” thread for every input stream partition that it 
consumes</li>
+  <li>Start metrics reporters to report metrics</li>
+  <li>Start a checkpoint timer to save your task’s input stream offsets 
every so often</li>
+  <li>Start a window timer to trigger your task’s <a 
href="../api/javadocs/org/apache/samza/task/WindowableTask.html">window 
method</a>, if it is defined</li>
+  <li>Instantiate and initialize your StreamTask once for each input stream 
partition</li>
+  <li>Start an event loop that takes messages from the input stream reader 
threads, and gives them to your StreamTasks</li>
+  <li>Notify lifecycle listeners during each one of these steps</li>
 </ol>
 
-<p>Let&rsquo;s start in the middle, with the instantiation of a StreamTask. 
The following sections of the documentation cover the other steps.</p>
+<p>Let’s start in the middle, with the instantiation of a StreamTask. The 
following sections of the documentation cover the other steps.</p>
 
 <h3 id="tasks-and-partitions">Tasks and Partitions</h3>
 
-<p>When the container starts, it creates instances of the <a 
href="../api/overview.html">task class</a> that you&rsquo;ve written. If the 
task class implements the <a 
href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a> 
interface, the SamzaContainer will also call the init() method.</p>
+<p>When the container starts, it creates instances of the <a 
href="../api/overview.html">task class</a> that you’ve written. If the task 
class implements the <a 
href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a> 
interface, the SamzaContainer will also call the init() method.</p>
 
-<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span></span><span class="cm">/** Implement this if you want a 
callback when your task starts up. */</span>
+<figure class="highlight"><pre><code class="language-java" 
data-lang="java"><span class="cm">/** Implement this if you want a callback 
when your task starts up. */</span>
 <span class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">InitableTask</span> <span class="o">{</span>
-  <span class="kt">void</span> <span class="nf">init</span><span 
class="o">(</span><span class="n">Config</span> <span 
class="n">config</span><span class="o">,</span> <span 
class="n">TaskContext</span> <span class="n">context</span><span 
class="o">);</span>
+  <span class="kt">void</span> <span class="nf">init</span><span 
class="o">(</span><span class="nc">Config</span> <span 
class="n">config</span><span class="o">,</span> <span 
class="nc">TaskContext</span> <span class="n">context</span><span 
class="o">);</span>
 <span class="o">}</span></code></pre></figure>
 
-<p>By default, how many instances of your task class are created depends on 
the number of partitions in the job&rsquo;s input streams. If your Samza job 
has ten partitions, there will be ten instantiations of your task class: one 
for each partition. The first task instance will receive all messages for 
partition one, the second instance will receive all messages for partition two, 
and so on.</p>
+<p>By default, how many instances of your task class are created depends on 
the number of partitions in the job’s input streams. If your Samza job has 
ten partitions, there will be ten instantiations of your task class: one for 
each partition. The first task instance will receive all messages for partition 
one, the second instance will receive all messages for partition two, and so 
on.</p>
 
-<p><img 
src="/img/latest/learn/documentation/container/tasks-and-partitions.svg" 
alt="Illustration of tasks consuming partitions" class="diagram-large"></p>
+<p><img 
src="/img/latest/learn/documentation/container/tasks-and-partitions.svg" 
alt="Illustration of tasks consuming partitions" class="diagram-large" /></p>
 
-<p>The number of partitions in the input streams is determined by the systems 
from which you are consuming. For example, if your input system is Kafka, you 
can specify the number of partitions when you create a topic from the command 
line or using the num.partitions in Kafka&rsquo;s server properties file.</p>
+<p>The number of partitions in the input streams is determined by the systems 
from which you are consuming. For example, if your input system is Kafka, you 
can specify the number of partitions when you create a topic from the command line, or by setting num.partitions in Kafka’s server properties file.</p>
 
 <p>If a Samza job has more than one input stream, the number of task instances 
for the Samza job is the maximum number of partitions across all input streams. 
For example, if a Samza job is reading from PageViewEvent (12 partitions), and 
ServiceMetricEvent (14 partitions), then the Samza job would have 14 task 
instances (numbered 0 through 13). Task instances 12 and 13 only receive events 
from ServiceMetricEvent, because there is no corresponding PageViewEvent 
partition.</p>
 
@@ -683,13 +697,13 @@
 
 <h3 id="containers-and-resource-allocation">Containers and resource 
allocation</h3>
 
-<p>Although the number of task instances is fixed &mdash; determined by the 
number of input partitions &mdash; you can configure how many containers you 
want to use for your job. If you are <a href="../jobs/yarn-jobs.html">using 
YARN</a>, the number of containers determines what CPU and memory resources are 
allocated to your job.</p>
+<p>Although the number of task instances is fixed — determined by the number 
of input partitions — you can configure how many containers you want to use 
for your job. If you are <a href="../jobs/yarn-jobs.html">using YARN</a>, the 
number of containers determines what CPU and memory resources are allocated to 
your job.</p>
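<p>For example, a minimal sketch of such a configuration (assuming the job.container.count property; check the configuration table of your Samza version for the exact key):</p>

<figure class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"># Assumed key: ask the cluster manager for 4 containers; the task instances
# are distributed among them.
job.container.count=4</code></pre></figure>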
 
 <p>If the data volume on your input streams is small, it might be sufficient 
to use just one SamzaContainer. In that case, Samza still creates one task 
instance per input partition, but all those tasks run within the same 
container. At the other extreme, you can create as many containers as you have 
partitions, and Samza will assign one task instance to each container.</p>
 
-<p>Each SamzaContainer is designed to use one CPU core, so it uses a <a 
href="event-loop.html">single-threaded event loop</a> for execution. It&rsquo;s 
not advisable to create your own threads within a SamzaContainer. If you need 
more parallelism, please configure your job to use more containers.</p>
+<p>Each SamzaContainer is designed to use one CPU core, so it uses a <a 
href="event-loop.html">single-threaded event loop</a> for execution. It’s not 
advisable to create your own threads within a SamzaContainer. If you need more 
parallelism, please configure your job to use more containers.</p>
 
-<p>Any <a href="state-management.html">state</a> in your job belongs to a task 
instance, not to a container. This is a key design decision for Samza&rsquo;s 
scalability: as your job&rsquo;s resource requirements grow and shrink, you can 
simply increase or decrease the number of containers, but the number of task 
instances remains unchanged. As you scale up or down, the same state remains 
attached to each task instance. Task instances may be moved from one container 
to another, and any persistent state managed by Samza will be moved with it. 
This allows the job&rsquo;s processing semantics to remain unchanged, even as 
you change the job&rsquo;s parallelism.</p>
+<p>Any <a href="state-management.html">state</a> in your job belongs to a task 
instance, not to a container. This is a key design decision for Samza’s 
scalability: as your job’s resource requirements grow and shrink, you can 
simply increase or decrease the number of containers, but the number of task 
instances remains unchanged. As you scale up or down, the same state remains 
attached to each task instance. Task instances may be moved from one container 
to another, and any persistent state managed by Samza will be moved with it. 
This allows the job’s processing semantics to remain unchanged, even as you 
change the job’s parallelism.</p>
 
 <h3 id="joining-multiple-input-streams">Joining multiple input streams</h3>
 
@@ -720,17 +734,17 @@
 
 <p>Thus, if you want two events in different streams to be processed by the 
same task instance, you need to ensure they are sent to the same partition 
number. You can achieve this by using the same partitioning key when <a 
href="../api/overview.html">sending the messages</a>. Joining streams is 
discussed in detail in the <a href="state-management.html">state management</a> 
section.</p>
 
-<p>There is one caveat in all of this: Samza currently assumes that a 
stream&rsquo;s partition count will never change. Partition splitting or 
repartitioning is not supported. If an input stream has N partitions, it is 
expected that it has always had, and will always have N partitions. If you want 
to re-partition a stream, you can write a job that reads messages from the 
stream, and writes them out to a new stream with the required number of 
partitions. For example, you could read messages from PageViewEvent, and write 
them to PageViewEventRepartition.</p>
+<p>There is one caveat in all of this: Samza currently assumes that a 
stream’s partition count will never change. Partition splitting or 
repartitioning is not supported. If an input stream has N partitions, it is 
expected that it has always had, and will always have N partitions. If you want 
to re-partition a stream, you can write a job that reads messages from the 
stream, and writes them out to a new stream with the required number of 
partitions. For example, you could read messages from PageViewEvent, and write 
them to PageViewEventRepartition.</p>
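<p>A minimal sketch of such a repartitioning task is shown below. The class name and output stream are made up for illustration, and it assumes the OutgoingMessageEnvelope constructor that takes an explicit partitioning key in addition to the key and message.</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class RepartitionByUserTask implements StreamTask {
  // Hypothetical output stream, created with the desired number of partitions.
  private static final SystemStream OUTPUT =
      new SystemStream("kafka", "PageViewEventRepartition");

  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Reuse the incoming key (e.g. a user ID) as the partitioning key, so that
    // all events for the same user land in the same output partition number.
    Object userId = envelope.getKey();
    collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, userId, envelope.getMessage()));
  }
}</code></pre></figure>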
 
 <h3 id="broadcast-streams">Broadcast Streams</h3>
 
 <p>After 0.10.0, Samza supports broadcast streams. You can assign partitions from some streams to all the tasks by appending a hash tag followed by the partition number or a partition number range. For example, suppose you want all tasks to consume partitions 0 and 1 from a stream called broadcast-stream-1, and partition 2 from a stream called broadcast-stream-2. You can configure:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span 
class="na">task.broadcast.inputs</span><span class="o">=</span><span 
class="s">yourSystem.broadcast-stream-1#[0-1], 
yourSystem.broadcast-stream-2#2</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties">task.broadcast.inputs=yourSystem.broadcast-stream-1#[0-1],
 yourSystem.broadcast-stream-2#2</code></pre></figure>
 
-<p>If you use &ldquo;[]&rdquo;, you are specifying a range for partitions.</p>
+<p>If you use “[]”, you are specifying a range for partitions.</p>
 
-<h2 id="streams"><a href="streams.html">Streams &raquo;</a></h2>
+<h2 id="streams-"><a href="streams.html">Streams »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/container/serialization.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/container/serialization.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/container/serialization.html 
(original)
+++ samza/site/learn/documentation/latest/container/serialization.html Wed Jan 
18 19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/container/serialization">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/container/serialization">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/container/serialization">1.6.0</a></li>
 
               
@@ -642,38 +656,38 @@
 <p>Every message that is read from or written to a <a 
href="streams.html">stream</a> or a <a href="state-management.html">persistent 
state store</a> needs to eventually be serialized to bytes (which are sent over 
the network or written to disk). There are various places where that 
serialization and deserialization can happen:</p>
 
 <ol>
-<li>In the client library: for example, the library for publishing to Kafka 
and consuming from Kafka supports pluggable serialization.</li>
-<li>In the task implementation: your <a href="../api/overview.html">process 
method</a> can use raw byte arrays as inputs and outputs, and do any parsing 
and serialization itself.</li>
-<li>Between the two: Samza provides a layer of serializers and deserializers, 
or <em>serdes</em> for short.</li>
+  <li>In the client library: for example, the library for publishing to Kafka 
and consuming from Kafka supports pluggable serialization.</li>
+  <li>In the task implementation: your <a href="../api/overview.html">process 
method</a> can use raw byte arrays as inputs and outputs, and do any parsing 
and serialization itself.</li>
+  <li>Between the two: Samza provides a layer of serializers and 
deserializers, or <em>serdes</em> for short.</li>
 </ol>
 
-<p>You can use whatever makes sense for your job; Samza doesn&rsquo;t impose 
any particular data model or serialization scheme on you. However, the cleanest 
solution is usually to use Samza&rsquo;s serde layer. The following 
configuration example shows how to use it.</p>
+<p>You can use whatever makes sense for your job; Samza doesn’t impose any 
particular data model or serialization scheme on you. However, the cleanest 
solution is usually to use Samza’s serde layer. The following configuration 
example shows how to use it.</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span class="c"># Define a system called 
&quot;kafka&quot;</span>
-<span class="na">systems.kafka.samza.factory</span><span 
class="o">=</span><span 
class="s">org.apache.samza.system.kafka.KafkaSystemFactory</span>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"># Define a system called "kafka"
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
 
-<span class="c"># The job is going to consume a topic called 
&quot;PageViewEvent&quot; from the &quot;kafka&quot; system</span>
-<span class="na">task.inputs</span><span class="o">=</span><span 
class="s">kafka.PageViewEvent</span>
+# The job is going to consume a topic called "PageViewEvent" from the "kafka" 
system
+task.inputs=kafka.PageViewEvent
 
-<span class="c"># Define a serde called &quot;json&quot; which 
parses/serializes JSON objects</span>
-<span class="na">serializers.registry.json.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.serializers.JsonSerdeFactory</span>
+# Define a serde called "json" which parses/serializes JSON objects
+serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
 
-<span class="c"># Define a serde called &quot;integer&quot; which encodes an 
integer as 4 binary bytes (big-endian)</span>
-<span class="na">serializers.registry.integer.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.serializers.IntegerSerdeFactory</span>
+# Define a serde called "integer" which encodes an integer as 4 binary bytes 
(big-endian)
+serializers.registry.integer.class=org.apache.samza.serializers.IntegerSerdeFactory
 
-<span class="c"># For messages in the &quot;PageViewEvent&quot; topic, the key 
(the ID of the user viewing the page)</span>
-<span class="c"># is encoded as a binary integer, and the message is encoded 
as JSON.</span>
-<span 
class="na">systems.kafka.streams.PageViewEvent.samza.key.serde</span><span 
class="o">=</span><span class="s">integer</span>
-<span 
class="na">systems.kafka.streams.PageViewEvent.samza.msg.serde</span><span 
class="o">=</span><span class="s">json</span>
+# For messages in the "PageViewEvent" topic, the key (the ID of the user 
viewing the page)
+# is encoded as a binary integer, and the message is encoded as JSON.
+systems.kafka.streams.PageViewEvent.samza.key.serde=integer
+systems.kafka.streams.PageViewEvent.samza.msg.serde=json
 
-<span class="c"># Define a key-value store which stores the most recent page 
view for each user ID.</span>
-<span class="c"># Again, the key is an integer user ID, and the value is 
JSON.</span>
-<span class="na">stores.LastPageViewPerUser.factory</span><span 
class="o">=</span><span 
class="s">org.apache.samza.storage.kv.KeyValueStorageEngineFactory</span>
-<span class="na">stores.LastPageViewPerUser.changelog</span><span 
class="o">=</span><span class="s">kafka.last-page-view-per-user</span>
-<span class="na">stores.LastPageViewPerUser.key.serde</span><span 
class="o">=</span><span class="s">integer</span>
-<span class="na">stores.LastPageViewPerUser.msg.serde</span><span 
class="o">=</span><span class="s">json</span></code></pre></figure>
+# Define a key-value store which stores the most recent page view for each 
user ID.
+# Again, the key is an integer user ID, and the value is JSON.
+stores.LastPageViewPerUser.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory
+stores.LastPageViewPerUser.changelog=kafka.last-page-view-per-user
+stores.LastPageViewPerUser.key.serde=integer
+stores.LastPageViewPerUser.msg.serde=json</code></pre></figure>
 
-<p>Each serde is defined with a factory class. Samza comes with several 
builtin serdes for UTF-8 strings, binary-encoded integers, JSON and more. The 
following is a comprehensive list of supported serdes in Samza.
+<p>Each serde is defined with a factory class. Samza comes with several 
builtin serdes for UTF-8 strings, binary-encoded integers, JSON and more. The 
following is a comprehensive list of supported serdes in Samza.</p>
 <style>
             table th, table td {
                 text-align: left;
@@ -683,16 +697,17 @@
                 border-top: 1px solid #ccc;
                 border-left: 0;
                 border-right: 0;
-            }</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>        table td.property, table td.default {
-            white-space: nowrap;
-        }
-
-        table th {
-            background-color: #eee;
-        }
-</code></pre></div>
-<p></style>
+            }
+
+            table td.property, table td.default {
+                white-space: nowrap;
+            }
+
+            table th {
+                background-color: #eee;
+            }
+</style>
+
 <table>
     <tr>
         <th> Serde Name</th>
@@ -726,17 +741,17 @@
         <td> bytebuffer </td>
         <td> Byte Buffer </td>
     </tr>
-</table></p>
+</table>
 
 <p>You can also create your own serializer by implementing the <a 
href="../api/javadocs/org/apache/samza/serializers/SerdeFactory.html">SerdeFactory</a>
 interface.</p>
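<p>A minimal sketch of a custom serde, assuming the SerdeFactory and Serde interfaces under org.apache.samza.serializers (the class name here is hypothetical):</p>

<figure class="highlight"><pre><code class="language-java" data-lang="java">import java.nio.charset.StandardCharsets;

import org.apache.samza.config.Config;
import org.apache.samza.serializers.Serde;
import org.apache.samza.serializers.SerdeFactory;

// Hypothetical serde that encodes strings as UTF-8 bytes.
public class Utf8StringSerdeFactory implements SerdeFactory<String> {
  public Serde<String> getSerde(String name, Config config) {
    return new Serde<String>() {
      public byte[] toBytes(String message) {
        return message == null ? null : message.getBytes(StandardCharsets.UTF_8);
      }

      public String fromBytes(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
      }
    };
  }
}</code></pre></figure>

<p>Like the built-in serdes, it would then be registered under a name of your choice, e.g. serializers.registry.utf8.class set to the fully-qualified factory class name.</p>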
 
-<p>The name you give to a serde (such as &ldquo;json&rdquo; and 
&ldquo;integer&rdquo; in the example above) is only for convenience in your job 
configuration; you can choose whatever name you like. For each stream and each 
state store, you can use the serde name to declare how messages should be 
serialized and deserialized.</p>
+<p>The name you give to a serde (such as “json” and “integer” in the 
example above) is only for convenience in your job configuration; you can 
choose whatever name you like. For each stream and each state store, you can 
use the serde name to declare how messages should be serialized and 
deserialized.</p>
 
-<p>If you don&rsquo;t declare a serde, Samza simply passes objects through 
between your task instance and the system stream. In that case your task needs 
to send and receive whatever type of object the underlying client library 
uses.</p>
+<p>If you don’t declare a serde, Samza simply passes objects through between 
your task instance and the system stream. In that case your task needs to send 
and receive whatever type of object the underlying client library uses.</p>
 
-<p>All the Samza APIs for sending and receiving messages are typed as 
<em>Object</em>. This means that you have to cast messages to the correct type 
before you can use them. It&rsquo;s a little bit more code, but it has the 
advantage that Samza is not restricted to any particular data model.</p>
+<p>All the Samza APIs for sending and receiving messages are typed as 
<em>Object</em>. This means that you have to cast messages to the correct type 
before you can use them. It’s a little bit more code, but it has the 
advantage that Samza is not restricted to any particular data model.</p>
 
-<h2 id="checkpointing"><a href="checkpointing.html">Checkpointing 
&raquo;</a></h2>
+<h2 id="checkpointing-"><a href="checkpointing.html">Checkpointing »</a></h2>
 
            
         </div>


