documen...

ajothomas Wed, 18 Jan 2023 11:34:08 -0800

Modified: samza/site/learn/documentation/latest/jobs/logging.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/jobs/logging.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/jobs/logging.html (original)
+++ samza/site/learn/documentation/latest/jobs/logging.html Wed Jan 18 19:33:25 
2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/jobs/logging">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/jobs/logging">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/jobs/logging">1.6.0</a></li>
 
               
@@ -640,16 +654,17 @@
 -->
 
 <p>Samza uses <a href="http://www.slf4j.org/">SLF4J</a> for all of its 
logging. By default, Samza only depends on slf4j-api, so it can work for 
whichever underlying logging platform you wish to use. Make sure you are 
depending on slf4j-api version &gt;= 1.7.16. You simply need to add the SLF4J 
bridge corresponding to the logging implementation chosen. Samza logging has 
been thoroughly tested against Log4j and Log4j2. Samza provides bundled modules 
for each of the Log4j versions along with additional functionality.</p>
-
 <h3 id="logging-with-log4j">Logging with Log4j</h3>
 
-<p>To use Samza with <a href="http://logging.apache.org/log4j/1.2/">log4j</a>, 
you just need to make sure the following dependencies are present in your 
SamzaContainer’s classpath:
--   samza-log4j
--   slf4j-log4j12</p>
+<p>To use Samza with <a href="http://logging.apache.org/log4j/1.2/">log4j</a>, 
you just need to make sure the following dependencies are present in your 
SamzaContainer’s classpath:</p>
+<ul>
+  <li>samza-log4j</li>
+  <li>slf4j-log4j12</li>
+</ul>
 
-<p>In Maven, this can be done by adding the following dependencies to your 
Samza package project&rsquo;s pom.xml:</p>
+<p>In Maven, this can be done by adding the following dependencies to your 
Samza package project’s pom.xml:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;dependency&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;dependency&gt;</span>
   <span class="nt">&lt;setId&gt;</span>org.slf4j<span 
class="nt">&lt;/setId&gt;</span>
   <span class="nt">&lt;artifactId&gt;</span>slf4j-log4j12<span 
class="nt">&lt;/artifactId&gt;</span>
   <span class="nt">&lt;scope&gt;</span>runtime<span 
class="nt">&lt;/scope&gt;</span>
@@ -662,64 +677,66 @@
   <span class="nt">&lt;version&gt;</span>0.14.0<span 
class="nt">&lt;/version&gt;</span>
 <span class="nt">&lt;/dependency&gt;</span></code></pre></figure>
 
-<p>If you&rsquo;re not using Maven, just make sure that both these 
dependencies end up in your Samza package&rsquo;s lib directory.</p>
+<p>If you’re not using Maven, just make sure that both these dependencies 
end up in your Samza package’s lib directory.</p>
 
-<p>Next, you need to make sure that these dependencies are also listed in your 
Samza project&rsquo;s build.gradle:</p>
+<p>Next, you need to make sure that these dependencies are also listed in your 
Samza project’s build.gradle:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>    compile<span class="o">(</span>group: <span 
class="s1">&#39;org.slf4j&#39;</span>, name: <span 
class="s1">&#39;slf4j-log4j12&#39;</span>, version: <span 
class="s2">&quot;</span><span class="nv">$SLF4J_VERSION</span><span 
class="s2">&quot;</span><span class="o">)</span>
-    runtime<span class="o">(</span>group: <span 
class="s1">&#39;org.apache.samza&#39;</span>, name: <span 
class="s1">&#39;samza-log4j&#39;</span>, version: <span 
class="s2">&quot;</span><span class="nv">$SAMZA_VERSION</span><span 
class="s2">&quot;</span><span class="o">)</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash">   
 compile<span class="o">(</span>group: <span class="s1">'org.slf4j'</span>, 
name: <span class="s1">'slf4j-log4j12'</span>, version: <span 
class="s2">"</span><span class="nv">$SLF4J_VERSION</span><span 
class="s2">"</span><span class="o">)</span>
+    runtime<span class="o">(</span>group: <span 
class="s1">'org.apache.samza'</span>, name: <span 
class="s1">'samza-log4j'</span>, version: <span class="s2">"</span><span 
class="nv">$SAMZA_VERSION</span><span class="s2">"</span><span 
class="o">)</span></code></pre></figure>
 
 <p>Note: Please make sure that no dependencies of Log4j2 are present in the 
classpath while working with Log4j.</p>
 
 <h4 id="log4j-configuration">Log4j configuration</h4>
 
-<p>Please ensure you have log4j.xml in your <a href="packaging.html">Samza 
package&rsquo;s</a> lib directory. For example, in hello-samza application, the 
following lines are added to src.xml to ensure log4j.xml is present in the lib 
directory:</p>
+<p>Please ensure you have log4j.xml in your <a href="packaging.html">Samza 
package’s</a> lib directory. For example, in hello-samza application, the 
following lines are added to src.xml to ensure log4j.xml is present in the lib 
directory:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;files&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;files&gt;</span>
   <span class="nt">&lt;file&gt;</span>
     <span 
class="nt">&lt;source&gt;</span>${basedir}/src/main/resources/log4j.xml<span 
class="nt">&lt;/source&gt;</span>
     <span class="nt">&lt;outputDirectory&gt;</span>lib<span 
class="nt">&lt;/outputDirectory&gt;</span>
   <span class="nt">&lt;/file&gt;</span>
 <span class="nt">&lt;/files&gt;</span></code></pre></figure>
 
-<p>Samza&rsquo;s <a href="packaging.html">run-class.sh</a> script will 
automatically set the following setting if log4j.xml exists in your <a 
href="packaging.html">Samza package&rsquo;s</a> lib directory.</p>
+<p>Samza’s <a href="packaging.html">run-class.sh</a> script will 
automatically set the following setting if log4j.xml exists in your <a 
href="packaging.html">Samza package’s</a> lib directory.</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-Dlog4j.configuration<span 
class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j.xml</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-Dlog4j</span>.configuration<span 
class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j.xml</code></pre></figure>
 
 <p>The <a href="packaging.html">run-class.sh</a> script will also set the 
following Java system properties:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-Dsamza.log.dir<span class="o">=</span><span 
class="nv">$SAMZA_LOG_DIR</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-Dsamza</span>.log.dir<span 
class="o">=</span><span class="nv">$SAMZA_LOG_DIR</span></code></pre></figure>
 
 <p>The <a href="packaging.html">run-container.sh</a> will also set:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-Dsamza.container.id<span class="o">=</span><span 
class="nv">$SAMZA_CONTAINER_ID</span> -Dsamza.container.name<span 
class="o">=</span>samza-container-<span 
class="nv">$SAMZA_CONTAINER_ID</span><span 
class="s2">&quot;</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-Dsamza</span>.container.id<span 
class="o">=</span><span class="nv">$SAMZA_CONTAINER_ID</span> <span 
class="nt">-Dsamza</span>.container.name<span 
class="o">=</span>samza-container-<span 
class="nv">$SAMZA_CONTAINER_ID</span><span 
class="s2">"</span></code></pre></figure>
 
 <p>Likewise, <a href="packaging.html">run-am.sh</a> sets:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-Dsamza.container.name<span 
class="o">=</span>samza-application-master</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-Dsamza</span>.container.name<span 
class="o">=</span>samza-application-master</code></pre></figure>
 
-<p>These settings are very useful if you&rsquo;re using a file-based appender. 
For example, you can use a rolling appender to separate log file when it 
reaches certain size by configuring log4j.xml like this:</p>
+<p>These settings are very useful if you’re using a file-based appender. For 
example, you can use a rolling appender to separate log file when it reaches 
certain size by configuring log4j.xml like this:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">&quot;RollingAppender&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.log4j.RollingFileAppender&quot;</span><span 
class="nt">&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;File&quot;</span> <span class="na">value=</span><span 
class="s">&quot;${samza.log.dir}/${samza.container.name}.log&quot;</span> <span 
class="nt">/&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;MaxFileSize&quot;</span> <span class="na">value=</span><span 
class="s">&quot;256MB&quot;</span> <span class="nt">/&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;MaxBackupIndex&quot;</span> <span class="na">value=</span><span 
class="s">&quot;20&quot;</span> <span class="nt">/&gt;</span>
-   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">&quot;org.apache.log4j.PatternLayout&quot;</span><span 
class="nt">&gt;</span>
-    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;ConversionPattern&quot;</span> <span 
class="na">value=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n&quot;</span> <span class="nt">/&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">"RollingAppender"</span> <span 
class="na">class=</span><span 
class="s">"org.apache.log4j.RollingFileAppender"</span><span 
class="nt">&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"File"</span> <span class="na">value=</span><span 
class="s">"${samza.log.dir}/${samza.container.name}.log"</span> <span 
class="nt">/&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"MaxFileSize"</span> <span class="na">value=</span><span 
class="s">"256MB"</span> <span class="nt">/&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"MaxBackupIndex"</span> <span class="na">value=</span><span 
class="s">"20"</span> <span class="nt">/&gt;</span>
+   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">"org.apache.log4j.PatternLayout"</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"ConversionPattern"</span> <span class="na">value=</span><span 
class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} [%p] %m%n"</span> <span 
class="nt">/&gt;</span>
    <span class="nt">&lt;/layout&gt;</span>
 <span class="nt">&lt;/appender&gt;</span></code></pre></figure>
 
-<p>Setting up a file-based appender is recommended as a better alternative to 
using standard out. Standard out log files (see below) don&rsquo;t roll, and 
can get quite large if used for logging.</p>
+<p>Setting up a file-based appender is recommended as a better alternative to 
using standard out. Standard out log files (see below) don’t roll, and can 
get quite large if used for logging.</p>
 
 <h3 id="logging-with-log4j2">Logging with Log4j2</h3>
 
-<p>To use Samza with <a 
href="https://logging.apache.org/log4j/2.x/">log4j2</a>, the following 
dependencies need to be present in SamzaContainer’s classpath:
--   samza-log4j2
--   log4j-slf4j-impl</p>
+<p>To use Samza with <a 
href="https://logging.apache.org/log4j/2.x/">log4j2</a>, the following 
dependencies need to be present in SamzaContainer’s classpath:</p>
+<ul>
+  <li>samza-log4j2</li>
+  <li>log4j-slf4j-impl</li>
+</ul>
 
-<p>In Maven, these can be done by adding the following dependencies to your 
Samza project&rsquo;s pom.xml:</p>
+<p>In Maven, these can be done by adding the following dependencies to your 
Samza project’s pom.xml:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;dependency&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;dependency&gt;</span>
   <span class="nt">&lt;groupId&gt;</span>org.apache.logging.log4j<span 
class="nt">&lt;/groupId&gt;</span>
   <span class="nt">&lt;artifactId&gt;</span>log4j-slf4j-impl<span 
class="nt">&lt;/artifactId&gt;</span>
   <span class="nt">&lt;version&gt;</span>2.11<span 
class="nt">&lt;/version&gt;</span>
@@ -740,61 +757,61 @@
 <span class="nt">&lt;/dependency&gt;</span></code></pre></figure>
 
 <p>If you’re not using Maven, please make sure both the above dependencies 
end up in your Samza package’s lib directory.
-Also, make sure there isn&rsquo;t any dependency on slf4j-log4j12 library 
while using Log4j2. </p>
+Also, make sure there isn’t any dependency on slf4j-log4j12 library while 
using Log4j2.</p>
 
-<p>Next, you need to make sure that these dependencies are also listed in your 
Samza project&rsquo;s build.gradle:</p>
+<p>Next, you need to make sure that these dependencies are also listed in your 
Samza project’s build.gradle:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>    compile<span class="o">(</span>group: <span 
class="s1">&#39;org.apache.logging.log4j&#39;</span>, name: <span 
class="s1">&#39;log4j-slf4j-impl&#39;</span>, version: <span 
class="s2">&quot;2.11.0&quot;</span><span class="o">)</span>
-    runtime<span class="o">(</span>group: <span 
class="s1">&#39;org.apache.samza&#39;</span>, name: <span 
class="s1">&#39;samza-log4j2&#39;</span>, version: <span 
class="s2">&quot;</span><span class="nv">$SAMZA_VERSION</span><span 
class="s2">&quot;</span><span class="o">)</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash">   
 compile<span class="o">(</span>group: <span 
class="s1">'org.apache.logging.log4j'</span>, name: <span 
class="s1">'log4j-slf4j-impl'</span>, version: <span 
class="s2">"2.11.0"</span><span class="o">)</span>
+    runtime<span class="o">(</span>group: <span 
class="s1">'org.apache.samza'</span>, name: <span 
class="s1">'samza-log4j2'</span>, version: <span class="s2">"</span><span 
class="nv">$SAMZA_VERSION</span><span class="s2">"</span><span 
class="o">)</span></code></pre></figure>
 
 <h4 id="log4j2-configuration">Log4j2 configuration</h4>
 
-<p>Please ensure you have log4j2.xml in your <a href="packaging.html">Samza 
package&rsquo;s</a> lib directory. For example, in hello-samza application, the 
following lines are added to src.xml to ensure log4j2.xml is present in the lib 
directory:</p>
+<p>Please ensure you have log4j2.xml in your <a href="packaging.html">Samza 
package’s</a> lib directory. For example, in hello-samza application, the 
following lines are added to src.xml to ensure log4j2.xml is present in the lib 
directory:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;files&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;files&gt;</span>
   <span class="nt">&lt;file&gt;</span>
     <span 
class="nt">&lt;source&gt;</span>${basedir}/src/main/resources/log4j2.xml<span 
class="nt">&lt;/source&gt;</span>
     <span class="nt">&lt;outputDirectory&gt;</span>lib<span 
class="nt">&lt;/outputDirectory&gt;</span>
   <span class="nt">&lt;/file&gt;</span>
 <span class="nt">&lt;/files&gt;</span></code></pre></figure>
 
-<p>Samza&rsquo;s <a href="packaging.html">run-class.sh</a> script will 
automatically set the following setting if log4j2.xml exists in your lib 
directory.</p>
+<p>Samza’s <a href="packaging.html">run-class.sh</a> script will 
automatically set the following setting if log4j2.xml exists in your lib 
directory.</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-Dlog4j.configurationFile<span 
class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j2.xml</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-Dlog4j</span>.configurationFile<span 
class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j2.xml</code></pre></figure>
 
 <p>All other system properties are set exactly as in the Log4j case described 
above.</p>
 
 <p>Sample log4j2.xml:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="cp">&lt;?xml version=&quot;1.0&quot; 
encoding=&quot;UTF-8&quot; ?&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="cp">&lt;?xml version="1.0" encoding="UTF-8" 
?&gt;</span>
 
 <span class="nt">&lt;Configuration&gt;</span>
 
   <span class="nt">&lt;Appenders&gt;</span>
-    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">&quot;RollingFile&quot;</span> <span class="na">fileName=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-log4j2.log&quot;</span>
 <span class="na">filePattern=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-%d{MM-dd-yyyy}-log4j2-%i.log.gz&quot;</span><span
 class="nt">&gt;</span>
-      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} 
[%t] %c{1} [%p] %m%n&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">"RollingFile"</span> <span class="na">fileName=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-log4j2.log"</span> 
<span class="na">filePattern=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-%d{MM-dd-yyyy}-log4j2-%i.log.gz"</span><span
 class="nt">&gt;</span>
+      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n"</span><span class="nt">/&gt;</span>
       <span class="nt">&lt;Policies&gt;</span>
-        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">&quot;256MB&quot;</span> <span 
class="nt">/&gt;</span>
+        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">"256MB"</span> <span 
class="nt">/&gt;</span>
       <span class="nt">&lt;/Policies&gt;</span>
-      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">&quot;20&quot;</span><span 
class="nt">/&gt;</span>
+      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">"20"</span><span class="nt">/&gt;</span>
     <span class="nt">&lt;/RollingFile&gt;</span>
 
-    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">&quot;StartupAppender&quot;</span> <span 
class="na">fileName=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-startup-log4j2.log&quot;</span>
 <span class="na">filePattern=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-startup-%d{MM-dd-yyyy}-log4j2-%i.log.gz&quot;</span><span
 class="nt">&gt;</span>
-      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} 
[%t] %c{1} [%p] %m%n&quot;</span><span class="nt">/&gt;</span>
+    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">"StartupAppender"</span> <span class="na">fileName=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-startup-log4j2.log"</span>
 <span class="na">filePattern=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-startup-%d{MM-dd-yyyy}-log4j2-%i.log.gz"</span><span
 class="nt">&gt;</span>
+      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n"</span><span class="nt">/&gt;</span>
       <span class="nt">&lt;Policies&gt;</span>
-        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">&quot;256MB&quot;</span> <span 
class="nt">/&gt;</span>
+        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">"256MB"</span> <span 
class="nt">/&gt;</span>
       <span class="nt">&lt;/Policies&gt;</span>
-      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">&quot;1&quot;</span><span 
class="nt">/&gt;</span>
+      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">"1"</span><span class="nt">/&gt;</span>
     <span class="nt">&lt;/RollingFile&gt;</span>
   <span class="nt">&lt;/Appenders&gt;</span>
 
   <span class="nt">&lt;Loggers&gt;</span>
-    <span class="nt">&lt;Logger</span> <span class="na">name=</span><span 
class="s">&quot;STARTUP_LOGGER&quot;</span> <span class="na">level=</span><span 
class="s">&quot;info&quot;</span> <span class="na">additivity=</span><span 
class="s">&quot;false&quot;</span><span class="nt">&gt;</span>
-      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">&quot;StartupAppender&quot;</span><span 
class="nt">/&gt;</span>
+    <span class="nt">&lt;Logger</span> <span class="na">name=</span><span 
class="s">"STARTUP_LOGGER"</span> <span class="na">level=</span><span 
class="s">"info"</span> <span class="na">additivity=</span><span 
class="s">"false"</span><span class="nt">&gt;</span>
+      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">"StartupAppender"</span><span 
class="nt">/&gt;</span>
     <span class="nt">&lt;/Logger&gt;</span>
 
-    <span class="nt">&lt;Root</span> <span class="na">level=</span><span 
class="s">&quot;info&quot;</span><span class="nt">&gt;</span>
-      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">&quot;RollingFile&quot;</span><span 
class="nt">/&gt;</span>
+    <span class="nt">&lt;Root</span> <span class="na">level=</span><span 
class="s">"info"</span><span class="nt">&gt;</span>
+      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">"RollingFile"</span><span 
class="nt">/&gt;</span>
     <span class="nt">&lt;/Root&gt;</span>
   <span class="nt">&lt;/Loggers&gt;</span>
 
@@ -802,65 +819,63 @@ Also, make sure there isn&rsquo;t any de
 
 <h4 id="porting-from-log4j-to-log4j2">Porting from Log4j to Log4j2</h4>
 
-<p>If you are already using log4j and want to upgrade to using log4j2, 
following are the changes you will need to make in your job:
--   Clean your lib directory. This will be rebuilt with new dependency JARs 
and xml files.</p>
-
+<p>If you are already using log4j and want to upgrade to using log4j2, 
following are the changes you will need to make in your job:</p>
 <ul>
-<li>  Replace log4j’s dependencies with log4j2’s in your 
pom.xml/build.gradle and src.xml as mentioned above. Please ensure that none of 
log4j’s dependencies remain in pom.xml/build.gradle</li>
-<li>  Create a log4j2.xml to match your existing log4j.xml file. </li>
-<li>  Rebuild your application</li>
+  <li>
+    <p>Clean your lib directory. This will be rebuilt with new dependency JARs 
and xml files.</p>
+  </li>
+  <li>Replace log4j’s dependencies with log4j2’s in your 
pom.xml/build.gradle and src.xml as mentioned above. Please ensure that none of 
log4j’s dependencies remain in pom.xml/build.gradle</li>
+  <li>Create a log4j2.xml to match your existing log4j.xml file.</li>
+  <li>Rebuild your application</li>
 </ul>
 
 <p>NOTE: Please ensure that your classpath does not contain dependencies for 
both log4j and log4j2, as this might cause the application logging to not work 
correctly. For example, we need to exclude the slf4j-log4j12 dependency from 
the classpath for logging with log4j2 to work correctly.</p>
 
 <h4 id="startup-logger">Startup logger</h4>
-
-<p>When using a rolling file appender, it is common for a long-running job to 
exceed the max file size and count. In such cases, the beginning of the logs 
will be lost. Since the beginning of the logs include some of the most critical 
information like configuration, it is important to not lose this information. 
To address this issue, Samza logs this critical information to a &ldquo;startup 
logger&rdquo; in addition to the normal logger. </p>
+<p>When using a rolling file appender, it is common for a long-running job to 
exceed the max file size and count. In such cases, the beginning of the logs 
will be lost. Since the beginning of the logs include some of the most critical 
information like configuration, it is important to not lose this information. 
To address this issue, Samza logs this critical information to a “startup 
logger” in addition to the normal logger.</p>
 
 <h5 id="log4j">Log4j:</h5>
-
 <p>You can write these log messages to a separate, finite file by including 
the snippet below in your log4j.xml.</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">&quot;StartupAppender&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.log4j.RollingFileAppender&quot;</span><span 
class="nt">&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;File&quot;</span> <span class="na">value=</span><span 
class="s">&quot;${samza.log.dir}/${samza.container.name}-startup.log&quot;</span>
 <span class="nt">/&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;MaxFileSize&quot;</span> <span class="na">value=</span><span 
class="s">&quot;256MB&quot;</span> <span class="nt">/&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;MaxBackupIndex&quot;</span> <span class="na">value=</span><span 
class="s">&quot;1&quot;</span> <span class="nt">/&gt;</span>
-   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">&quot;org.apache.log4j.PatternLayout&quot;</span><span 
class="nt">&gt;</span>
-    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;ConversionPattern&quot;</span> <span 
class="na">value=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n&quot;</span> <span class="nt">/&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">"StartupAppender"</span> <span 
class="na">class=</span><span 
class="s">"org.apache.log4j.RollingFileAppender"</span><span 
class="nt">&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"File"</span> <span class="na">value=</span><span 
class="s">"${samza.log.dir}/${samza.container.name}-startup.log"</span> <span 
class="nt">/&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"MaxFileSize"</span> <span class="na">value=</span><span 
class="s">"256MB"</span> <span class="nt">/&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"MaxBackupIndex"</span> <span class="na">value=</span><span 
class="s">"1"</span> <span class="nt">/&gt;</span>
+   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">"org.apache.log4j.PatternLayout"</span><span class="nt">&gt;</span>
+    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"ConversionPattern"</span> <span class="na">value=</span><span 
class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} [%p] %m%n"</span> <span 
class="nt">/&gt;</span>
    <span class="nt">&lt;/layout&gt;</span>
 <span class="nt">&lt;/appender&gt;</span>
-<span class="nt">&lt;logger</span> <span class="na">name=</span><span 
class="s">&quot;STARTUP_LOGGER&quot;</span> <span 
class="na">additivity=</span><span class="s">&quot;false&quot;</span><span 
class="nt">&gt;</span>
-   <span class="nt">&lt;level</span> <span class="na">value=</span><span 
class="s">&quot;info&quot;</span> <span class="nt">/&gt;</span>
-   <span class="nt">&lt;appender-ref</span> <span class="na">ref=</span><span 
class="s">&quot;StartupAppender&quot;</span><span class="nt">/&gt;</span>
+<span class="nt">&lt;logger</span> <span class="na">name=</span><span 
class="s">"STARTUP_LOGGER"</span> <span class="na">additivity=</span><span 
class="s">"false"</span><span class="nt">&gt;</span>
+   <span class="nt">&lt;level</span> <span class="na">value=</span><span 
class="s">"info"</span> <span class="nt">/&gt;</span>
+   <span class="nt">&lt;appender-ref</span> <span class="na">ref=</span><span 
class="s">"StartupAppender"</span><span class="nt">/&gt;</span>
 <span class="nt">&lt;/logger&gt;</span></code></pre></figure>
 
 <h5 id="log4j2">Log4j2:</h5>
+<p>This can be done in a similar way for log4j2.xml using its defined syntax 
for xml files.</p>
 
-<p>This can be done in a similar way for log4j2.xml using its defined syntax 
for xml files. </p>
-
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span>  <span class="nt">&lt;Appenders&gt;</span>
-    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">&quot;StartupAppender&quot;</span> <span 
class="na">fileName=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-startup-log4j2.log&quot;</span>
 <span class="na">filePattern=</span><span 
class="s">&quot;${sys:samza.log.dir}/${sys:samza.container.name}-startup-%d{MM-dd-yyyy}-log4j2-%i.log.gz&quot;</span><span
 class="nt">&gt;</span>
-      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} 
[%t] %c{1} [%p] %m%n&quot;</span><span class="nt">/&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" data-lang="xml">  
<span class="nt">&lt;Appenders&gt;</span>
+    <span class="nt">&lt;RollingFile</span> <span class="na">name=</span><span 
class="s">"StartupAppender"</span> <span class="na">fileName=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-startup-log4j2.log"</span>
 <span class="na">filePattern=</span><span 
class="s">"${sys:samza.log.dir}/${sys:samza.container.name}-startup-%d{MM-dd-yyyy}-log4j2-%i.log.gz"</span><span
 class="nt">&gt;</span>
+      <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n"</span><span class="nt">/&gt;</span>
       <span class="nt">&lt;Policies&gt;</span>
-        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">&quot;256MB&quot;</span> <span 
class="nt">/&gt;</span>
+        <span class="nt">&lt;SizeBasedTriggeringPolicy</span> <span 
class="na">size=</span><span class="s">"256MB"</span> <span 
class="nt">/&gt;</span>
       <span class="nt">&lt;/Policies&gt;</span>
-      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">&quot;1&quot;</span><span 
class="nt">/&gt;</span>
+      <span class="nt">&lt;DefaultRolloverStrategy</span> <span 
class="na">max=</span><span class="s">"1"</span><span class="nt">/&gt;</span>
     <span class="nt">&lt;/RollingFile&gt;</span>
   <span class="nt">&lt;/Appenders&gt;</span>
 
   <span class="nt">&lt;Loggers&gt;</span>
-    <span class="nt">&lt;Root</span> <span class="na">name=</span><span 
class="s">&quot;STARTUP_LOGGER&quot;</span> <span class="na">level=</span><span 
class="s">&quot;info&quot;</span> <span class="na">additivity=</span><span 
class="s">&quot;false&quot;</span><span class="nt">&gt;</span>
-      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">&quot;StartupAppender&quot;</span><span 
class="nt">/&gt;</span>
+    <span class="nt">&lt;Root</span> <span class="na">name=</span><span 
class="s">"STARTUP_LOGGER"</span> <span class="na">level=</span><span 
class="s">"info"</span> <span class="na">additivity=</span><span 
class="s">"false"</span><span class="nt">&gt;</span>
+      <span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">"StartupAppender"</span><span 
class="nt">/&gt;</span>
     <span class="nt">&lt;/Root&gt;</span>
   <span class="nt">&lt;/Loggers&gt;</span></code></pre></figure>
 
 <h4 id="changing-log-levels">Changing log levels</h4>
 
-<h5 id="log4j">Log4j:</h5>
+<h5 id="log4j-1">Log4j:</h5>
 
-<p>Sometimes it&rsquo;s desirable to change the Log4J log level from 
<code>INFO</code> to <code>DEBUG</code> at runtime so that a developer can 
enable more logging for a Samza container that&rsquo;s exhibiting undesirable 
behavior. Samza provides a Log4j class called JmxAppender, which will allow you 
to dynamically modify log levels at runtime. The JmxAppender class is located 
in the samza-log4j package, and can be turned on by first adding a runtime 
dependency to the samza-log4j package:</p>
+<p>Sometimes it’s desirable to change the Log4J log level from <code 
class="language-plaintext highlighter-rouge">INFO</code> to <code 
class="language-plaintext highlighter-rouge">DEBUG</code> at runtime so that a 
developer can enable more logging for a Samza container that’s exhibiting 
undesirable behavior. Samza provides a Log4j class called JmxAppender, which 
will allow you to dynamically modify log levels at runtime. The JmxAppender 
class is located in the samza-log4j package, and can be turned on by first 
adding a runtime dependency to the samza-log4j package:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;dependency&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;dependency&gt;</span>
   <span class="nt">&lt;setId&gt;</span>org.apache.samza<span 
class="nt">&lt;/setId&gt;</span>
   <span class="nt">&lt;artifactId&gt;</span>samza-log4j<span 
class="nt">&lt;/artifactId&gt;</span>
   <span class="nt">&lt;scope&gt;</span>runtime<span 
class="nt">&lt;/scope&gt;</span>
@@ -869,84 +884,83 @@ Also, make sure there isn&rsquo;t any de
 
 <p>And then updating your log4j.xml to include the appender:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">&quot;jmx&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.samza.logging.log4j.JmxAppender&quot;</span> <span 
class="nt">/&gt;</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">"jmx"</span> <span 
class="na">class=</span><span 
class="s">"org.apache.samza.logging.log4j.JmxAppender"</span> <span 
class="nt">/&gt;</span></code></pre></figure>
 
-<h5 id="log4j2">Log4j2:</h5>
+<h5 id="log4j2-1">Log4j2:</h5>
 
 <p>Log4j2 provides built-in support for JMX where all LoggerContexts, 
LoggerConfigs and Appenders are instrumented with MBeans and can be remotely 
monitored and controlled. This eliminates the need for a dedicated JMX 
appender. The steps to analyze and change the logger/appender properties at 
runtime are documented <a 
href="https://logging.apache.org/log4j/2.0/manual/jmx.html">here</a>.</p>
 
-<p>NOTE: If you use JMXAppender and are migrating from log4j to log4j2, simply 
remove it from your xml file. Don’t add it to your log4j2.xml file as it 
doesn’t exist in the samza-log4j2 module.  </p>
+<p>NOTE: If you use JMXAppender and are migrating from log4j to log4j2, simply 
remove it from your xml file. Don’t add it to your log4j2.xml file as it 
doesn’t exist in the samza-log4j2 module.</p>
 
 <h4 id="stream-appender">Stream Appender</h4>
 
-<p>Samza provides a StreamAppender to publish the logs into a specific system. 
You can specify the system name using &ldquo;task.log4j.system&rdquo; and 
change name of log stream with param &lsquo;StreamName&rsquo;. The <a 
href="http://logback.qos.ch/manual/mdc.html";>MDC</a> contains the keys 
&ldquo;containerName&rdquo;, &ldquo;jobName&rdquo; and &ldquo;jobId&rdquo;, 
which help identify the source of the log. In order to use this appender, 
define the system name by specifying the config as follows:</p>
+<p>Samza provides a StreamAppender to publish the logs into a specific system. 
You can specify the system name using “task.log4j.system” and change the name 
of the log stream with the param ‘StreamName’. The <a 
href="http://logback.qos.ch/manual/mdc.html">MDC</a> contains the keys 
“containerName”, “jobName” and “jobId”, which help identify the 
source of the log. In order to use this appender, define the system name by 
specifying the config as follows:</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span>task.log4j.system=&quot;<span 
class="nt">&lt;system-name&gt;</span>&quot;</code></pre></figure>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml">task.log4j.system="<span 
class="nt">&lt;system-name&gt;</span>"</code></pre></figure>
 
 <p>Also, the following needs to be added to the respective 
log4j.xml/log4j2.xml files:</p>
 
-<h5 id="log4j">Log4j:</h5>
+<h5 id="log4j-2">Log4j:</h5>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">&quot;StreamAppender&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.samza.logging.log4j.StreamAppender&quot;</span><span 
class="nt">&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;appender</span> <span 
class="na">name=</span><span class="s">"StreamAppender"</span> <span 
class="na">class=</span><span 
class="s">"org.apache.samza.logging.log4j.StreamAppender"</span><span 
class="nt">&gt;</span>
    <span class="c">&lt;!-- optional --&gt;</span>
-   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;StreamName&quot;</span> <span class="na">value=</span><span 
class="s">&quot;EpicStreamName&quot;</span><span class="nt">/&gt;</span>
-   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">&quot;org.apache.log4j.PatternLayout&quot;</span><span 
class="nt">&gt;</span>
-     <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;ConversionPattern&quot;</span> <span 
class="na">value=</span><span class="s">&quot;%X{containerName} %X{jobName} 
%X{jobId} %d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} [%p] %m%n&quot;</span> <span 
class="nt">/&gt;</span>
+   <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"StreamName"</span> <span class="na">value=</span><span 
class="s">"EpicStreamName"</span><span class="nt">/&gt;</span>
+   <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">"org.apache.log4j.PatternLayout"</span><span class="nt">&gt;</span>
+     <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">"ConversionPattern"</span> <span class="na">value=</span><span 
class="s">"%X{containerName} %X{jobName} %X{jobId} %d{yyyy-MM-dd HH:mm:ss.SSS} 
[%t] %c{1} [%p] %m%n"</span> <span class="nt">/&gt;</span>
    <span class="nt">&lt;/layout&gt;</span>
 <span class="nt">&lt;/appender&gt;</span></code></pre></figure>
 
 <p>and</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;appender-ref</span> <span 
class="na">ref=</span><span class="s">&quot;StreamAppender&quot;</span><span 
class="nt">/&gt;</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;appender-ref</span> <span 
class="na">ref=</span><span class="s">"StreamAppender"</span><span 
class="nt">/&gt;</span></code></pre></figure>
 
-<h5 id="log4j2">Log4j2:</h5>
+<h5 id="log4j2-2">Log4j2:</h5>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;Stream</span> <span 
class="na">name=</span><span class="s">&quot;StreamAppender&quot;</span> <span 
class="na">streamName=</span><span 
class="s">&quot;TestStreamName&quot;</span><span class="nt">&gt;</span>
-  <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">&quot;%d{yyyy-MM-dd HH:mm:ss.SSS} 
[%t] %c{1} [%p] %m%n&quot;</span><span class="nt">/&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;Stream</span> <span 
class="na">name=</span><span class="s">"StreamAppender"</span> <span 
class="na">streamName=</span><span class="s">"TestStreamName"</span><span 
class="nt">&gt;</span>
+  <span class="nt">&lt;PatternLayout</span> <span 
class="na">pattern=</span><span class="s">"%d{yyyy-MM-dd HH:mm:ss.SSS} [%t] 
%c{1} [%p] %m%n"</span><span class="nt">/&gt;</span>
 <span class="nt">&lt;/Stream&gt;</span></code></pre></figure>
 
 <p>and</p>
 
-<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span></span><span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">&quot;StreamAppender&quot;</span><span 
class="nt">/&gt;</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-xml" 
data-lang="xml"><span class="nt">&lt;AppenderRef</span> <span 
class="na">ref=</span><span class="s">"StreamAppender"</span><span 
class="nt">/&gt;</span></code></pre></figure>
 
-<p>The default stream name for logger is generated using the following 
convention,
- <code>java
- &quot;__samza_%s_%s_logs&quot; format (jobName.replaceAll(&quot;_&quot;, 
&quot;-&quot;), jobId.replaceAll(&quot;_&quot;, &quot;-&quot;))
-</code>
-though you can override it using the <code>StreamName</code> property in the 
xml files as shown above.</p>
+<p>The default stream name for logger is generated using the following 
convention,</p>
+<div class="language-java highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code> <span class="s">"__samza_%s_%s_logs"</span> <span 
class="n">format</span> <span class="o">(</span><span 
class="n">jobName</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span 
class="o">),</span> <span class="n">jobId</span><span class="o">.</span><span 
class="na">replaceAll</span><span class="o">(</span><span 
class="s">"_"</span><span class="o">,</span> <span class="s">"-"</span><span 
class="o">))</span>
+</code></pre></div></div>
+<p>though you can override it using the <code class="language-plaintext 
highlighter-rouge">StreamName</code> property in the xml files as shown 
above.</p>
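
As a rough illustration of the naming convention above, the following Java sketch builds the default log stream name from a job name and a job id. The class and method names (`LogStreamNameSketch`, `defaultStreamName`) are hypothetical and not part of Samza; only the format string and the underscore-to-hyphen replacement are taken from the text.

```java
// Hypothetical helper, not a Samza API: mirrors the documented convention only.
final class LogStreamNameSketch {

    // Underscores in the job name and job id are replaced with hyphens before
    // being substituted into "__samza_%s_%s_logs".
    static String defaultStreamName(String jobName, String jobId) {
        return String.format(
            "__samza_%s_%s_logs",
            jobName.replaceAll("_", "-"),
            jobId.replaceAll("_", "-"));
    }

    public static void main(String[] args) {
        // prints "__samza_wikipedia-parser_1_logs"
        System.out.println(defaultStreamName("wikipedia_parser", "1"));
    }
}
```

Setting the `StreamName` property in the appender configuration bypasses this derived name entirely.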
 
-<p>Configuring the StreamAppender will automatically encode messages using 
logstash&rsquo;s <a 
href="https://github.com/logstash/log4j-jsonevent-layout";>Log4J JSON 
format</a>. Samza also supports pluggable serialization for those that prefer 
non-JSON logging events. This can be configured the same way other stream 
serializers are defined:</p>
+<p>Configuring the StreamAppender will automatically encode messages using 
logstash’s <a href="https://github.com/logstash/log4j-jsonevent-layout">Log4J 
JSON format</a>. Samza also supports pluggable serialization for those that 
prefer non-JSON logging events. This can be configured the same way other 
stream serializers are defined:</p>
 
-<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span></span><span 
class="na">serializers.registry.log4j-string.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.logging.log4j.serializers.LoggingEventStringSerdeFactory</span>
-<span 
class="na">systems.mock.streams.__samza_jobname_jobid_logs.samza.msg.serde</span><span
 class="o">=</span><span class="s">log4j-string</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties">serializers.registry.log4j-string.class=org.apache.samza.logging.log4j.serializers.LoggingEventStringSerdeFactory
+systems.mock.streams.__samza_jobname_jobid_logs.samza.msg.serde=log4j-string</code></pre></figure>
 
-<p>The StreamAppender will always send messages to a job&rsquo;s log stream 
keyed by the container name.</p>
+<p>The StreamAppender will always send messages to a job’s log stream keyed 
by the container name.</p>
 
 <h3 id="log-directory">Log Directory</h3>
 
-<p>Samza will look for the <code>SAMZA_LOG_DIR</code> environment variable 
when it executes. If this variable is defined, all logs will be written to this 
directory. If the environment variable is empty, or not defined, then Samza 
will use <code>$base_dir</code>, which is the directory one level up from 
Samza&rsquo;s <a href="packaging.html">run-class.sh</a> script. This 
environment variable can also be referenced inside log4j.xml files (see 
above).</p>
+<p>Samza will look for the <code class="language-plaintext 
highlighter-rouge">SAMZA_LOG_DIR</code> environment variable when it executes. 
If this variable is defined, all logs will be written to this directory. If the 
environment variable is empty, or not defined, then Samza will use <code 
class="language-plaintext highlighter-rouge">$base_dir</code>, which is the 
directory one level up from Samza’s <a href="packaging.html">run-class.sh</a> 
script. This environment variable can also be referenced inside log4j.xml files 
(see above).</p>
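
The fallback rule described above can be sketched in Java as follows. This is only an illustration of the decision logic, under the assumption that the environment is passed in as a map; `LogDirSketch`, `resolveLogDir` and the `/opt/samza` path are hypothetical names, and the real resolution happens in Samza's shell scripts rather than in code like this.

```java
import java.util.Map;

// Hypothetical sketch of the documented rule: use SAMZA_LOG_DIR when it is
// set and non-empty, otherwise fall back to the base directory.
final class LogDirSketch {

    static String resolveLogDir(Map<String, String> env, String baseDir) {
        String configured = env.get("SAMZA_LOG_DIR");
        if (configured == null || configured.isEmpty()) {
            return baseDir; // variable empty or not defined
        }
        return configured;
    }

    public static void main(String[] args) {
        // "/opt/samza" stands in for $base_dir and is an assumed example path.
        System.out.println(resolveLogDir(System.getenv(), "/opt/samza"));
    }
}
```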
 
 <h3 id="garbage-collection-logging">Garbage Collection Logging</h3>
 
-<p>Samza will automatically set the following garbage collection logging 
setting, and will output it to <code>$SAMZA_LOG_DIR/gc.log</code>.</p>
+<p>Samza will automatically set the following garbage collection logging 
setting, and will output it to <code class="language-plaintext 
highlighter-rouge">$SAMZA_LOG_DIR/gc.log</code>.</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-XX:+PrintGCDateStamps -Xloggc:<span 
class="nv">$SAMZA_LOG_DIR</span>/gc.log</code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-XX</span>:+PrintGCDateStamps <span 
class="nt">-Xloggc</span>:<span 
class="nv">$SAMZA_LOG_DIR</span>/gc.log</code></pre></figure>
 
 <h4 id="rotation">Rotation</h4>
 
 <p>In older versions of Java, it is impossible to have GC logs roll over based 
on time or size without the use of a secondary tool. This means that your GC 
logs will never be deleted until a Samza job ceases to run. As of <a 
href="http://www.oracle.com/technetwork/java/javase/2col/6u34-bugfixes-1733379.html">Java
 6 Update 34</a>, and <a 
href="http://www.oracle.com/technetwork/java/javase/7u2-relnotes-1394228.html">Java
 7 Update 2</a>, <a 
href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6941923">new GC 
command line switches</a> have been added to support this functionality. If GC 
log file rotation is supported by the JVM, Samza will also set:</p>
 
-<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span></span>-XX:+UseGCLogFileRotation 
-XX:NumberOfGCLogFiles<span class="o">=</span><span class="m">10</span> 
-XX:GCLogFileSize<span class="o">=</span><span 
class="m">10241024</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-bash" 
data-lang="bash"><span class="nt">-XX</span>:+UseGCLogFileRotation <span 
class="nt">-XX</span>:NumberOfGCLogFiles<span class="o">=</span>10 <span 
class="nt">-XX</span>:GCLogFileSize<span 
class="o">=</span>10241024</code></pre></figure>
 
 <h3 id="yarn">YARN</h3>
 
-<p>When a Samza job executes on a YARN grid, the <code>$SAMZA_LOG_DIR</code> 
environment variable will point to a directory that is secured such that only 
the user executing the Samza job can read and write to it, if YARN is <a 
href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html";>securely
 configured</a>.</p>
+<p>When a Samza job executes on a YARN grid, the <code 
class="language-plaintext highlighter-rouge">$SAMZA_LOG_DIR</code> environment 
variable will point to a directory that is secured such that only the user 
executing the Samza job can read and write to it, if YARN is <a 
href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html">securely
 configured</a>.</p>
 
 <h4 id="stdout">STDOUT</h4>
 
-<p>Samza&rsquo;s <a 
href="../yarn/application-master.html">ApplicationMaster</a> pipes all STDOUT 
and STDERR output to logs/stdout and logs/stderr, respectively. These files are 
never rotated.</p>
+<p>Samza’s <a href="../yarn/application-master.html">ApplicationMaster</a> 
pipes all STDOUT and STDERR output to logs/stdout and logs/stderr, 
respectively. These files are never rotated.</p>
 
-<h2 id="reprocessing"><a href="reprocessing.html">Reprocessing &raquo;</a></h2>
+<h2 id="reprocessing-"><a href="reprocessing.html">Reprocessing »</a></h2>
 
            
         </div>


Modified: samza/site/learn/documentation/latest/jobs/packaging.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/jobs/packaging.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/jobs/packaging.html (original)
+++ samza/site/learn/documentation/latest/jobs/packaging.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/jobs/packaging">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/jobs/packaging">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/jobs/packaging">1.6.0</a></li>
 
               
@@ -639,26 +653,30 @@
    limitations under the License.
 -->
 
-<p>The <a href="job-runner.html">JobRunner</a> page talks about run-job.sh, 
and how it&rsquo;s used to start a job either locally 
(ProcessJobFactory/ThreadJobFactory) or with YARN (YarnJobFactory). In the 
diagram that shows the execution flow, it also shows a run-container.sh script. 
This script, along with a run-am.sh script, are what Samza actually calls to 
execute its code.</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>bin/run-am.sh
+<p>The <a href="job-runner.html">JobRunner</a> page talks about run-job.sh, 
and how it’s used to start a job either locally 
(ProcessJobFactory/ThreadJobFactory) or with YARN (YarnJobFactory). The 
diagram of the execution flow on that page also shows a run-container.sh script. 
This script, along with a run-am.sh script, is what Samza actually calls to 
execute its code.</p>
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>bin/run-am.sh
 bin/run-container.sh
-</code></pre></div>
-<p>The run-container.sh script is responsible for starting the <a 
href="../container/samza-container.html">SamzaContainer</a>. The run-am.sh 
script is responsible for starting Samza&rsquo;s application master for YARN. 
Thus, the run-am.sh script is only used by the YarnJob, but both YarnJob and 
ProcessJob use run-container.sh.</p>
+</code></pre></div></div>
+
+<p>The run-container.sh script is responsible for starting the <a 
href="../container/samza-container.html">SamzaContainer</a>. The run-am.sh 
script is responsible for starting Samza’s application master for YARN. Thus, 
the run-am.sh script is only used by the YarnJob, but both YarnJob and 
ProcessJob use run-container.sh.</p>
 
 <p>Typically, these two scripts are bundled into a tar.gz file that has a 
structure like this:</p>
-<div class="highlight"><pre><code class="language-text" 
data-lang="text"><span></span>bin/run-am.sh
+
+<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>bin/run-am.sh
 bin/run-class.sh
 bin/run-job.sh
 bin/run-container.sh
 lib/*.jar
-</code></pre></div>
-<p>To run a Samza job, you un-zip its tar.gz file, and execute the run-job.sh 
script, as defined in the JobRunner section. There are a number of interesting 
implications from this packaging scheme. First, you&rsquo;ll notice that there 
is no configuration in the package. Second, you&rsquo;ll notice that the lib 
directory contains all JARs that you&rsquo;ll need to run your Samza job.</p>
+</code></pre></div></div>
+
+<p>To run a Samza job, you extract its tar.gz file and execute the run-job.sh 
script, as defined in the JobRunner section. There are a number of interesting 
implications from this packaging scheme. First, you’ll notice that there is 
no configuration in the package. Second, you’ll notice that the lib directory 
contains all JARs that you’ll need to run your Samza job.</p>
 
-<p>The reason that configuration is decoupled from your Samza job packaging is 
that it allows configuration to be updated without having to re-build the 
entire Samza package. This makes life easier for everyone when you just need to 
tweak one parameter, and don&rsquo;t want to have to worry about which branch 
your package was built from, or whether trunk is in a stable state. It also has 
the added benefit of forcing configuration to be fully resolved at runtime. 
This means that that the configuration for a job is resolved at the time 
run-job.sh is called (using &ndash;config-path and &ndash;config-provider 
parameters), and from that point on, the configuration is immutable, and passed 
where it needs to be by Samza (and YARN, if you&rsquo;re using it).</p>
+<p>The reason that configuration is decoupled from your Samza job packaging is 
that it allows configuration to be updated without having to re-build the 
entire Samza package. This makes life easier for everyone when you just need to 
tweak one parameter, and don’t want to have to worry about which branch your 
package was built from, or whether trunk is in a stable state. It also has the 
added benefit of forcing configuration to be fully resolved at runtime. This 
means that the configuration for a job is resolved at the time run-job.sh 
is called (using the --config-path and --config-provider parameters), and from 
that point on, the configuration is immutable, and passed where it needs to be 
by Samza (and YARN, if you’re using it).</p>
 
 <p>The second statement, that your Samza package contains all JARs that it 
needs to run, means that a Samza package is entirely self contained. This 
allows Samza jobs to run on independent Samza versions without conflicting with 
each other. This is in contrast to Hadoop, where JARs are pulled in from the 
local machine that the job is running on (using environment variables). With 
Samza, you might run your job on version 0.7.0, and someone else might run 
their job on version 0.8.0. There is no problem with this.</p>
 
-<h2 id="yarn-jobs"><a href="yarn-jobs.html">YARN Jobs &raquo;</a></h2>
+<h2 id="yarn-jobs-"><a href="yarn-jobs.html">YARN Jobs »</a></h2>
 
            
         </div>

Modified: samza/site/learn/documentation/latest/jobs/reprocessing.html
URL: 
http://svn.apache.org/viewvc/samza/site/learn/documentation/latest/jobs/reprocessing.html?rev=1906774&r1=1906773&r2=1906774&view=diff
==============================================================================
--- samza/site/learn/documentation/latest/jobs/reprocessing.html (original)
+++ samza/site/learn/documentation/latest/jobs/reprocessing.html Wed Jan 18 
19:33:25 2023
@@ -227,6 +227,12 @@
     
       
         
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.8.0">1.8.0</a>
+      
+        
+      <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.7.0">1.7.0</a>
+      
+        
       <a class="side-navigation__group-item" data-match-active="" 
href="/releases/1.6.0">1.6.0</a>
       
         
@@ -538,6 +544,14 @@
               
               
 
+              <li class="hide"><a 
href="/learn/documentation/1.8.0/jobs/reprocessing">1.8.0</a></li>
+
+              
+
+              <li class="hide"><a 
href="/learn/documentation/1.7.0/jobs/reprocessing">1.7.0</a></li>
+
+              
+
               <li class="hide"><a 
href="/learn/documentation/1.6.0/jobs/reprocessing">1.6.0</a></li>
 
               
@@ -644,53 +658,56 @@
 <p>When you start up a new version of your job, a question arises: what do you 
want to do with messages that were previously processed with the old version of 
your job? The answer depends on the behavior you want:</p>
 
 <ol>
-<li><p><strong>No reprocessing:</strong> By default, Samza assumes that 
messages processed by the old version don&rsquo;t need to be processed again. 
When the new version starts up, it will resume processing at the point where 
the old version left off (assuming you have <a 
href="../container/checkpointing.html">checkpointing</a> enabled). If this is 
the behavior you want, there&rsquo;s nothing special you need to do.</p></li>
-<li><p><strong>Simple rewind:</strong> Perhaps you want to go back and 
re-process old messages using the new version of your job. For example, maybe 
the old version of your classifier marked things as spam too aggressively, so 
you now want to revisit its previous spam/not-spam decisions using an improved 
classifier. You can do this by restarting the job at an older point in time in 
the stream, and running through all the messages since that time. Thus your job 
starts off reprocessing messages that it has already seen, but it then 
seamlessly continues with new messages when the reprocessing is done.</p></li>
-</ol>
+  <li>
+    <p><strong>No reprocessing:</strong> By default, Samza assumes that 
messages processed by the old version don’t need to be processed again. When 
the new version starts up, it will resume processing at the point where the old 
version left off (assuming you have <a 
href="../container/checkpointing.html">checkpointing</a> enabled). If this is 
the behavior you want, there’s nothing special you need to do.</p>
+  </li>
+  <li>
+    <p><strong>Simple rewind:</strong> Perhaps you want to go back and 
re-process old messages using the new version of your job. For example, maybe 
the old version of your classifier marked things as spam too aggressively, so 
you now want to revisit its previous spam/not-spam decisions using an improved 
classifier. You can do this by restarting the job at an older point in time in 
the stream, and running through all the messages since that time. Thus your job 
starts off reprocessing messages that it has already seen, but it then 
seamlessly continues with new messages when the reprocessing is done.</p>
+
+    <p>This approach requires an input system such as Kafka, which allows you 
to jump back in time to a previous point in the stream. We discuss below how 
this works in practice.</p>
+  </li>
+  <li>
+    <p><strong>Parallel rewind:</strong> This approach avoids a downside of 
the <em>simple rewind</em> approach. With simple rewind, any new messages that 
appear while the job is reprocessing old data are queued up, and are processed 
when the reprocessing is done. The queueing delay needn’t be long, because 
Samza can stream through historical data very quickly, but some 
latency-sensitive applications need to process messages faster.</p>
 
-<p>This approach requires an input system such as Kafka, which allows you to 
jump back in time to a previous point in the stream. We discuss below how this 
works in practice.</p>
+    <p>In the <em>parallel rewind</em> approach, you run two jobs in parallel: 
one job continues to handle live updates with low latency (the <em>real-time 
job</em>), while the other is started at an older point in the stream and 
reprocesses historical data (the <em>reprocessing job</em>). The two jobs 
consume the same input stream at different points in time, and eventually the 
reprocessing job catches up with the real-time job.</p>
 
-<ol>
-<li><strong>Parallel rewind:</strong> This approach avoids a downside of the 
<em>simple rewind</em> approach. With simple rewind, any new messages that 
appear while the job is reprocessing old data are queued up, and are processed 
when the reprocessing is done. The queueing delay needn&rsquo;t be long, 
because Samza can stream through historical data very quickly, but some 
latency-sensitive applications need to process messages faster.</li>
+    <p>There are a few details that you need to think through before deploying 
parallel rewind, which we discuss below.</p>
+  </li>
 </ol>
 
-<p>In the <em>parallel rewind</em> approach, you run two jobs in parallel: one 
job continues to handle live updates with low latency (the <em>real-time 
job</em>), while the other is started at an older point in the stream and 
reprocesses historical data (the <em>reprocessing job</em>). The two jobs 
consume the same input stream at different points in time, and eventually the 
reprocessing job catches up with the real-time job.</p>
-
-<p>There are a few details that you need to think through before deploying 
parallel rewind, which we discuss below.</p>
-
 <h3 id="jumping-back-in-time">Jumping Back in Time</h3>
 
-<p>A common aspect of the <em>simple rewind</em> and <em>parallel rewind</em> 
approaches is: you have a job which jumps back to an old point in time in the 
input streams, and consumes all messages since that time. You achieve this by 
working with Samza&rsquo;s checkpoints.</p>
+<p>A common aspect of the <em>simple rewind</em> and <em>parallel rewind</em> 
approaches is: you have a job which jumps back to an old point in time in the 
input streams, and consumes all messages since that time. You achieve this by 
working with Samza’s checkpoints.</p>
 
 <p>Normally, when a Samza job starts up, it reads the latest checkpoint to 
determine at which offset in the input streams it needs to resume processing. 
If you need to rewind to an earlier time, you do that in one of two ways:</p>
 
 <ol>
-<li>You can stop the job, manipulate its last checkpoint to point to an older 
offset, and start the job up again. Samza includes a command-line tool called 
<a href="../container/checkpointing.html#toc_0">CheckpointTool</a> which you 
can use to manipulate checkpoints.</li>
-<li>You can start a new job with a different <em>job.name</em> or 
<em>job.id</em> (e.g. increment <em>job.id</em> every time you need to jump 
back in time). This gives the job a new checkpoint stream, with none of the old 
checkpoint information. You also need to set <a 
href="../container/checkpointing.html">samza.offset.default=oldest</a>, so that 
when the job starts up without checkpoint, it starts consuming at the oldest 
offset available.</li>
+  <li>You can stop the job, manipulate its last checkpoint to point to an 
older offset, and start the job up again. Samza includes a command-line tool 
called <a href="../container/checkpointing.html#toc_0">CheckpointTool</a> which 
you can use to manipulate checkpoints.</li>
+  <li>You can start a new job with a different <em>job.name</em> or 
<em>job.id</em> (e.g. increment <em>job.id</em> every time you need to jump 
back in time). This gives the job a new checkpoint stream, with none of the old 
checkpoint information. You also need to set <a 
href="../container/checkpointing.html">samza.offset.default=oldest</a>, so that 
when the job starts up without a checkpoint, it starts consuming at the oldest 
offset available.</li>
 </ol>
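<p>The second option can be sketched as a small helper that derives the configuration for a rewound job. This is only an illustration: the function name, the <em>kafka</em> system name, the <em>systems.&lt;system&gt;.</em> prefix on the offset property, and the fallback of "1" for a missing <em>job.id</em> are assumptions of this sketch, not something the text above prescribes.</p>

```python
def rewind_overrides(config, system="kafka"):
    """Return a copy of `config` prepared for jumping back in time.

    Incrementing job.id gives the job a fresh checkpoint stream, and the
    offset default tells it to start from the oldest available message.
    `system` is a hypothetical input system name.
    """
    new_config = dict(config)
    # Assumption: a missing job.id is treated as "1" in this sketch.
    current_id = int(config.get("job.id", "1"))
    new_config["job.id"] = str(current_id + 1)
    # Assumption: the offset default is scoped to one input system.
    new_config[f"systems.{system}.samza.offset.default"] = "oldest"
    return new_config


base = {"job.name": "page-view-counter", "job.id": "3"}
rewound = rewind_overrides(base)
```

<p>The original configuration is left untouched, so the same base settings can be reused the next time you need to rewind.</p>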
 
-<p>With either of these approaches you can get Samza to reprocess the entire 
history of messages in the input system. Input systems such as Kafka can retain 
a large amount of history &mdash; see discussion below. In order to speed up 
the reprocessing of historical data, you can increase the container count 
(<em>job.container.count</em> if you&rsquo;re running Samza on YARN) to boost 
your job&rsquo;s computational resources.</p>
+<p>With either of these approaches you can get Samza to reprocess the entire 
history of messages in the input system. Input systems such as Kafka can retain 
a large amount of history (see discussion below). In order to speed up the 
reprocessing of historical data, you can increase the container count 
(<em>job.container.count</em> if you’re running Samza on YARN) to boost your 
job’s computational resources.</p>
 
 <p>If your job maintains any <a 
href="../container/state-management.html">persistent state</a>, you need to be 
careful when jumping back in time: resetting a checkpoint does not 
automatically change persistent state, so you could end up reprocessing old 
messages while using state from a later point in time. In most cases, a job 
that jumps back in time should start with an empty state. You can reset the 
state by deleting the changelog topic, or by changing the name of the changelog 
topic in your job configuration.</p>
 
-<p>When you&rsquo;re jumping back in time, you&rsquo;re using Samza somewhat 
like a batch processing framework (e.g. MapReduce) &mdash; with the difference 
that your job doesn&rsquo;t stop when it has processed all the historical data, 
but instead continues running, incrementally processing the stream of new 
messages as they come in. This has the advantage that you don&rsquo;t need to 
write and maintain separate batch and streaming versions of your job: you can 
just use the same Samza API for processing both real-time and historical 
data.</p>
+<p>When you’re jumping back in time, you’re using Samza somewhat like a 
batch processing framework (e.g. MapReduce), with the difference that your 
job doesn’t stop when it has processed all the historical data, but instead 
continues running, incrementally processing the stream of new messages as they 
come in. This has the advantage that you don’t need to write and maintain 
separate batch and streaming versions of your job: you can just use the same 
Samza API for processing both real-time and historical data.</p>
 
 <h3 id="retention-of-history">Retention of history</h3>
 
-<p>Samza doesn&rsquo;t maintain history itself &mdash; that is the 
responsibility of the input system, such as Kafka. How far back in time you can 
jump depends on the amount of history that is retained in that system.</p>
+<p>Samza doesn’t maintain history itself; that is the responsibility of 
the input system, such as Kafka. How far back in time you can jump depends on 
the amount of history that is retained in that system.</p>
 
-<p>Kafka is designed to keep a fairly large amount of history: it is common 
for Kafka brokers to keep one or two weeks of message history accessible, even 
for high volume topics. The retention period is mostly determined by how much 
disk space you have available. Kafka&rsquo;s performance <a 
href="http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines">remains
 high</a> even if you have terabytes of history.</p>
+<p>Kafka is designed to keep a fairly large amount of history: it is common 
for Kafka brokers to keep one or two weeks of message history accessible, even 
for high volume topics. The retention period is mostly determined by how much 
disk space you have available. Kafka’s performance <a 
href="http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines">remains
 high</a> even if you have terabytes of history.</p>
 
 <p>There are two different kinds of history which require different 
configuration:</p>
 
 <ul>
-<li><strong>Activity events</strong> are things like user tracking events, web 
server log events and the like. This kind of stream is typically configured 
with a time-based retention, e.g. a few weeks. Events older than the retention 
period are deleted (or archived in an offline system such as HDFS).</li>
-<li><strong>Database changes</strong> are events that show inserts, updates 
and deletes in a database. In this kind of stream, each event typically has a 
primary key, and a newer event for a key overwrites any older events for the 
same key. If the same key is updated many times, you&rsquo;re only really 
interested in the most recent value. (The <a 
href="../container/state-management.html">changelog streams</a> used by 
Samza&rsquo;s persistent state fall in this category.)</li>
+  <li><strong>Activity events</strong> are things like user tracking events, 
web server log events and the like. This kind of stream is typically configured 
with a time-based retention, e.g. a few weeks. Events older than the retention 
period are deleted (or archived in an offline system such as HDFS).</li>
+  <li><strong>Database changes</strong> are events that show inserts, updates 
and deletes in a database. In this kind of stream, each event typically has a 
primary key, and a newer event for a key overwrites any older events for the 
same key. If the same key is updated many times, you’re only really 
interested in the most recent value. (The <a 
href="../container/state-management.html">changelog streams</a> used by 
Samza’s persistent state fall in this category.)</li>
 </ul>
 
-<p>In a database change stream, when you&rsquo;re reprocessing data, you 
typically want to reprocess the entire database. You don&rsquo;t want to miss a 
value just because it was last updated more than a few weeks ago. In other 
words, you don&rsquo;t want change events to be deleted just because they are 
older than some threshold. In this case, when you&rsquo;re jumping back in 
time, you need to rewind to the <em>beginning of time</em>, to the first change 
ever made to the database (known in Kafka as &ldquo;offset 0&rdquo;).</p>
+<p>In a database change stream, when you’re reprocessing data, you typically 
want to reprocess the entire database. You don’t want to miss a value just 
because it was last updated more than a few weeks ago. In other words, you 
don’t want change events to be deleted just because they are older than some 
threshold. In this case, when you’re jumping back in time, you need to rewind 
to the <em>beginning of time</em>, to the first change ever made to the 
database (known in Kafka as “offset 0”).</p>
 
-<p>Fortunately this can be done efficiently, using a Kafka feature called <a 
href="http://kafka.apache.org/documentation.html#compaction">log 
compaction</a>. </p>
+<p>Fortunately this can be done efficiently, using a Kafka feature called <a 
href="http://kafka.apache.org/documentation.html#compaction">log 
compaction</a>.</p>
 
 <p>For example, imagine your database contains counters: every time something 
happens, you increment the appropriate counters and update the database with 
the new counter values. Every update is sent to the changelog, and because 
there are many updates, the changelog stream will take up a lot of space. With 
log compaction turned on, Kafka deduplicates the stream in the background, 
keeping only the most recent counter value for each key, and deleting any old 
values for the same counter. This reduces the size of the stream so much that 
you can keep the most recent update for every key, even if it was last updated 
long ago.</p>
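<p>The effect of compaction on the counter example can be modelled in a few lines. This is a simplified, in-memory sketch of the semantics only (keep the latest value per key), not how Kafka implements it; the function name and the sample counters are made up for illustration.</p>

```python
def compact(changelog):
    """Keep only the most recent value for each key.

    `changelog` is a list of (key, value) pairs in the order they were
    written. Surviving entries keep the relative order of their last write.
    """
    latest = {}
    for key, value in changelog:
        # Remove any older entry so the key moves to its newest position.
        latest.pop(key, None)
        latest[key] = value
    return list(latest.items())


updates = [("likes", 1), ("views", 1), ("likes", 2), ("views", 2), ("likes", 3)]
compacted = compact(updates)
```

<p>Five updates shrink to two entries, one per counter, and a counter that was last updated long ago would still be present after compaction.</p>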
 
@@ -701,15 +718,15 @@
 <p>If you are taking the <em>parallel rewind</em> approach described above, 
running two jobs in parallel, you need to configure them carefully to avoid 
problems. In particular, some things to look out for:</p>
 
 <ul>
-<li>Make sure that the two jobs don&rsquo;t interfere with each other. They 
need different <em>job.name</em> or <em>job.id</em> configuration properties, 
so that each job gets its own checkpoint stream. If the jobs maintain <a 
href="../container/state-management.html">persistent state</a>, each job needs 
its own changelog (two different jobs writing to the same changelog produces 
undefined results).</li>
-<li>What happens to job output? If the job sends its results to an output 
stream, or writes to a database, then the easiest solution is for each job to 
have a separate output stream or database table. If they write to the same 
output, you need to take care to ensure that newer data isn&rsquo;t overwritten 
with older data (due to race conditions between the two jobs).</li>
-<li>Do you need to support A/B testing between the old and the new version of 
your job, e.g. to test whether the new version improves your metrics? Parallel 
rewind is ideal for this: each job writes to a separate output, and clients or 
consumers of the output can read from either the old or the new version&rsquo;s 
output, depending on whether a user is in test group A or B.</li>
-<li>Reclaiming resources: you might want to keep the old version of your job 
running for a while, even when the new version has finished reprocessing 
historical data (especially if the old version&rsquo;s output is being used in 
an A/B test). However, eventually you&rsquo;ll want to shut it down, and delete 
the checkpoint and changelog streams belonging to the old version.</li>
+  <li>Make sure that the two jobs don’t interfere with each other. They need 
different <em>job.name</em> or <em>job.id</em> configuration properties, so 
that each job gets its own checkpoint stream. If the jobs maintain <a 
href="../container/state-management.html">persistent state</a>, each job needs 
its own changelog (two different jobs writing to the same changelog produces 
undefined results).</li>
+  <li>What happens to job output? If the job sends its results to an output 
stream, or writes to a database, then the easiest solution is for each job to 
have a separate output stream or database table. If they write to the same 
output, you need to take care to ensure that newer data isn’t overwritten 
with older data (due to race conditions between the two jobs).</li>
+  <li>Do you need to support A/B testing between the old and the new version 
of your job, e.g. to test whether the new version improves your metrics? 
Parallel rewind is ideal for this: each job writes to a separate output, and 
clients or consumers of the output can read from either the old or the new 
version’s output, depending on whether a user is in test group A or B.</li>
+  <li>Reclaiming resources: you might want to keep the old version of your job 
running for a while, even when the new version has finished reprocessing 
historical data (especially if the old version’s output is being used in an 
A/B test). However, eventually you’ll want to shut it down, and delete the 
checkpoint and changelog streams belonging to the old version.</li>
 </ul>
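<p>The first two points in the list above lend themselves to a pre-deployment check. The sketch below compares two job configurations, represented as plain dictionaries, and reports anything they share that they should not. The function name, the store name, the stream names, the <em>stores.&lt;store&gt;.changelog</em> key pattern, and the <em>task.output.stream</em> key are assumptions made for this illustration.</p>

```python
def find_conflicts(realtime, reprocessing):
    """List the settings on which two parallel jobs would collide."""

    def identity(config):
        # Assumption: a missing job.id is treated as "1" in this sketch.
        return (config.get("job.name"), config.get("job.id", "1"))

    def changelogs(config):
        # Assumption: changelog topics live under stores.<store>.changelog.
        return {
            value
            for key, value in config.items()
            if key.startswith("stores.") and key.endswith(".changelog")
        }

    conflicts = []
    if identity(realtime) == identity(reprocessing):
        conflicts.append("same job.name and job.id (shared checkpoint stream)")
    for topic in sorted(changelogs(realtime) & changelogs(reprocessing)):
        conflicts.append(f"shared changelog: {topic}")
    # "task.output.stream" is a hypothetical key used only in this example.
    output = realtime.get("task.output.stream")
    if output is not None and output == reprocessing.get("task.output.stream"):
        conflicts.append(f"shared output: {output}")
    return conflicts


realtime_job = {
    "job.name": "counter",
    "job.id": "1",
    "stores.counts.changelog": "kafka.counts-changelog-1",
    "task.output.stream": "kafka.counts-v1",
}
reprocessing_job = {
    "job.name": "counter",
    "job.id": "2",
    "stores.counts.changelog": "kafka.counts-changelog-1",
    "task.output.stream": "kafka.counts-v2",
}
problems = find_conflicts(realtime_job, reprocessing_job)
```

<p>Here the two jobs have distinct identities and outputs but still point at the same changelog, which is exactly the kind of mistake that is easy to miss when copying a configuration.</p>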
 
-<p>Samza gives you a lot of flexibility for reprocessing historical data, and 
you don&rsquo;t need to program against a separate batch processing API to take 
advantage of it. If you&rsquo;re mindful of these issues, you can build a data 
system that is very robust, but still gives you lots of freedom to change your 
processing logic in future.</p>
+<p>Samza gives you a lot of flexibility for reprocessing historical data, and 
you don’t need to program against a separate batch processing API to take 
advantage of it. If you’re mindful of these issues, you can build a data 
system that is very robust, but still gives you lots of freedom to change your 
processing logic in future.</p>
 
-<h2 id="web-ui-and-rest-api"><a href="web-ui-rest-api.html">Web UI and REST 
API &raquo;</a></h2>
+<h2 id="web-ui-and-rest-api-"><a href="web-ui-rest-api.html">Web UI and REST 
API »</a></h2>
 
            
         </div>
