samz...

criccomini Mon, 02 Sep 2013 09:36:05 -0700

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/task-runner.html 
Mon Sep  2 16:35:10 2013
@@ -65,6 +65,8 @@
         <div class="body">
           <h2>TaskRunner</h2>
 
+<!-- TODO: Is TaskRunner still appropriate terminology to use (appears to be a 
combo of SamzaContainer and TaskInstance in the code)? -->
+
 <p>The TaskRunner is Samza&#39;s stream processing container. It is 
responsible for managing the startup, execution, and shutdown of one or more 
StreamTask instances.</p>
 
 <p>When a TaskRunner starts up, it does the following:</p>
@@ -85,8 +87,8 @@
 <h3>Tasks and Partitions</h3>
 
 <p>When the TaskRunner starts, it creates an instance of the StreamTask that 
you&#39;ve written. If the StreamTask implements the InitableTask interface, 
the TaskRunner will also call the init() method.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">public interface InitableTask {
-  void init(Config config, TaskContextPartition context);
+<div class="highlight"><pre><code class="text language-text">public interface 
InitableTask {
+  void init(Config config, TaskContext context);
 }
 </code></pre></div>
 <p>It doesn&#39;t just do this once, though. It creates the StreamTask once 
for each partition in your Samza job. If your Samza job has ten partitions, 
there will be ten instantiations of your StreamTask: one for each partition. 
The StreamTask instance for partition one will receive all messages for 
partition one, the instance for partition two will receive all messages for 
partition two, and so on.</p>
@@ -97,7 +99,7 @@
 
 <p>If a Samza job has more than one input stream, then the number of 
partitions for the Samza job will be the maximum number of partitions across 
all input streams. For example, if a Samza job is reading from PageViewEvent, 
which has 12 partitions, and ServiceMetricEvent, which has 14 partitions, then 
the Samza job would have 14 partitions (0 through 13).</p>
 
-<p>When the TaskRunner&#39;s StreamConsumer threads are reading messages from 
each input stream partition, the messages that it receives are tagged with the 
partition number that it came from. Each message is fed to the StreamTask 
instance that corresponds to the message&#39;s partition. This design has two 
important properties. When a Samza job has more than one input stream, and 
those streams have an imbalanced number of partitions (e.g. one has 12 
partitions and the other has 14), then some of your StreamTask instances will 
not receive messages from all streams. In the PageViewEvent/ServiceMetricEvent 
example, the last two StreamTask instances would only receive messages from the 
ServiceMetricEvent topic (partitions 12 and 13). The lower 12 instances would 
receive messages from both streams. If your Samza job is reading more than one 
input stream, you probably want all input streams to have the same number of 
partitions, especially if you&#39;re trying to join streams together. The 
second important property is that Samza assumes that a stream&#39;s 
partition count will never change. No partition splitting is supported. If an 
input stream has N partitions, it is expected that it has had, and will always 
have N partitions. If you want to re-partition, you must read messages from the 
stream, and write them out to a new stream that has the number of partitions 
that you want. For example you could read messages from PageViewEvent, and 
write them to PageViewEventRepartition, which could have 14 partitions. If you 
did this, then you would achieve balance between PageViewEventRepartition and 
ServiceMetricEvent.</p>
+<p>When the TaskRunner&#39;s StreamConsumer threads are reading messages from 
each input stream partition, the messages that it receives are tagged with the 
partition number that it came from. Each message is fed to the StreamTask 
instance that corresponds to the message&#39;s partition. This design has two 
important properties. When a Samza job has more than one input stream, and 
those streams have an imbalanced number of partitions (e.g. one has 12 
partitions and the other has 14), then some of your StreamTask instances will 
not receive messages from all streams. In the PageViewEvent/ServiceMetricEvent 
example, the last two StreamTask instances would only receive messages from the 
ServiceMetricEvent topic (partitions 12 and 13). The lower 12 instances would 
receive messages from both streams. If your Samza job is reading more than one 
input stream, you probably want all input streams to have the same number of 
partitions, especially if you&#39;re trying to join streams together. The 
second important property is that Samza assumes that a stream&#39;s 
partition count will never change. No partition splitting is supported. If an 
input stream has N partitions, it is expected that it has always had, and will 
always have N partitions. If you want to re-partition, you must read messages 
from the stream, and write them out to a new stream that has the number of 
partitions that you want. For example you could read messages from 
PageViewEvent, and write them to PageViewEventRepartition, which could have 14 
partitions. If you did this, then you would achieve balance between 
PageViewEventRepartition and ServiceMetricEvent.</p>
 
 <p>This design is important because it guarantees that any state that your 
StreamTask keeps in memory will be isolated on a per-partition basis. For 
example, if you refer back to the page-view counting job we used as an example 
in the <a href="../introduction/architecture.html">Architecture</a> section, we 
might have a Map&lt;Integer, Integer&gt; map that keeps track of page view 
counts per-member ID. If we were to have just one StreamTask per Samza job, for 
instance, then the member ID counts from different partitions would be 
inter-mingled into the same map. This inter-mingling would prevent us from 
moving partitions between processes or machines, which is something that we 
want to do with YARN. You can imagine a case where you started with one 
TaskRunner in a single YARN container. Your Samza job might be unable to keep 
up with only one container, so you ask for a second YARN container to put some 
of the StreamTask partitions. In such a case, how would we split the counts 
such that one container gets only member ID counts for the partitions it is in 
charge of? 
This is effectively impossible if we&#39;ve inter-mingled the StreamTask&#39;s 
state together. This is why we isolate StreamTask instances on a per-partition 
basis: to make partition migration possible.</p>
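
The partition rule described in this file's text can be sketched in a few lines of Java. This is an illustration only: PartitionSketch, jobPartitionCount, streamsFor, and exampleInputs are hypothetical names, not part of Samza's API. The stream names and partition counts are the ones used in the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a Samza class.
final class PartitionSketch {
  // The job's partition count is the maximum count across all input streams.
  static int jobPartitionCount(Map<String, Integer> partitionsPerStream) {
    int max = 0;
    for (int count : partitionsPerStream.values()) {
      max = Math.max(max, count);
    }
    return max;
  }

  // The StreamTask instance for a partition only receives messages from
  // streams that actually have that partition number.
  static List<String> streamsFor(int partition, Map<String, Integer> partitionsPerStream) {
    List<String> streams = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : partitionsPerStream.entrySet()) {
      if (partition < entry.getValue()) {
        streams.add(entry.getKey());
      }
    }
    return streams;
  }

  // The two input streams from the example in the documentation text.
  static Map<String, Integer> exampleInputs() {
    Map<String, Integer> inputs = new LinkedHashMap<>();
    inputs.put("PageViewEvent", 12);
    inputs.put("ServiceMetricEvent", 14);
    return inputs;
  }
}
```

With the example inputs, jobPartitionCount gives 14, streamsFor(0, ...) lists both streams, and streamsFor(13, ...) lists only ServiceMetricEvent, which matches the imbalance the text warns about.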


Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/windowing.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/windowing.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/windowing.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/windowing.html Mon 
Sep  2 16:35:10 2013
@@ -65,15 +65,17 @@
         <div class="body">
           <h2>Windowing</h2>
 
-<p>Referring back to the, &quot;count PageViewEvent by member ID,&quot; 
example in the <a href="../introduction/architecture.html">Architecture</a> 
section, one thing that we left out was what we do with the counts. Let&#39;s 
say that the Samza job wants to update the member ID counts in a database once 
every minute. Here&#39;s how it would work. The Samza job that does the 
counting would keep a Map&lt;Integer, Integer&gt; in memory, which maps member 
IDs to page view counts. Every time a message arrives, the job would take the 
member ID in the PageViewEvent, and use it to increment the member ID&#39;s 
count in the in-memory map. Then, once a minute, the StreamTask would update 
the database (total<em>count += current</em>count) for every member ID in the 
map, and then reset the count map.</p>
+<p>Referring back to the &quot;count PageViewEvent by member ID&quot; example 
in the <a href="../introduction/architecture.html">Architecture</a> section, 
one thing that we left out was what we do with the counts. Let&#39;s say that 
the Samza job wants to update the member ID counts in a database once every 
minute. Here&#39;s how it would work. The Samza job that does the counting 
would keep a Map&lt;Integer, Integer&gt; in memory, which maps member IDs to 
page view counts. Every time a message arrives, the job would take the member 
ID in the PageViewEvent, and use it to increment the member ID&#39;s count in 
the in-memory map. Then, once a minute, the StreamTask would update the 
database (total_count += current_count) for every member ID in the map, 
and then reset the count map.</p>
 
 <p>Windowing is how we achieve this. If a StreamTask implements the 
WindowableTask interface, the TaskRunner will call the window() method on the 
task over a configured interval.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">public interface WindowableTask {
+<div class="highlight"><pre><code class="text language-text">public interface 
WindowableTask {
   void window(MessageCollector collector, TaskCoordinator coordinator);
 }
 </code></pre></div>
 <p>If you choose to implement the WindowableTask interface, you can use the 
Samza job&#39;s configuration to define how often the TaskRunner should call 
your window() method. In the PageViewEvent example (above), you would define it 
to flush every 60000 milliseconds (60 seconds).</p>
 
+<h2><a href="event-loop.html">Event Loop &raquo;</a></h2>
+
 
         </div>
         </div>
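
The count-and-flush pattern described in the windowing text can be sketched as follows. This is a minimal sketch under stated assumptions: PageViewCounterSketch is a hypothetical stand-in that does not implement Samza's real StreamTask or WindowableTask interfaces, its process and window methods take simplified arguments, and a plain Map plays the role of the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a task that would implement StreamTask and
// WindowableTask; it uses simplified method signatures for illustration.
final class PageViewCounterSketch {
  // In-memory counts for the current window, keyed by member ID.
  private final Map<Integer, Integer> counts = new HashMap<>();
  // Stands in for the external database holding the running totals.
  private final Map<Integer, Integer> database;

  PageViewCounterSketch(Map<Integer, Integer> database) {
    this.database = database;
  }

  // Called once per incoming PageViewEvent: increment the member's count.
  void process(int memberId) {
    counts.merge(memberId, 1, Integer::sum);
  }

  // Called at the configured interval: total_count += current_count for
  // every member ID in the map, then reset the map.
  void window() {
    for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
      database.merge(entry.getKey(), entry.getValue(), Integer::sum);
    }
    counts.clear();
  }

  // Count accumulated for a member since the last window() call.
  int pendingCount(int memberId) {
    return counts.getOrDefault(memberId, 0);
  }
}
```

In a real job the container, not user code, would invoke window() on the interval set in the job's configuration (60000 milliseconds in the example).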

Modified: 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html 
(original)
+++ 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html 
Mon Sep  2 16:35:10 2013
@@ -89,17 +89,17 @@
 
 <p><img src="/img/0.7.0/learn/documentation/introduction/samza-hadoop.png" 
alt="diagram-medium"></p>
 
-<p>Before going in-depth on each of these three layers, it should be noted 
that Samza supports is not limited to these systems. Both Samza&#39;s execution 
and streaming layer are pluggable, and allow developers to implement 
alternatives if they prefer.</p>
+<p>Before going in-depth on each of these three layers, it should be noted 
that Samza&#39;s support is not limited to these systems. Both Samza&#39;s 
execution and streaming layer are pluggable, and allow developers to implement 
alternatives if they prefer.</p>
 
 <h3>Kafka</h3>
 
 <p><a href="http://kafka.apache.org/">Kafka</a> is a distributed pub/sub and 
message queueing system that provides at-least once messaging guarantees, and 
highly available partitions (i.e. a stream&#39;s partitions will be available, 
even if a machine goes down).</p>
 
-<p>In Kafka, each stream is called a &quot;topic&quot;. Each topic is 
partitioned up, to make things scalable. When a &quot;producer&quot; sends a 
message to a topic, the producer provides a key, which is used to determine 
which partition the message should be sent to. Kafka &quot;brokers&quot;, each 
of which are in charge of some partitions, receive the messages that the 
producer sends, and stores them on their disk in a log file. Kafka 
&quot;consumers&quot; can then read from a topic by getting messages from all 
of a topic&#39;s partitions.</p>
+<p>In Kafka, each stream is called a &quot;topic&quot;. Each topic is 
partitioned, to make things scalable. When a &quot;producer&quot; sends a 
message to a topic, the producer provides a key, which is used to determine 
which partition the message should be sent to. Kafka &quot;brokers&quot;, each 
of which are in charge of some partitions, receive and store the messages that 
the producer sends. Kafka &quot;consumers&quot; can then read from a topic by 
getting messages from all of a topic&#39;s partitions.</p>
 
-<p>This has some interesting properties. First, all messages partitioned by 
the same key are guaranteed to be in the same Kafka topic partition. This 
means, if you wish to read all messages for a specific member ID, you only have 
to read the messages from the partition that the member ID is on, not the whole 
topic (assuming the topic is partitioned by member ID). Second, since a Kafka 
broker&#39;s file is a log, you can reference any point in the log file using 
an &quot;offset&quot;. This offset determines where a consumer is in a 
topic/partition pair. After every message a consumer reads from a 
topic/partition pair, the offset is incremented.</p>
+<p>This has some interesting properties. First, all messages partitioned by 
the same key are guaranteed to be in the same Kafka topic partition. This 
means, if you wish to read all messages for a specific member ID, you only have 
to read the messages from the partition that the member ID is on, not the whole 
topic (assuming the topic is partitioned by member ID). Second, since a Kafka 
broker&#39;s log is a file, you can reference any point in the log file using 
an &quot;offset&quot;. This offset determines where a consumer is in a 
topic/partition pair. After every message a consumer reads from a 
topic/partition pair, the offset is incremented.</p>
 
-<p>For more details on Kafka, see Kafka&#39;s <a 
href="http://kafka.apache.org/introduction.html">introduction</a> and <a 
href="http://kafka.apache.org/design.html">design</a> pages.</p>
+<p>For more details on Kafka, see Kafka&#39;s <a 
href="http://kafka.apache.org/documentation.html">documentation</a> pages.</p>
 
 <h3>YARN</h3>
 
@@ -111,7 +111,7 @@
 <li><strong>Application</strong>: I want to run command X on two machines with 
512M memory</li>
 <li><strong>YARN</strong>: Cool, where&#39;s your code?</li>
 <li><strong>Application</strong>: http://path.to.host/jobs/download/my.tgz</li>
-<li><strong>YARN</strong>: I&#39;m running your job on node-1.grid and 
node-1.grid</li>
+<li><strong>YARN</strong>: I&#39;m running your job on node-1.grid and 
node-2.grid</li>
 </ol>
 
 <p>Samza uses YARN to manage:</p>
@@ -129,7 +129,7 @@
 
 <h4>YARN Architecture</h4>
 
-<p>YARN has three important pieces: a ResourceManager, a NodeManager, and an 
ApplicationMaster. In a YARN grid, every computer runs a NodeManager, which is 
responsible for running processes on the local machine. A ResourceManager talks 
to all of the NodeManagers to tell it what to run. Applications, in turn, talk 
to the ResourceManager when they wish to run something on the cluster. The 
flow, when starting a new application, goes from user application to YARN RM, 
to YARN NM. The third piece, the ApplicationMaster, is actually 
application-specific code that runs in the YARN cluster. It&#39;s responsible 
for managing the application&#39;s workload, asking for containers (usually, 
UNIX processes), and handling notifications when one of its containers 
fails.</p>
+<p>YARN has three important pieces: a ResourceManager, a NodeManager, and an 
ApplicationMaster. In a YARN grid, every computer runs a NodeManager, which is 
responsible for running processes on the local machine. A ResourceManager talks 
to all of the NodeManagers to tell them what to run. Applications, in turn, 
talk to the ResourceManager when they wish to run something on the cluster. The 
flow, when starting a new application, goes from user application to YARN RM, 
to YARN NM. The third piece, the ApplicationMaster, is actually 
application-specific code that runs in the YARN cluster. It&#39;s responsible 
for managing the application&#39;s workload, asking for containers (usually 
UNIX processes), and handling notifications when one of its containers 
fails.</p>
 
 <h4>Samza and YARN</h4>
 
@@ -137,7 +137,7 @@
 
 <p><img 
src="/img/0.7.0/learn/documentation/introduction/samza-yarn-integration.png" 
alt="diagram-small"></p>
 
-<p>The Samza client talks to the YARN RM when it wants to start a new Samza 
job. The YARN RM talks to a YARN NM to allocate space on the cluster for 
Samza&#39;s ApplicationMaster. Once the NM allocates space, it starts the Samza 
AM. After the Samza AM starts, it asks the YARN RM for one, or more, YARN 
containers to run Samza <a 
href="../container/task-runner.html">TaskRunners</a>. Again, the RM works with 
NMs to allocate space for the containers. Once the space has been allocated, 
the NMs start the Samza containers.</p>
+<p>The Samza client talks to the YARN RM when it wants to start a new Samza 
job. The YARN RM talks to a YARN NM to allocate space on the cluster for 
Samza&#39;s ApplicationMaster. Once the NM allocates space, it starts the Samza 
AM. After the Samza AM starts, it asks the YARN RM for one or more YARN 
containers to run Samza <a 
href="../container/task-runner.html">TaskRunners</a>. Again, the RM works with 
NMs to allocate space for the containers. Once the space has been allocated, 
the NMs start the Samza containers.</p>
 
 <h3>Samza</h3>
 
@@ -145,7 +145,7 @@
 
 <p><img 
src="/img/0.7.0/learn/documentation/introduction/samza-yarn-kafka-integration.png"
 alt="diagram-small"></p>
 
-<p>The Samza client uses YARN to run a Samza job. The Samza <a 
href="../container/task-runner.html">TaskRunners</a> run in one, or more, YARN 
containers, and execute user-written Samza <a 
href="../api/overview.html">StreamTasks</a>. The input and output for the Samza 
StreamTasks come from Kafka brokers that are (usually) co-located on the same 
machines as the YARN NMs.</p>
+<p>The Samza client uses YARN to run a Samza job. The Samza <a 
href="../container/task-runner.html">TaskRunners</a> run in one or more YARN 
containers, and execute user-written Samza <a 
href="../api/overview.html">StreamTasks</a>. The input and output for the Samza 
StreamTasks come from Kafka brokers that are (usually) co-located on the same 
machines as the YARN NMs.</p>
 
 <h3>Example</h3>
 
@@ -155,7 +155,7 @@
 
 <p>The input topic is partitioned using Kafka. Each Samza process reads 
messages from one or more of the input topic&#39;s partitions, and emits them 
back out to a different Kafka topic. Each output message is keyed by the 
message&#39;s member ID attribute, and this key is mapped to one of the 
topic&#39;s partitions (usually by hashing the key, and modding by the number 
of partitions in the topic). The Kafka brokers receive these messages, and 
buffer them on disk until the second job (the counting job on the bottom of the 
diagram) reads the messages, and increments its counters.</p>
 
-<p>There are some neat things to consider about this example. First, we&#39;re 
leveraging the fact that Kafka topics are inherently partitioned. This lets us 
run one or more Samza processes, and assign them each some partitions to read 
from. Second, since we&#39;re guaranteed that, for a given key, all messages 
will be on the same partition, we can actually split up the aggregation 
(counting). For example, if the first job&#39;s output had four partitions, we 
could assign two partitions to the first count process, and the other two 
partitions to the second count process. We&#39;d be guaranteed that for any 
give member ID, all of their messages will be consumed by either the first 
process or the second, but not both. This means we&#39;ll get accurate counts, 
even when partitioning. Third, the fact that we&#39;re using Kafka, which 
buffers messages on its brokers, also means that we don&#39;t have to worry as 
much about failures. If a process or machine fails, we can use YARN to start 
the process on another machine. When the process starts up again, it can get 
its last offset, and resume reading messages where it left off.</p>
+<p>There are some neat things to consider about this example. First, we&#39;re 
leveraging the fact that Kafka topics are inherently partitioned. This lets us 
run one or more Samza processes, and assign them each some partitions to read 
from. Second, since we&#39;re guaranteed that for a given key all messages will 
be on the same partition, we can actually split up the aggregation (counting). 
For example, if the first job&#39;s output had four partitions, we could assign 
two partitions to the first count process, and the other two partitions to the 
second count process. We&#39;d be guaranteed that for any given member ID, all 
of their messages will be consumed by either the first process or the second, 
but not both. This means we&#39;ll get accurate counts, even when partitioning. 
Third, the fact that we&#39;re using Kafka, which buffers messages on its 
brokers, also means that we don&#39;t have to worry as much about failures. If 
a process or machine fails, we can use YARN to start 
the process on another machine. When the process starts up again, it can get 
its last offset and resume reading messages where it left off.</p>
 
 <h2><a href="../comparisons/introduction.html">Comparison Introduction 
&raquo;</a></h2>
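
The key-to-partition mapping mentioned in the example text ("hashing the key, and modding by the number of partitions") can be sketched like this. KeyPartitionSketch and partitionFor are hypothetical names, and this is not necessarily the partitioner Kafka itself uses; it only illustrates why all messages with the same key land on the same partition.

```java
// Hypothetical illustration of hash-and-mod partitioning; not Kafka's code.
final class KeyPartitionSketch {
  // Maps a key to a partition in the range [0, numPartitions).
  // Math.floorMod keeps the result non-negative even for negative hash codes.
  static int partitionFor(Object key, int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    return Math.floorMod(key.hashCode(), numPartitions);
  }
}
```

Because the result depends only on the key and the partition count, every message for a given member ID maps to the same partition as long as the partition count does not change, which is what makes the split aggregation in the example safe.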
 

Modified: 
incubator/samza/site/learn/documentation/0.7.0/introduction/background.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/background.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/introduction/background.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/introduction/background.html 
Mon Sep  2 16:35:10 2013
@@ -96,7 +96,7 @@
 <p>Samza is a stream processing framework with the following features:</p>
 
 <ul>
-<li><strong>Simpe API:</strong> Samza provides a very simple call-back based 
&quot;process message&quot; API.</li>
+<li><strong>Simple API:</strong> Samza provides a very simple call-back based 
&quot;process message&quot; API.</li>
 <li><strong>Managed state:</strong> Samza manages snapshotting and restoration 
of a stream processor&#39;s state. Samza will restore a stream processor&#39;s 
state to a snapshot consistent with the processor&#39;s last read messages when 
the processor is restarted. Samza is built to handle large amounts of state 
(even many gigabytes per partition).</li>
 <li><strong>Fault tolerance:</strong> Samza will work with YARN to 
transparently migrate your tasks whenever a machine in the cluster fails.</li>
 <li><strong>Durability:</strong> Samza uses Kafka to guarantee that no 
messages will ever be lost.</li>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/introduction/concepts.html 
Mon Sep  2 16:35:10 2013
@@ -97,9 +97,9 @@
 
 <p>The task processes messages from each of its input partitions <em>in order 
by offset</em>. There is no defined ordering between partitions.</p>
 
-<p>The position of the task in its input partitions can be represented by set 
of offsets, one for each partition.</p>
+<p>The position of the task in its input partitions can be represented by a 
set of offsets, one for each partition.</p>
 
-<p>The number of tasks a job has is fixed and does not change (though the 
computational resources assigned to the job may go up and down). The number of 
tasks a job has also determines the maximum parallelism of the job as each task 
processes messages sequentially. There cannot be more tasks than input 
partitions (or there would be some task with no input).</p>
+<p>The number of tasks a job has is fixed and does not change (though the 
computational resources assigned to the job may go up and down). The number of 
tasks a job has also determines the maximum parallelism of the job as each task 
processes messages sequentially. There cannot be more tasks than input 
partitions (or there would be some tasks with no input).</p>
 
 <p>The partitions assigned to a task will never change: if a task is on a 
machine that fails the task will be restarted elsewhere still consuming the 
same stream partitions.</p>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html Mon 
Sep  2 16:35:10 2013
@@ -66,7 +66,7 @@
           <h2>Configuration</h2>
 
 <p>All Samza jobs have a configuration file that defines the job. A very basic 
configuration file looks like this:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text"># Job
+<div class="highlight"><pre><code class="text language-text"># Job
 job.factory.class=samza.job.local.LocalJobFactory
 job.name=hello-world
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html Mon Sep 
 2 16:35:10 2013
@@ -66,19 +66,19 @@
           <h2>JobRunner</h2>
 
 <p>Samza jobs are started using a script called run-job.sh.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">samza-example/target/bin/run-job.sh \
+<div class="highlight"><pre><code class="text 
language-text">samza-example/target/bin/run-job.sh \
   --config-factory=samza.config.factories.PropertiesConfigFactory \
   --config-path=file://$PWD/config/hello-world.properties
 </code></pre></div>
 <p>You provide two parameters to the run-job.sh script. One is the config 
location, and the other is a factory class that is used to read your 
configuration file. The run-job.sh script is actually executing a Samza class 
called JobRunner. The JobRunner uses your ConfigFactory to get a Config object 
from the config path.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">public interface ConfigFactory {
+<div class="highlight"><pre><code class="text language-text">public interface 
ConfigFactory {
   Config getConfig(URI configUri);
 }
 </code></pre></div>
 <p>The Config object is just a wrapper around Map<String, String>, with some 
nice helper methods. Out of the box, Samza ships with the 
PropertiesConfigFactory, but developers can implement any kind of ConfigFactory 
they wish.</p>
 
 <p>Once the JobRunner gets your configuration, it gives your configuration to 
the StreamJobFactory class defined by the &quot;job.factory&quot; property. 
Samza ships with two job factory implementations: LocalJobFactory and 
YarnJobFactory. The StreamJobFactory&#39;s responsibility is to give the 
JobRunner a job that it can run.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">public interface StreamJob {
+<div class="highlight"><pre><code class="text language-text">public interface 
StreamJob {
   StreamJob submit();
 
   StreamJob kill();
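
The job-runner text describes the Config object as a wrapper around a map of strings that a ConfigFactory builds from the config path. A rough sketch of that idea, assuming a properties-formatted file, is below. PropertiesConfigSketch and load are hypothetical names, the sketch reads from a Reader rather than a URI, and it returns a plain Map instead of Samza's Config class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of what a properties-based config factory might do.
final class PropertiesConfigSketch {
  // Reads key=value pairs in java.util.Properties format into a string map.
  static Map<String, String> load(Reader reader) {
    Properties props = new Properties();
    try {
      props.load(reader);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Map<String, String> config = new HashMap<>();
    for (String name : props.stringPropertyNames()) {
      config.put(name, props.getProperty(name));
    }
    return config;
  }
}
```

Given the two properties from the basic configuration example (job.factory.class and job.name), the resulting map would let a runner look up "job.name" and get "hello-world".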

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html Mon Sep  2 
16:35:10 2013
@@ -70,7 +70,7 @@
 <h3>Log4j</h3>
 
 <p>The <a href="/startup/hello-samza/0.7.0">hello-samza</a> project shows how 
to use <a href="http://logging.apache.org/log4j/1.2/">log4j</a> with Samza. To 
turn on log4j logging, you just need to make sure slf4j-log4j12 is in your 
Samza TaskRunner&#39;s classpath. In Maven, this can be done by adding the 
following dependency to your Samza package project.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">&lt;dependency&gt;
+<div class="highlight"><pre><code class="text language-text">&lt;dependency&gt;
   &lt;groupId&gt;org.slf4j&lt;/groupId&gt;
   &lt;artifactId&gt;slf4j-log4j12&lt;/artifactId&gt;
   &lt;scope&gt;runtime&lt;/scope&gt;
@@ -82,7 +82,7 @@
 <h4>log4j.xml</h4>
 
 <p>Samza&#39;s <a href="packaging.html">run-class.sh</a> script will 
automatically set the following setting if log4j.xml exists in your <a 
href="packaging.html">Samza package&#39;s</a> lib directory.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">-Dlog4j.configuration=file:$base_dir/lib/log4j.xml
+<div class="highlight"><pre><code class="text 
language-text">-Dlog4j.configuration=file:$base_dir/lib/log4j.xml
 </code></pre></div>
 <!-- TODO add notes showing how to use task.opts for gc logging
 #### task.opts
@@ -95,7 +95,7 @@
 <h3>Garbage Collection Logging</h3>
 
 <p>Samza will automatically set the following garbage collection logging 
setting, and will output it to 
<em>$SAMZA</em>_<em>LOG</em>_<em>DIR</em>/gc.log.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">-XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log
+<div class="highlight"><pre><code class="text 
language-text">-XX:+PrintGCDateStamps -Xloggc:$SAMZA_LOG_DIR/gc.log
 </code></pre></div>
 <h4>Rotation</h4>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/packaging.html Mon Sep  
2 16:35:10 2013
@@ -66,13 +66,13 @@
           <h2>Packaging</h2>
 
 <p>The <a href="job-runner.html">JobRunner</a> page talks about run-job.sh, 
and how it&#39;s used to start a job either locally (LocalJobFactory) or with 
YARN (YarnJobFactory). In the diagram that shows the execution flow, it also 
shows a run-task.sh script. This script, along with a run-am.sh script, are 
what Samza actually calls to execute its code.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">bin/run-am.sh
+<div class="highlight"><pre><code class="text language-text">bin/run-am.sh
 bin/run-task.sh
 </code></pre></div>
 <p>The run-task.sh script is responsible for starting the TaskRunner. The 
run-am.sh script is responsible for starting Samza&#39;s application master for 
YARN. Thus, the run-am.sh script is only used by the YarnJob, but both YarnJob 
and ProcessJob use run-task.sh.</p>
 
 <p>Typically, these two scripts are bundled into a tar.gz file that has a 
structure like this:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">bin/run-am.sh
+<div class="highlight"><pre><code class="text language-text">bin/run-am.sh
 bin/run-class.sh
 bin/run-job.sh
 bin/run-task.sh

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html Mon Sep  
2 16:35:10 2013
@@ -68,7 +68,7 @@
 <p>When you define job.factory.class=samza.job.yarn.YarnJobFactory in your 
job&#39;s configuration, Samza will use YARN to execute your job. The 
YarnJobFactory will use the YARN_HOME environment variable on the machine that 
run-job.sh is executed on to get the appropriate YARN configuration, which will 
define where the YARN resource manager is. The YarnJob will work with the 
resource manager to get your job started on the YARN cluster.</p>
 
 <p>If you want to use YARN to run your Samza job, you&#39;ll also need to 
define the location of your Samza job&#39;s package. For example, you might 
say:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz
+<div class="highlight"><pre><code class="text 
language-text">yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz
 </code></pre></div>
 <p>This .tgz file follows the conventions outlined on the <a 
href="packaging.html">Packaging</a> page (it has bin/run-am.sh and 
bin/run-task.sh). YARN NodeManagers will take responsibility for downloading 
this .tgz file to the appropriate machines and untarring it. From there, 
YARN will execute run-am.sh for the Samza Application Master and run-task.sh 
for the TaskRunner.</p>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html Mon 
Sep  2 16:35:10 2013
@@ -74,7 +74,7 @@
 <h3>Auto-Create Topics</h3>
 
 <p>Kafka brokers should be configured to automatically create topics. Without 
this, it&#39;s going to be very cumbersome to run Samza jobs, since jobs will 
write to arbitrary (and sometimes new) topics.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">auto.create.topics.enable=true
+<div class="highlight"><pre><code class="text 
language-text">auto.create.topics.enable=true
 </code></pre></div>
 
         </div>

Modified: incubator/samza/site/sitemap.xml
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/sitemap.xml?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/sitemap.xml (original)
+++ incubator/samza/site/sitemap.xml Mon Sep  2 16:35:10 2013
@@ -4,7 +4,7 @@
 
   <url>
     <loc>http://samza.incubator.apache.org/</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     <changefreq>daily</changefreq>
     <priority>1.0</priority>
   </url>
@@ -14,273 +14,273 @@
   
   <url>
     <loc>http://samza.incubator.apache.org/community/committers.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/community/irc.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/community/mailing-lists.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/code.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/coding-guide.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/disclaimer.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/projects.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/rules.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/contribute/seps.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/api/overview.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/introduction.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/mupd8.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/checkpointing.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/event-loop.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/jmx.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/metrics.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/task-runner.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/windowing.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/architecture.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/background.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configuration.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/job-runner.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/logging.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/packaging.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/yarn-jobs.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/kafka.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/security.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/application-master.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/isolation.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     <loc>http://samza.incubator.apache.org/startup/download/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>
   
   <url>
     
<loc>http://samza.incubator.apache.org/startup/hello-samza/0.7.0/index.html</loc>
-    <lastmod>2013-08-23</lastmod>
+    <lastmod>2013-09-02</lastmod>
     
     
   </url>

Modified: incubator/samza/site/startup/download/index.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/startup/download/index.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/startup/download/index.html (original)
+++ incubator/samza/site/startup/download/index.html Mon Sep  2 16:35:10 2013
@@ -129,7 +129,7 @@ Snapshot builds are available in the Apa
 <h3>Checking out and Building</h3>
 
 <p>If you&#39;re interested in working on Samza, or building the JARs from 
scratch, then you&#39;ll need to check out and build the code. Samza does not 
have a binary release at this time. To check out and build Samza, run these 
commands.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">git clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git
 cd incubator-samza
 ./gradlew clean build
 </code></pre></div>

Modified: incubator/samza/site/startup/hello-samza/0.7.0/index.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/startup/hello-samza/0.7.0/index.html?rev=1519471&r1=1519470&r2=1519471&view=diff
==============================================================================
--- incubator/samza/site/startup/hello-samza/0.7.0/index.html (original)
+++ incubator/samza/site/startup/hello-samza/0.7.0/index.html Mon Sep  2 
16:35:10 2013
@@ -72,19 +72,19 @@
 <h3>Get the Code</h3>
 
 <p>You&#39;ll need to check out and publish Samza, since it&#39;s not 
available in a Maven repository right now.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">git clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git
 cd incubator-samza
 ./gradlew -PscalaVersion=2.8.1 clean publishToMavenLocal
 </code></pre></div>
 <p>Next, check out the hello-samza project.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">git clone git://github.com/linkedin/hello-samza.git
+<div class="highlight"><pre><code class="text language-text">git clone 
git://github.com/linkedin/hello-samza.git
 </code></pre></div>
 <p>This project contains everything you&#39;ll need to run your first Samza 
jobs.</p>
 
 <h3>Start a Grid</h3>
 
 <p>A Samza grid usually comprises three different systems: <a 
href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a>,
 <a href="http://kafka.apache.org/">Kafka</a>, and <a 
href="http://zookeeper.apache.org/">ZooKeeper</a>. The hello-samza project 
comes with a script called &quot;grid&quot; to help you set up these systems. 
Start by running:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">bin/grid
+<div class="highlight"><pre><code class="text language-text">bin/grid
 </code></pre></div>
 <p>This command will download, install, and start ZooKeeper, Kafka, and YARN. 
All package files will be put in a sub-directory called &quot;deploy&quot; 
inside hello-samza&#39;s root folder.</p>
 
@@ -93,34 +93,34 @@ cd incubator-samza
 <h3>Build a Samza Job Package</h3>
 
 <p>Before you can run a Samza job, you need to build a package for it. This 
package is what YARN uses to deploy your jobs on the grid.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">mvn clean package
+<div class="highlight"><pre><code class="text language-text">mvn clean package
 mkdir -p deploy/samza
 tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C 
deploy/samza
 </code></pre></div>
 <h3>Run a Samza Job</h3>
 
 <p>After you&#39;ve built your Samza package, you can start a job on the grid 
using the run-job.sh script.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">deploy/samza/bin/run-job.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
+<div class="highlight"><pre><code class="text 
language-text">deploy/samza/bin/run-job.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
 </code></pre></div>
 <p>The job will consume a feed of real-time edits from Wikipedia, and produce 
them to a Kafka topic called &quot;wikipedia-raw&quot;. Give the job a minute 
to start up, and then tail the Kafka topic:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-raw
+<div class="highlight"><pre><code class="text 
language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-raw
 </code></pre></div>
 <p>Pretty neat, right? Now, check out the YARN UI again (<a 
href="http://localhost:8088">http://localhost:8088</a>). This time around, 
you&#39;ll see your Samza job is running!</p>
 
 <h3>Generate Wikipedia Statistics</h3>
 
 <p>Let&#39;s calculate some statistics based on the messages in the 
wikipedia-raw topic. Start two more jobs:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">deploy/samza/bin/run-job.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
+<div class="highlight"><pre><code class="text 
language-text">deploy/samza/bin/run-job.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-parser.properties
 deploy/samza/bin/run-job.sh 
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-stats.properties
 </code></pre></div>
 <p>The first job (wikipedia-parser) parses the messages in wikipedia-raw, and 
extracts information about the size of the edit, who made the change, etc. You 
can take a look at its output with:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-edits
+<div class="highlight"><pre><code class="text 
language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-edits
 </code></pre></div>
 <p>The last job (wikipedia-stats) reads messages from the wikipedia-edits 
topic, and calculates counts, every ten seconds, for all edits that were made 
during that window. It outputs these counts to the wikipedia-stats topic.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-stats
+<div class="highlight"><pre><code class="text 
language-text">deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper 
localhost:2181 --topic wikipedia-stats
 </code></pre></div>
 <p>The messages in the stats topic look like this:</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">{&quot;is-talk&quot;:2,&quot;bytes-added&quot;:5276,&quot;edits&quot;:13,&quot;unique-titles&quot;:13}
+<div class="highlight"><pre><code class="text 
language-text">{&quot;is-talk&quot;:2,&quot;bytes-added&quot;:5276,&quot;edits&quot;:13,&quot;unique-titles&quot;:13}
 
{&quot;is-bot-edit&quot;:1,&quot;is-talk&quot;:3,&quot;bytes-added&quot;:4211,&quot;edits&quot;:30,&quot;unique-titles&quot;:30,&quot;is-unpatrolled&quot;:1,&quot;is-new&quot;:2,&quot;is-minor&quot;:7}
 
{&quot;bytes-added&quot;:3180,&quot;edits&quot;:19,&quot;unique-titles&quot;:19,&quot;is-unpatrolled&quot;:1,&quot;is-new&quot;:1,&quot;is-minor&quot;:3}
 
{&quot;bytes-added&quot;:2218,&quot;edits&quot;:18,&quot;unique-titles&quot;:18,&quot;is-unpatrolled&quot;:2,&quot;is-new&quot;:2,&quot;is-minor&quot;:3}
@@ -130,7 +130,7 @@ deploy/samza/bin/run-job.sh --config-fac
 <h3>Shutdown</h3>
 
 <p>After you&#39;re done, you can clean everything up using the same grid 
script.</p>
-<div class="highlight"><pre><code class="text language-text" 
data-lang="text">bin/grid stop yarn
+<div class="highlight"><pre><code class="text language-text">bin/grid stop yarn
 bin/grid stop kafka
 bin/grid stop zookeeper
 </code></pre></div>
