Added: 
incubator/samza/site/learn/documentation/0.7.0/comparisons/spark-streaming.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/comparisons/spark-streaming.html?rev=1612998&view=auto
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/comparisons/spark-streaming.html 
(added)
+++ 
incubator/samza/site/learn/documentation/0.7.0/comparisons/spark-streaming.html 
Thu Jul 24 05:05:00 2014
@@ -0,0 +1,240 @@
+<!DOCTYPE html>
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <title>Samza - Spark Streaming</title>
+    <link href='/css/ropa-sans.css' rel='stylesheet' type='text/css'/>
+    <link href="/css/bootstrap.min.css" rel="stylesheet"/>
+    <link href="/css/font-awesome.min.css" rel="stylesheet"/>
+    <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
+    <link rel="icon" type="image/png" href="/img/samza-icon.png">
+  </head>
+  <body>
+    <div class="wrapper">
+      <div class="wrapper-content">
+
+        <div class="masthead">
+          <div class="container">
+            <div class="masthead-logo">
+              <a href="/" class="logo">samza</a>
+            </div>
+            <div class="masthead-icons">
+              <div class="pull-right">
+                <a href="/startup/download"><i class="fa fa-arrow-circle-o-down masthead-icon"></i></a>
+                <a href="https://git-wip-us.apache.org/repos/asf?p=incubator-samza.git;a=tree" target="_blank"><i class="fa fa-code masthead-icon" style="font-weight: bold;"></i></a>
+                <a href="https://twitter.com/samzastream" target="_blank"><i class="fa fa-twitter masthead-icon"></i></a>
+              </div>
+            </div>
+          </div><!-- /.container -->
+        </div>
+
+        <div class="container">
+          <div class="menu">
+            <h1><i class="fa fa-rocket"></i> Getting Started</h1>
+            <ul>
+              <li><a href="/startup/hello-samza/0.7.0">Hello Samza</a></li>
+              <li><a href="/startup/download">Download</a></li>
+            </ul>
+
+            <h1><i class="fa fa-book"></i> Learn</h1>
+            <ul>
+              <li><a href="/learn/documentation/0.7.0">Documentation</a></li>
+              <li><a href="/learn/tutorials/0.7.0">Tutorials</a></li>
+              <li><a href="http://wiki.apache.org/samza/FAQ">FAQ</a></li>
+              <li><a href="http://wiki.apache.org/samza">Wiki</a></li>
+              <li><a href="http://wiki.apache.org/samza/PapersAndTalks">Papers &amp; Talks</a></li>
+              <li><a href="http://blogs.apache.org/samza">Blog</a></li>
+            </ul>
+
+            <h1><i class="fa fa-comments"></i> Community</h1>
+            <ul>
+              <li><a href="/community/mailing-lists.html">Mailing Lists</a></li>
+              <li><a href="/community/irc.html">IRC</a></li>
+              <li><a href="https://issues.apache.org/jira/browse/SAMZA">Bugs</a></li>
+              <li><a href="http://wiki.apache.org/samza/PoweredBy">Powered by</a></li>
+              <li><a href="http://wiki.apache.org/samza/Ecosystem">Ecosystem</a></li>
+              <li><a href="/community/committers.html">Committers</a></li>
+
+            <h1><i class="fa fa-code"></i> Contribute</h1>
+            <ul>
+              <li><a href="/contribute/rules.html">Rules</a></li>
+              <li><a href="/contribute/coding-guide.html">Coding Guide</a></li>
+              <li><a href="/contribute/projects.html">Projects</a></li>
+              <li><a href="/contribute/seps.html">SEPs</a></li>
+              <li><a href="/contribute/code.html">Code</a></li>
+              <li><a href="https://reviews.apache.org/groups/samza">Review Board</a></li>
+              <li><a href="https://builds.apache.org/";>Unit Tests</a></li>
+              <li><a href="/contribute/disclaimer.html">Disclaimer</a></li>
+            </ul>
+          </div>
+
+          <div class="content">
+
+<h2>Spark Streaming</h2>
+
+
+<p><em>People generally want to know how similar systems compare. We&rsquo;ve 
done our best to fairly contrast the feature sets of Samza with other systems. 
But we aren&rsquo;t experts in these frameworks, and we are, of course, totally 
biased. If we have goofed anything, please let us know and we will correct 
it.</em></p>
+
+<p><a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html">Spark Streaming</a> is a stream processing system that uses the core <a href="http://spark.apache.org/">Apache Spark</a> API. Both Samza and Spark Streaming provide data consistency, fault tolerance, a programming API, etc. However, Spark&rsquo;s approach to streaming is different from Samza&rsquo;s: Samza processes messages as they are received, while Spark Streaming treats streaming as a series of deterministic batch operations. Spark Streaming groups the stream into batches of a fixed duration (such as 1 second). Each batch is represented as a Resilient Distributed Dataset (<a href="http://www.cs.berkeley.edu/%7Ematei/papers/2012/nsdi_spark.pdf">RDD</a>). A never-ending sequence of these RDDs is called a Discretized Stream (<a href="http://www.cs.berkeley.edu/%7Ematei/papers/2012/hotcloud_spark_streaming.pdf">DStream</a>).</p>
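+<p>To make the discretization model concrete, here is a small sketch in plain Java (a toy illustration only &ndash; the class and method names are invented, and this is not Spark&rsquo;s API): timestamped messages are chopped into fixed-duration batches, each batch playing the role of one RDD in a DStream.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of Spark Streaming's discretization (not Spark's actual API):
// timestamped messages are grouped into fixed-duration batches; each
// batch corresponds to one RDD, and the sequence of batches to a DStream.
public class Discretize {

    // Batch index for a message timestamp (ms), given a batch duration (ms).
    static long batchIndex(long timestampMs, long batchDurationMs) {
        return timestampMs / batchDurationMs;
    }

    // Groups (timestamp, payload) pairs into ordered batches.
    static Map<Long, List<String>> discretize(List<Map.Entry<Long, String>> messages,
                                              long batchDurationMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Map.Entry<Long, String> m : messages) {
            batches.computeIfAbsent(batchIndex(m.getKey(), batchDurationMs),
                                    k -> new ArrayList<>()).add(m.getValue());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> msgs = List.of(
                Map.entry(100L, "a"), Map.entry(950L, "b"), Map.entry(1200L, "c"));
        // With a 1-second batch duration, "a" and "b" fall into batch 0, "c" into batch 1.
        System.out.println(discretize(msgs, 1000L));
    }
}
```

+<p>Each key in the resulting map corresponds to one batch interval; in real Spark Streaming, of course, batches are produced continuously rather than from a finished list.</p>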
+
+<h3 id="overview-of-spark-streaming">Overview of Spark Streaming</h3>
+
+<p>Before going into the comparison, here is a brief overview of a Spark Streaming application. If you are already familiar with Spark Streaming, you may skip this part. There are two main parts of a Spark Streaming application: data receiving and data processing.</p>
+
+<ul>
+<li>Data receiving is accomplished by a <a href="https://spark.apache.org/docs/latest/streaming-custom-receivers.html">receiver</a>, which receives data and stores it in Spark (though not in an RDD at this point).</li>
+<li>Data processing transfers the data stored in Spark into the DStream. You can then apply the two kinds of <a href="https://spark.apache.org/docs/latest/streaming-programming-guide.html#operations">operations</a> &ndash; transformations and output operations &ndash; to the DStream. The operations available on a DStream differ slightly from those on a general Spark RDD because of the streaming environment.</li>
+</ul>
+
+<p>Here is an overview of Spark Streaming&rsquo;s <a href="https://spark.apache.org/docs/latest/cluster-overview.html">deployment model</a>. Spark has a SparkContext object (in Spark Streaming, it is called a <a href="https://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.streaming.StreamingContext">StreamingContext</a>) in the driver program. The SparkContext talks with the cluster manager (e.g. YARN, Mesos), which then allocates resources (that is, executors) for the Spark application, and the executors run tasks sent by the SparkContext (<a href="http://spark.apache.org/docs/latest/cluster-overview.html#compenents">read more</a>). In YARN&rsquo;s context, one executor is equivalent to one container, and tasks are what run inside the containers. The driver program runs either on the client machine that submits the job (<a href="https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn">client mode</a>) or in the application master (<a href="https://spark.apache.org/docs/latest/running-on-yarn.html#launching-spark-on-yarn">cluster mode</a>). Both data receiving and data processing are tasks for executors: one receiver (which receives one input stream) runs as a long-running task, while processing consists of many shorter tasks. All tasks are sent to the available executors.</p>
+
+<h3 id="ordering-and-guarantees">Ordering and Guarantees</h3>
+
+<p>Spark Streaming guarantees ordered processing of batches in a DStream. Since messages are processed in batches by side-effect-free operators, the exact ordering of messages within a batch is not important in Spark Streaming. Spark Streaming does not guarantee at-least-once or at-most-once messaging semantics, because in some situations it may lose data when the driver program fails (see <a href="#fault-tolerance">fault-tolerance</a>). In addition, because Spark Streaming requires transformation operations to be deterministic, it is unsuitable for nondeterministic processing, e.g. a randomized machine learning algorithm.</p>
+
+<p>Samza guarantees that messages are processed in the order they appear within a partition of the stream. Samza also allows you to define a deterministic ordering of messages between partitions using a <a href="../container/streams.html">MessageChooser</a>. It provides an at-least-once message delivery guarantee, and it does not require operations to be deterministic.</p>
+
+<h3 id="state-management">State Management</h3>
+
+<p>Spark Streaming provides a state DStream, which keeps the state for each key, and a transformation operation called <a href="https://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations">updateStateByKey</a> to mutate state. Every time updateStateByKey is applied, you get a new state DStream in which all of the state has been updated by applying the function passed to updateStateByKey. This transformation can serve as a basic key-value store, though it has a few drawbacks:</p>
+
+<ul>
+<li>You can only apply DStream operations to your state, because essentially it is a DStream itself.</li>
+<li>It does not provide key-value access to the data: to read the value of a particular key, you need to iterate over the whole DStream.</li>
+<li>It is inefficient when the state is large, because every time a new batch is processed, Spark Streaming consumes the entire state DStream to update the relevant keys and values.</li>
+</ul>
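+<p>The third drawback follows from updateStateByKey&rsquo;s semantics, which the following plain-Java sketch illustrates (a toy model of the semantics, not Spark code): the update function runs across the whole key space on every batch, so the cost grows with the total state size rather than with the batch size.</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Toy model of updateStateByKey semantics (not Spark's API): on every
// batch, the update function is applied to EVERY key in the state,
// whether or not that key received new values in this batch.
public class StateByKey {

    static Map<String, Integer> updateStateByKey(
            Map<String, Integer> state,                   // previous state, per key
            Map<String, List<Integer>> batch,             // new values per key in this batch
            BiFunction<List<Integer>, Integer, Integer> update) {
        Set<String> keys = new HashSet<>(state.keySet());
        keys.addAll(batch.keySet());
        Map<String, Integer> newState = new HashMap<>();
        for (String key : keys) {                         // full scan over all keys
            newState.put(key, update.apply(batch.getOrDefault(key, List.of()),
                                           state.get(key)));
        }
        return newState;
    }

    public static void main(String[] args) {
        // Running count per key: previous count plus the sum of new values.
        BiFunction<List<Integer>, Integer, Integer> count =
                (values, prev) -> (prev == null ? 0 : prev)
                        + values.stream().mapToInt(Integer::intValue).sum();
        Map<String, Integer> state = Map.of("a", 5, "b", 2);
        Map<String, Integer> next = updateStateByKey(state, Map.of("a", List.of(1, 1)), count);
        // "b" was visited even though this batch contained no values for it.
        System.out.println(next);
    }
}
```

+<p>This is only a semantic sketch &ndash; the real operation works over partitioned RDDs &ndash; but the pass over the full state on each batch is the behavior described above.</p>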
+
+<p>Spark Streaming periodically writes the intermediate data of stateful operations (updateStateByKey and window-based operations) to HDFS. In the case of updateStateByKey, the entire state RDD is written to HDFS after every checkpointing interval. As mentioned in <em><a href="../container/state-management.html#in-memory-state-with-checkpointing">in-memory state with checkpointing</a></em>, writing the entire state to durable storage is very expensive when the state becomes large.</p>
+
+<p>Samza uses an embedded key-value store for <a href="../container/state-management.html#local-state-in-samza">state management</a>. This store is replicated as it is mutated, and supports very high read and write throughput. It gives you a lot of flexibility to decide what kind of state you want to maintain, and you can also plug in other <a href="../container/state-management.html#other-storage-engines">storage engines</a>, which enables great flexibility in the stream processing algorithms you can use. A good comparison of the different approaches to state management can be found <a href="../container/state-management.html#approaches-to-managing-task-state">here</a>.</p>
+
+<p>One of the common use cases in state management is a <a href="../container/state-management.html#stream-stream-join">stream-stream join</a>. Though Spark Streaming has a <a href="https://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations">join</a> operation, this operation only joins two batches that fall in the same time interval; it does not handle the situation where matching events in the two streams arrive in different intervals. Using updateStateByKey to buffer the unmatched events is also limited: if the number of unmatched events is large, the state becomes large, which is inefficient in Spark Streaming for the reasons described above. Samza does not have this limitation.</p>
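+<p>A stream-stream join over a local key-value store can buffer an unmatched event until its partner arrives, regardless of which time interval the partner falls into. The sketch below is a simplified illustration (a plain HashMap stands in for Samza&rsquo;s replicated store, and the class and method names are invented):</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stream-stream join: events from each side are buffered in a
// key-value store until the matching event from the other side arrives.
// A plain HashMap stands in for Samza's replicated, disk-backed store;
// in a real Samza job this logic would live in a StreamTask.
public class StreamJoin {
    private final Map<String, String> leftBuffer = new HashMap<>();
    private final Map<String, String> rightBuffer = new HashMap<>();
    private final List<String> joined = new ArrayList<>();

    void onLeft(String key, String value) {
        String match = rightBuffer.remove(key);
        if (match != null) {
            joined.add(key + ":" + value + "|" + match);
        } else {
            leftBuffer.put(key, value);   // wait for the right side, however long it takes
        }
    }

    void onRight(String key, String value) {
        String match = leftBuffer.remove(key);
        if (match != null) {
            joined.add(key + ":" + match + "|" + value);
        } else {
            rightBuffer.put(key, value);
        }
    }

    List<String> results() {
        return joined;
    }

    public static void main(String[] args) {
        StreamJoin join = new StreamJoin();
        join.onLeft("order-1", "created");
        join.onRight("order-9", "shipped"); // no partner yet; buffered
        join.onRight("order-1", "shipped"); // partner arrived in a later "interval"
        System.out.println(join.results()); // [order-1:created|shipped]
    }
}
```

+<p>Because the buffer in Samza is backed by a disk-based store, a large number of unmatched events does not translate into a large in-memory state.</p>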
+
+<h3 id="partitioning-and-parallelism">Partitioning and Parallelism</h3>
+
+<p>Spark Streaming&rsquo;s parallelism is achieved by splitting the job into small tasks and sending them to executors. There are two types of <a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving">parallelism in Spark Streaming</a>: parallelism in receiving the stream and parallelism in processing the stream. On the receiving side, one input DStream creates one receiver, and one receiver receives one input stream of data and runs as a long-running task. So in order to parallelize receiving, you can split one input stream into multiple input streams based on some criterion (e.g. if you are receiving a partitioned Kafka stream, you may split the stream by partition). You can then create multiple input DStreams (and therefore multiple receivers) for these streams, and the receivers will run as multiple tasks. Accordingly, you should provide enough resources, either by increasing the number of cores per executor or by bringing up more executors. You can then combine all the input DStreams into one DStream during processing, if necessary. On the processing side, since a DStream is a continuous sequence of RDDs, parallelism is simply accomplished by normal RDD operations, such as map, reduceByKey, and reduceByWindow (see <a href="https://spark.apache.org/docs/latest/tuning.html#level-of-parallelism">here</a>).</p>
+
+<p>Samza&rsquo;s parallelism is achieved by splitting processing into independent <a href="../api/overview.html">tasks</a>, which can run in parallel. You can run multiple tasks in one container, or only one task per container; that depends on your workload and latency requirements. For example, if you want to quickly <a href="../jobs/reprocessing.html">reprocess a stream</a>, you may increase the number of containers, up to one task per container. It is important to note that one container uses only <a href="../container/event-loop.html">one thread</a>, which maps to exactly one CPU. This design simplifies resource management and the isolation between jobs.</p>
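+<p>The relationship between partitions, tasks, and containers can be sketched as follows (a toy model only &ndash; Samza&rsquo;s actual task-to-container assignment is done by the framework and is not necessarily round-robin): each input partition is owned by exactly one task, and tasks are divided among however many containers you configure.</p>

```java
// Toy model of Samza's parallelism units (not Samza's scheduler): one
// task per input partition, tasks spread round-robin across containers.
public class Parallelism {

    // Each partition maps to exactly one task.
    static int taskForPartition(int partition) {
        return partition;
    }

    // Tasks are spread across the configured number of containers; with
    // numContainers == numTasks, each container runs a single task.
    static int containerForTask(int task, int numContainers) {
        return task % numContainers;
    }

    public static void main(String[] args) {
        int partitions = 8;
        for (int p = 0; p < partitions; p++) {
            int task = taskForPartition(p);
            System.out.printf("partition %d -> task %d -> container %d%n",
                    p, task, containerForTask(task, 4));
        }
    }
}
```

+<p>Raising the container count toward the task count increases parallelism; beyond one task per container, extra containers would sit idle, which is why the partition count bounds a job&rsquo;s parallelism.</p>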
+
+<h3 id="buffering-&amp;-latency">Buffering &amp; Latency</h3>
+
+<p>Spark Streaming is essentially a sequence of small batch processes. With a fast execution engine, it can achieve latency as low as one second (according to their <a href="http://www.cs.berkeley.edu/%7Ematei/papers/2012/hotcloud_spark_streaming.pdf">paper</a>). If processing is slower than receiving, data is queued as DStreams in memory, and the queue keeps growing. To run a healthy Spark Streaming application, the system should be <a href="http://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning">tuned</a> until processing is as fast as receiving.</p>
+
+<p>Samza jobs can have latency in the low milliseconds when running with Apache Kafka. Samza takes a different approach to buffering: the buffering mechanism depends on the input and output system. For example, when using <a href="http://kafka.apache.org/">Kafka</a> as the input and output system, data is actually buffered to disk. This design decision, at the cost of a little latency, allows the buffer to absorb a large backlog of messages when a job has fallen behind in its processing.</p>
+
+<h3 id="fault-tolerance">Fault-tolerance</h3>
+
+<p>There are two kinds of failure to consider in both Spark Streaming and Samza: failure of a worker node (which runs executors) in Spark Streaming, equivalent to container failure in Samza; and failure of the driver node (which runs the driver program), equivalent to application master (AM) failure in Samza.</p>
+
+<p>When a worker node fails in Spark Streaming, it is restarted by the cluster manager. When a container fails in Samza, the application master works with YARN to start a new container.</p>
+
+<p>When the driver node fails in Spark Streaming, Spark&rsquo;s <a href="http://spark.apache.org/docs/latest/spark-standalone.html">standalone cluster mode</a> restarts it automatically, but this is currently not supported on YARN or Mesos, where you need some other mechanism to restart the driver node automatically. Spark Streaming can use the checkpoint in HDFS to recreate the StreamingContext. When the AM fails in Samza, YARN handles restarting the AM, and Samza restarts all of the containers when the AM restarts.</p>
+
+<p>In terms of data loss, there is a difference between Spark Streaming and Samza. If the input is an active streaming system, such as Flume or Kafka, Spark Streaming may lose data if a failure happens after the data has been received but before it has been replicated to other nodes (see <a href="https://issues.apache.org/jira/browse/SPARK-1647">SPARK-1647</a>). Samza does not lose data in this situation, because it uses <a href="../container/checkpointing.html">checkpointing</a>: the offset of the latest processed message is stored, and the checkpoint is always committed after the data has been processed, so the data-loss scenario that Spark Streaming has does not arise. If a container fails, it resumes reading from the latest checkpoint. When a Samza job recovers from a failure, it is possible that it will process some data more than once. This happens because the job restarts at the last checkpoint, and any messages that had been processed between that checkpoint and the failure are processed again. The amount of reprocessed data can be minimized by setting a small checkpoint interval.</p>
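+<p>The amount of duplicate processing after a failure is bounded by the checkpoint interval, which the following sketch illustrates (a toy model of offset-based recovery, not Samza code; for simplicity it checkpoints every N messages, whereas Samza&rsquo;s real interval is time-based):</p>

```java
// Toy model of at-least-once recovery from a checkpointed offset.
// Checkpoints are committed every `checkpointInterval` messages; after a
// crash, processing resumes from the last committed offset, so messages
// processed after that checkpoint are processed a second time.
public class CheckpointReplay {

    // Offset of the last committed checkpoint before the failure point.
    static long lastCheckpoint(long failureOffset, long checkpointInterval) {
        return (failureOffset / checkpointInterval) * checkpointInterval;
    }

    // Number of messages that will be reprocessed after recovery.
    static long reprocessed(long failureOffset, long checkpointInterval) {
        return failureOffset - lastCheckpoint(failureOffset, checkpointInterval);
    }

    public static void main(String[] args) {
        long failureOffset = 1234;
        // A smaller checkpoint interval means fewer duplicates on recovery.
        System.out.println(reprocessed(failureOffset, 1000)); // 234
        System.out.println(reprocessed(failureOffset, 100));  // 34
    }
}
```

+<p>The trade-off is that committing checkpoints more often costs more I/O, so the interval balances recovery cost against runtime overhead.</p>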
+
+<h3 id="deployment-&amp;-execution">Deployment &amp; Execution</h3>
+
+<p>Spark has a SparkContext object to talk with cluster managers, which then allocate resources for the application. Currently Spark supports three types of cluster manager: <a href="http://spark.apache.org/docs/latest/spark-standalone.html">Spark standalone</a>, <a href="http://mesos.apache.org/">Apache Mesos</a> and <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">Hadoop YARN</a>. Besides these, Spark also has a script for launching on <a href="http://spark.apache.org/docs/latest/ec2-scripts.html">Amazon EC2</a>.</p>
+
+<p>Samza currently supports only YARN and local execution.</p>
+
+<h3 id="isolation">Isolation</h3>
+
+<p>Spark Streaming and Samza provide the same level of isolation. Spark Streaming depends on cluster managers (e.g. Mesos or YARN), and Samza depends on YARN, to provide processor isolation. Different applications run in different JVMs, and data cannot be shared among applications unless it is written to external storage. Since Samza provides out-of-the-box Kafka integration, it is very easy to reuse the output of other Samza jobs (see <a href="../introduction/concepts.html#dataflow-graphs">here</a>).</p>
+
+<h3 id="language-support">Language Support</h3>
+
+<p>Spark Streaming is written in Java and Scala and provides Scala, Java, and 
Python APIs. Samza is written in Java and Scala and has a Java API.</p>
+
+<h3 id="workflow">Workflow</h3>
+
+<p>In Spark Streaming, you build an entire processing graph with a DSL API and deploy that entire graph as one unit. The communication between the nodes in that graph (in the form of DStreams) is provided by the framework. That is similar to Storm. Samza is totally different &ndash; each job is just a message-at-a-time processor, and there is no framework support for topologies. The output of a processing task always needs to go back to a message broker (e.g. Kafka).</p>
+
+<p>A positive consequence of Samza&rsquo;s design is that a job&rsquo;s output 
can be consumed by multiple unrelated jobs, potentially run by different teams, 
and those jobs are isolated from each other through Kafka&rsquo;s buffering. 
That is not the case with Storm&rsquo;s and Spark Streaming&rsquo;s 
framework-internal streams.</p>
+
+<p>Although a Storm/Spark Streaming job could in principle write its output to a message broker, the framework doesn&rsquo;t really make this easy. It seems that Storm/Spark aren&rsquo;t intended to be used in a way where one topology&rsquo;s output is another topology&rsquo;s input. By contrast, in Samza, that mode of usage is standard.</p>
+
+<h3 id="maturity">Maturity</h3>
+
+<p>Spark has an active user and developer community, and recently released version 1.0.0. It has a list of companies that use it on its <a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered by</a> page. Since Spark contains Spark Streaming, Spark SQL, MLlib, GraphX and Bagel, it&rsquo;s tough to tell what portion of the companies on the list are actually using Spark Streaming, and not just Spark.</p>
+
+<p>Samza is still young, but has just released version 0.7.0. It has a responsive community and is being actively developed. That said, it is built on solid systems such as YARN and Kafka. Samza is heavily used at LinkedIn, and we hope others will find it useful as well.</p>
+
+<h2 id="api-overview-&raquo;"><a href="../api/overview.html">API Overview 
&raquo;</a></h2>
+
+
+          </div>
+        </div>
+
+      </div><!-- /.wrapper-content -->
+    </div><!-- /.wrapper -->
+
+    <div class="footer">
+      <div class="container">
+        <!-- nothing for now. -->
+      </div>
+    </div>
+
+    <!-- Google Analytics -->
+    <script>
+      
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+      
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+      
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+      ga('create', 'UA-43122768-1', 'apache.org');
+      ga('send', 'pageview');
+
+    </script>
+  </body>
+</html>

Modified: incubator/samza/site/learn/documentation/0.7.0/comparisons/storm.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/comparisons/storm.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/comparisons/storm.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/comparisons/storm.html Thu 
Jul 24 05:05:00 2014
@@ -227,7 +227,7 @@
 
 <p>Samza&rsquo;s serialization and data model are both pluggable. We are not 
terribly opinionated about which approach is best.</p>
 
-<h2 id="api-overview-&raquo;"><a href="../api/overview.html">API Overview 
&raquo;</a></h2>
+<h2 id="spark-streaming-&raquo;"><a href="spark-streaming.html">Spark 
Streaming &raquo;</a></h2>
 
 
           </div>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/checkpointing.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/checkpointing.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/checkpointing.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/checkpointing.html 
Thu Jul 24 05:05:00 2014
@@ -145,7 +145,7 @@
 
 <p>For checkpoints to be effective, they need to be written somewhere where 
they will survive faults. Samza allows you to write checkpoints to the file 
system (using FileSystemCheckpointManager), but that doesn&rsquo;t help if the 
machine fails and the container needs to be restarted on another machine. The 
most common configuration is to use Kafka for checkpointing. You can enable 
this with the following job configuration:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># The name of your job determines the 
name under which checkpoints will be stored</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># The 
name of your job determines the name under which checkpoints will be 
stored</span>
 <span class="na">job.name</span><span class="o">=</span><span 
class="s">example-job</span>
 
 <span class="c"># Define a system called &quot;kafka&quot; for consuming and 
producing to a Kafka cluster</span>
@@ -162,7 +162,7 @@
 
 <p>Sometimes it can be useful to use checkpoints only for some input streams, 
but not for others. In this case, you can tell Samza to ignore any checkpointed 
offsets for a particular stream name:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Ignore any checkpoints for the topic 
&quot;my-special-topic&quot;</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Ignore 
any checkpoints for the topic &quot;my-special-topic&quot;</span>
 <span 
class="na">systems.kafka.streams.my-special-topic.samza.reset.offset</span><span
 class="o">=</span><span class="s">true</span>
 
 <span class="c"># Always start consuming &quot;my-special-topic&quot; at the 
oldest available offset</span>
@@ -208,12 +208,12 @@
 
 <p>To inspect a job&rsquo;s latest checkpoint, you need to specify your 
job&rsquo;s config file, so that the tool knows which job it is dealing 
with:</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
+<div class="highlight"><pre><code 
class="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
   --config-path<span 
class="o">=</span>file:///path/to/job/config.properties</code></pre></div>
 
 <p>This command prints out the latest checkpoint in a properties file format. 
You can save the output to a file, and edit it as you wish. For example, to 
jump back to the oldest possible point in time, you can set all the offsets to 
0. Then you can feed that properties file back into checkpoint-tool.sh and save 
the modified checkpoint:</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
+<div class="highlight"><pre><code 
class="bash">samza-example/target/bin/checkpoint-tool.sh <span 
class="se">\</span>
   --config-path<span class="o">=</span>file:///path/to/job/config.properties 
<span class="se">\</span>
   --new-offsets<span 
class="o">=</span>file:///path/to/new/offsets.properties</code></pre></div>
 

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/event-loop.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/event-loop.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/event-loop.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/event-loop.html 
Thu Jul 24 05:05:00 2014
@@ -153,7 +153,7 @@
 
 <p>You can then tell Samza to use your lifecycle listener with the following 
properties in your job configuration:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Define a listener called 
&quot;my-listener&quot; by giving the factory class name</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Define 
a listener called &quot;my-listener&quot; by giving the factory class 
name</span>
 <span class="na">task.lifecycle.listener.my-listener.class</span><span 
class="o">=</span><span class="s">com.example.foo.MyListenerFactory</span>
 
 <span class="c"># Enable it in this job (multiple listeners can be separated 
by commas)</span>

Modified: incubator/samza/site/learn/documentation/0.7.0/container/jmx.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/jmx.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/jmx.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/jmx.html Thu Jul 
24 05:05:00 2014
@@ -127,7 +127,7 @@
 
 <p>You can tell Samza to publish its internal <a 
href="metrics.html">metrics</a>, and any custom metrics you define, as JMX 
MBeans. To enable this, set the following properties in your job 
configuration:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Define a Samza metrics reporter 
called &quot;jmx&quot;, which publishes to JMX</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Define 
a Samza metrics reporter called &quot;jmx&quot;, which publishes to JMX</span>
 <span class="na">metrics.reporter.jmx.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.metrics.reporter.JmxReporterFactory</span>
 
 <span class="c"># Use it (if you have multiple reporters defined, separate 
them with commas)</span>

Modified: incubator/samza/site/learn/documentation/0.7.0/container/metrics.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/metrics.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/metrics.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/metrics.html Thu 
Jul 24 05:05:00 2014
@@ -129,7 +129,7 @@
 
 <p>To set up your job to publish metrics to Kafka, you can use the following 
configuration:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Define a metrics reporter called 
&quot;snapshot&quot;, which publishes metrics</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Define 
a metrics reporter called &quot;snapshot&quot;, which publishes metrics</span>
 <span class="c"># every 60 seconds.</span>
 <span class="na">metrics.reporters</span><span class="o">=</span><span 
class="s">snapshot</span>
 <span class="na">metrics.reporter.snapshot.class</span><span 
class="o">=</span><span 
class="s">org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory</span>
@@ -144,7 +144,7 @@
 
 <p>With this configuration, the job automatically sends several JSON-encoded 
messages to the &ldquo;metrics&rdquo; topic in Kafka every 60 seconds. The 
messages look something like this:</p>
 
-<div class="highlight"><pre><code class="language-json" data-lang="json"><span 
class="p">{</span>
+<div class="highlight"><pre><code class="json"><span class="p">{</span>
   <span class="nt">&quot;header&quot;</span><span class="p">:</span> <span 
class="p">{</span>
     <span class="nt">&quot;container-name&quot;</span><span class="p">:</span> 
<span class="s2">&quot;samza-container-0&quot;</span><span class="p">,</span>
     <span class="nt">&quot;host&quot;</span><span class="p">:</span> <span 
class="s2">&quot;samza-grid-1234.example.com&quot;</span><span 
class="p">,</span>
@@ -177,7 +177,7 @@
 
 <p>You can register your custom metrics through a <a 
href="../api/javadocs/org/apache/samza/metrics/MetricsRegistry.html">MetricsRegistry</a>.
 Your stream task needs to implement <a 
href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a>,
 so that you can get the metrics registry from the <a 
href="../api/javadocs/org/apache/samza/task/TaskContext.html">TaskContext</a>. 
This simple example shows how to count the number of messages processed by your 
task:</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">class</span> <span 
class="nc">MyJavaStreamTask</span> <span class="kd">implements</span> <span 
class="n">StreamTask</span><span class="o">,</span> <span 
class="n">InitableTask</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">class</span> <span class="nc">MyJavaStreamTask</span> <span 
class="kd">implements</span> <span class="n">StreamTask</span><span 
class="o">,</span> <span class="n">InitableTask</span> <span class="o">{</span>
   <span class="kd">private</span> <span class="n">Counter</span> <span 
class="n">messageCount</span><span class="o">;</span>
 
   <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">init</span><span class="o">(</span><span class="n">Config</span> 
<span class="n">config</span><span class="o">,</span> <span 
class="n">TaskContext</span> <span class="n">context</span><span 
class="o">)</span> <span class="o">{</span>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/samza-container.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/samza-container.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/container/samza-container.html 
(original)
+++ 
incubator/samza/site/learn/documentation/0.7.0/container/samza-container.html 
Thu Jul 24 05:05:00 2014
@@ -144,7 +144,7 @@
 
 <p>When the container starts, it creates instances of the <a 
href="../api/overview.html">task class</a> that you&rsquo;ve written. If the 
task class implements the <a 
href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a> 
interface, the SamzaContainer will also call the init() method.</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="cm">/** Implement this if you want a callback when your task starts up. 
*/</span>
+<div class="highlight"><pre><code class="java"><span class="cm">/** Implement 
this if you want a callback when your task starts up. */</span>
 <span class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">InitableTask</span> <span class="o">{</span>
   <span class="kt">void</span> <span class="nf">init</span><span 
class="o">(</span><span class="n">Config</span> <span 
class="n">config</span><span class="o">,</span> <span 
class="n">TaskContext</span> <span class="n">context</span><span 
class="o">);</span>
 <span class="o">}</span></code></pre></div>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/serialization.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/serialization.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/serialization.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/serialization.html 
Thu Jul 24 05:05:00 2014
@@ -133,7 +133,7 @@
 
 <p>You can use whatever makes sense for your job; Samza doesn&rsquo;t impose 
any particular data model or serialization scheme on you. However, the cleanest 
solution is usually to use Samza&rsquo;s serde layer. The following 
configuration example shows how to use it.</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Define a system called 
&quot;kafka&quot;</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Define 
a system called &quot;kafka&quot;</span>
 <span class="na">systems.kafka.samza.factory</span><span 
class="o">=</span><span 
class="s">org.apache.samza.system.kafka.KafkaSystemFactory</span>
 
 <span class="c"># The job is going to consume a topic called 
&quot;PageViewEvent&quot; from the &quot;kafka&quot; system</span>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/state-management.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/state-management.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/container/state-management.html 
(original)
+++ 
incubator/samza/site/learn/documentation/0.7.0/container/state-management.html 
Thu Jul 24 05:05:00 2014
@@ -246,7 +246,7 @@
 
 <p>To use a key-value store in your job, add the following to your job 
config:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Use the key-value store 
implementation for a store called &quot;my-store&quot;</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Use 
the key-value store implementation for a store called 
&quot;my-store&quot;</span>
 <span class="na">stores.my-store.factory</span><span class="o">=</span><span 
class="s">org.apache.samza.storage.kv.KeyValueStorageEngineFactory</span>
 
 <span class="c"># Use the Kafka topic &quot;my-store-changelog&quot; as the 
changelog stream for this store.</span>
@@ -263,7 +263,7 @@
 
 <p>Here is a simple example that writes every incoming message to the 
store:</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">class</span> <span 
class="nc">MyStatefulTask</span> <span class="kd">implements</span> <span 
class="n">StreamTask</span><span class="o">,</span> <span 
class="n">InitableTask</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">class</span> <span class="nc">MyStatefulTask</span> <span 
class="kd">implements</span> <span class="n">StreamTask</span><span 
class="o">,</span> <span class="n">InitableTask</span> <span class="o">{</span>
   <span class="kd">private</span> <span class="n">KeyValueStore</span><span 
class="o">&lt;</span><span class="n">String</span><span class="o">,</span> 
<span class="n">String</span><span class="o">&gt;</span> <span 
class="n">store</span><span class="o">;</span>
 
   <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">init</span><span class="o">(</span><span class="n">Config</span> 
<span class="n">config</span><span class="o">,</span> <span 
class="n">TaskContext</span> <span class="n">context</span><span 
class="o">)</span> <span class="o">{</span>
@@ -279,13 +279,13 @@
 
 <p>Here is the complete key-value store API:</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">KeyValueStore</span><span class="o">&lt;</span><span 
class="n">K</span><span class="o">,</span> <span class="n">V</span><span 
class="o">&gt;</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">interface</span> <span class="nc">KeyValueStore</span><span 
class="o">&lt;</span><span class="n">K</span><span class="o">,</span> <span 
class="n">V</span><span class="o">&gt;</span> <span class="o">{</span>
   <span class="n">V</span> <span class="nf">get</span><span 
class="o">(</span><span class="n">K</span> <span class="n">key</span><span 
class="o">);</span>
   <span class="kt">void</span> <span class="nf">put</span><span 
class="o">(</span><span class="n">K</span> <span class="n">key</span><span 
class="o">,</span> <span class="n">V</span> <span class="n">value</span><span 
class="o">);</span>
   <span class="kt">void</span> <span class="nf">putAll</span><span 
class="o">(</span><span class="n">List</span><span class="o">&lt;</span><span 
class="n">Entry</span><span class="o">&lt;</span><span class="n">K</span><span 
class="o">,</span><span class="n">V</span><span class="o">&gt;&gt;</span> <span 
class="n">entries</span><span class="o">);</span>
   <span class="kt">void</span> <span class="nf">delete</span><span 
class="o">(</span><span class="n">K</span> <span class="n">key</span><span 
class="o">);</span>
-  <span class="n">KeyValueIterator</span><span class="o">&lt;</span><span 
class="n">K</span><span class="o">,</span><span class="n">V</span><span 
class="o">&gt;</span> <span class="nf">range</span><span 
class="o">(</span><span class="n">K</span> <span class="n">from</span><span 
class="o">,</span> <span class="n">K</span> <span class="n">to</span><span 
class="o">);</span>
-  <span class="n">KeyValueIterator</span><span class="o">&lt;</span><span 
class="n">K</span><span class="o">,</span><span class="n">V</span><span 
class="o">&gt;</span> <span class="nf">all</span><span class="o">();</span>
+  <span class="n">KeyValueIterator</span><span class="o">&lt;</span><span 
class="n">K</span><span class="o">,</span><span class="n">V</span><span 
class="o">&gt;</span> <span class="n">range</span><span class="o">(</span><span 
class="n">K</span> <span class="n">from</span><span class="o">,</span> <span 
class="n">K</span> <span class="n">to</span><span class="o">);</span>
+  <span class="n">KeyValueIterator</span><span class="o">&lt;</span><span 
class="n">K</span><span class="o">,</span><span class="n">V</span><span 
class="o">&gt;</span> <span class="n">all</span><span class="o">();</span>
 <span class="o">}</span></code></pre></div>
 
 <p>Additional configuration properties for the key-value store are documented 
in the <a href="../jobs/configuration-table.html#keyvalue">configuration 
reference</a>.</p>
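
Rendered as plain text, the key-value store configuration touched in this hunk looks roughly like the sketch below. The first two properties appear in the page itself; the serde lines are an assumption about how "my-store" would typically be completed (the serde name "string" must be separately defined in the job config):

```properties
# Use the key-value store implementation for a store called "my-store"
stores.my-store.factory=org.apache.samza.storage.kv.KeyValueStorageEngineFactory

# Use the Kafka topic "my-store-changelog" as the changelog stream for this store
stores.my-store.changelog=kafka.my-store-changelog

# Assumed serde setup for keys and values (not part of the hunk above)
stores.my-store.key.serde=string
stores.my-store.msg.serde=string
```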

Modified: incubator/samza/site/learn/documentation/0.7.0/container/streams.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/streams.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/streams.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/streams.html Thu 
Jul 24 05:05:00 2014
@@ -125,7 +125,7 @@
 
 <p>The <a href="samza-container.html">samza container</a> reads and writes 
messages using the <a 
href="../api/javadocs/org/apache/samza/system/SystemConsumer.html">SystemConsumer</a>
 and <a 
href="../api/javadocs/org/apache/samza/system/SystemProducer.html">SystemProducer</a>
 interfaces. You can integrate any message broker with Samza by implementing 
these two interfaces.</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">SystemConsumer</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">interface</span> <span class="nc">SystemConsumer</span> <span 
class="o">{</span>
   <span class="kt">void</span> <span class="nf">start</span><span 
class="o">();</span>
 
   <span class="kt">void</span> <span class="nf">stop</span><span 
class="o">();</span>
@@ -185,7 +185,7 @@
 
 <p>To plug in your own message chooser, you need to implement the <a 
href="../api/javadocs/org/apache/samza/system/chooser/MessageChooserFactory.html">MessageChooserFactory</a>
 interface, and set the &ldquo;task.chooser.class&rdquo; configuration to the 
fully-qualified class name of your implementation:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">task.chooser.class</span><span 
class="o">=</span><span 
class="s">com.example.samza.YourMessageChooserFactory</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">task.chooser.class</span><span class="o">=</span><span 
class="s">com.example.samza.YourMessageChooserFactory</span></code></pre></div>
 
 <h4 id="prioritizing-input-streams">Prioritizing input streams</h4>
 
@@ -193,7 +193,7 @@
 
 <p>Samza provides a mechanism to prioritize one stream over another by setting 
this configuration parameter: 
systems.&lt;system&gt;.streams.&lt;stream&gt;.samza.priority=&lt;number&gt;. 
For example:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span 
class="na">systems.kafka.streams.my-real-time-stream.samza.priority</span><span 
class="o">=</span><span class="s">2</span>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">systems.kafka.streams.my-real-time-stream.samza.priority</span><span 
class="o">=</span><span class="s">2</span>
 <span 
class="na">systems.kafka.streams.my-batch-stream.samza.priority</span><span 
class="o">=</span><span class="s">1</span></code></pre></div>
 
 <p>This declares that my-real-time-stream&rsquo;s messages should be processed 
with higher priority than my-batch-stream&rsquo;s messages. If 
my-real-time-stream has any messages available, they are processed first. Only 
when there are no messages waiting on my-real-time-stream does the Samza 
job continue processing my-batch-stream.</p>
@@ -212,7 +212,7 @@
 
 <p>To configure a stream called &ldquo;my-bootstrap-stream&rdquo; to be a 
fully-consumed bootstrap stream, use the following settings:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span 
class="na">systems.kafka.streams.my-bootstrap-stream.samza.bootstrap</span><span
 class="o">=</span><span class="s">true</span>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">systems.kafka.streams.my-bootstrap-stream.samza.bootstrap</span><span
 class="o">=</span><span class="s">true</span>
 <span 
class="na">systems.kafka.streams.my-bootstrap-stream.samza.reset.offset</span><span
 class="o">=</span><span class="s">true</span>
 <span 
class="na">systems.kafka.streams.my-bootstrap-stream.samza.offset.default</span><span
 class="o">=</span><span class="s">oldest</span></code></pre></div>
 
@@ -226,7 +226,7 @@
 
 <p>For example, if you want to read 100 messages in a row from each stream 
partition (regardless of the MessageChooser), you can use this configuration 
parameter:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">task.consumer.batch.size</span><span 
class="o">=</span><span class="s">100</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">task.consumer.batch.size</span><span class="o">=</span><span 
class="s">100</span></code></pre></div>
 
 <p>With this setting, Samza tries to read a message from the most recently 
used <a 
href="../api/javadocs/org/apache/samza/system/SystemStreamPartition.html">SystemStreamPartition</a>.
 This behavior continues either until no more messages are available for that 
SystemStreamPartition, or until the batch size has been reached. When that 
happens, Samza defers to the MessageChooser to determine the next message to 
process. It then continues consuming from the chosen message&rsquo;s 
SystemStreamPartition until the batch size is reached.</p>
 

Modified: 
incubator/samza/site/learn/documentation/0.7.0/container/windowing.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/container/windowing.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/container/windowing.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/container/windowing.html Thu 
Jul 24 05:05:00 2014
@@ -127,14 +127,14 @@
 
 <p>Samza&rsquo;s <em>windowing</em> feature provides a way for tasks to do 
something at regular time intervals, for example once per minute. To enable 
windowing, you just need to set one property in your job configuration:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Call the window() method every 60 
seconds</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># Call 
the window() method every 60 seconds</span>
 <span class="na">task.window.ms</span><span class="o">=</span><span 
class="s">60000</span></code></pre></div>
 
 <p>Next, your stream task needs to implement the <a 
href="../api/javadocs/org/apache/samza/task/WindowableTask.html">WindowableTask</a>
 interface. This interface defines a window() method which is called by Samza 
in the regular interval that you configured.</p>
 
 <p>For example, this is how you would implement a basic per-minute event 
counter:</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">class</span> <span 
class="nc">EventCounterTask</span> <span class="kd">implements</span> <span 
class="n">StreamTask</span><span class="o">,</span> <span 
class="n">WindowableTask</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">class</span> <span class="nc">EventCounterTask</span> <span 
class="kd">implements</span> <span class="n">StreamTask</span><span 
class="o">,</span> <span class="n">WindowableTask</span> <span 
class="o">{</span>
 
   <span class="kd">public</span> <span class="kd">static</span> <span 
class="kd">final</span> <span class="n">SystemStream</span> <span 
class="n">OUTPUT_STREAM</span> <span class="o">=</span>
     <span class="k">new</span> <span class="nf">SystemStream</span><span 
class="o">(</span><span class="s">&quot;kafka&quot;</span><span 
class="o">,</span> <span class="s">&quot;events-per-minute&quot;</span><span 
class="o">);</span>
@@ -149,7 +149,7 @@
 
   <span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">window</span><span class="o">(</span><span 
class="n">MessageCollector</span> <span class="n">collector</span><span 
class="o">,</span>
                      <span class="n">TaskCoordinator</span> <span 
class="n">coordinator</span><span class="o">)</span> <span class="o">{</span>
-    <span class="n">collector</span><span class="o">.</span><span 
class="na">send</span><span class="o">(</span><span class="k">new</span> <span 
class="nf">OutgoingMessageEnvelope</span><span class="o">(</span><span 
class="n">OUTPUT_STREAM</span><span class="o">,</span> <span 
class="n">eventsSeen</span><span class="o">));</span>
+    <span class="n">collector</span><span class="o">.</span><span 
class="na">send</span><span class="o">(</span><span class="k">new</span> <span 
class="n">OutgoingMessageEnvelope</span><span class="o">(</span><span 
class="n">OUTPUT_STREAM</span><span class="o">,</span> <span 
class="n">eventsSeen</span><span class="o">));</span>
     <span class="n">eventsSeen</span> <span class="o">=</span> <span 
class="mi">0</span><span class="o">;</span>
   <span class="o">}</span>
 <span class="o">}</span></code></pre></div>

Modified: incubator/samza/site/learn/documentation/0.7.0/index.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/index.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/index.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/index.html Thu Jul 24 
05:05:00 2014
@@ -137,6 +137,7 @@
   <li><a href="comparisons/introduction.html">Introduction</a></li>
   <li><a href="comparisons/mupd8.html">MUPD8</a></li>
   <li><a href="comparisons/storm.html">Storm</a></li>
+  <li><a href="comparisons/spark-streaming.html">Spark Streaming</a></li>
 <!-- TODO comparisons pages
   <li><a href="comparisons/aurora.html">Aurora</a></li>
   <li><a href="comparisons/jms.html">JMS</a></li>

Modified: 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html 
(original)
+++ 
incubator/samza/site/learn/documentation/0.7.0/introduction/architecture.html 
Thu Jul 24 05:05:00 2014
@@ -203,7 +203,7 @@
 
 <p>Let&rsquo;s take a look at a real example: suppose we want to count the 
number of page views. In SQL, you would write something like:</p>
 
-<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span 
class="k">SELECT</span> <span class="n">user_id</span><span class="p">,</span> 
<span class="k">COUNT</span><span class="p">(</span><span 
class="o">*</span><span class="p">)</span> <span class="k">FROM</span> <span 
class="n">PageViewEvent</span> <span class="k">GROUP</span> <span 
class="k">BY</span> <span class="n">user_id</span></code></pre></div>
+<div class="highlight"><pre><code class="sql"><span class="k">SELECT</span> 
<span class="n">user_id</span><span class="p">,</span> <span 
class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span 
class="p">)</span> <span class="k">FROM</span> <span 
class="n">PageViewEvent</span> <span class="k">GROUP</span> <span 
class="k">BY</span> <span class="n">user_id</span></code></pre></div>
 
 <p>Although Samza doesn&rsquo;t support SQL right now, the idea is the same. 
Two jobs are required to calculate this query: one to group messages by user 
ID, and the other to do the counting.</p>
 

Modified: 
incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- 
incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html 
(original)
+++ 
incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html 
Thu Jul 24 05:05:00 2014
@@ -357,6 +357,26 @@
                 </tr>
 
                 <tr>
+                    <td class="property" 
id="task-drop-deserialization-errors">task.drop.deserialization.errors</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        This property is to define how the system deals with 
deserialization failure situation. If set to true, the system will
+                        skip the error messages and keep running. If set to 
false, the system with throw exceptions and fail the container. Default 
+                        is false.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" 
id="task-drop-serialization-errors">task.drop.serialization.errors</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        This property defines how the system handles 
serialization failures. If set to true, the system drops
+                        messages that fail to serialize and keeps running. 
If set to false, the system throws an exception and fails the container. Default
+                        is false.
+                    </td>
+                </tr>
+
+                <tr>
                     <th colspan="3" class="section" id="streams"><a 
href="../container/streams.html">Systems (input and output streams)</a></th>
                 </tr>
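
The two new configuration-table rows added above describe boolean properties that default to false. A hedged sketch of how a job config might opt in to both behaviors (skipping bad messages rather than failing the container):

```properties
# Skip messages that fail to deserialize instead of failing the container
task.drop.deserialization.errors=true

# Drop messages that fail to serialize instead of failing the container
task.drop.serialization.errors=true
```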
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html Thu 
Jul 24 05:05:00 2014
@@ -125,7 +125,7 @@
 
 <p>All Samza jobs have a configuration file that defines the job. A very basic 
configuration file looks like this:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="c"># Job</span>
+<div class="highlight"><pre><code class="jproperties"><span class="c"># 
Job</span>
 <span class="na">job.factory.class</span><span class="o">=</span><span 
class="s">samza.job.local.LocalJobFactory</span>
 <span class="na">job.name</span><span class="o">=</span><span 
class="s">hello-world</span>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/job-runner.html Thu Jul 
24 05:05:00 2014
@@ -125,13 +125,13 @@
 
 <p>Samza jobs are started using a script called run-job.sh.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">samza-example/target/bin/run-job.sh <span class="se">\</span>
+<div class="highlight"><pre><code 
class="bash">samza-example/target/bin/run-job.sh <span class="se">\</span>
   --config-factory<span 
class="o">=</span>samza.config.factories.PropertiesConfigFactory <span 
class="se">\</span>
   --config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/config/hello-world.properties</code></pre></div>
 
 <p>You provide two parameters to the run-job.sh script. One is the config 
location, and the other is a factory class that is used to read your 
configuration file. The run-job.sh script actually executes a Samza class 
called JobRunner. The JobRunner uses your ConfigFactory to get a Config object 
from the config path.</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">ConfigFactory</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">interface</span> <span class="nc">ConfigFactory</span> <span 
class="o">{</span>
   <span class="n">Config</span> <span class="nf">getConfig</span><span 
class="o">(</span><span class="n">URI</span> <span 
class="n">configUri</span><span class="o">);</span>
 <span class="o">}</span></code></pre></div>
 
@@ -139,7 +139,7 @@
 
 <p>Once the JobRunner gets your configuration, it gives your configuration to 
the StreamJobFactory class defined by the &ldquo;job.factory&rdquo; property. 
Samza ships with two job factory implementations: LocalJobFactory and 
YarnJobFactory. The StreamJobFactory&rsquo;s responsibility is to give the 
JobRunner a job that it can run.</p>
 
-<div class="highlight"><pre><code class="language-java" data-lang="java"><span 
class="kd">public</span> <span class="kd">interface</span> <span 
class="nc">StreamJob</span> <span class="o">{</span>
+<div class="highlight"><pre><code class="java"><span class="kd">public</span> 
<span class="kd">interface</span> <span class="nc">StreamJob</span> <span 
class="o">{</span>
   <span class="n">StreamJob</span> <span class="nf">submit</span><span 
class="o">();</span>
 
   <span class="n">StreamJob</span> <span class="nf">kill</span><span 
class="o">();</span>

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/logging.html Thu Jul 24 
05:05:00 2014
@@ -129,7 +129,7 @@
 
 <p>The <a href="/startup/hello-samza/0.7.0">hello-samza</a> project shows how 
to use <a href="http://logging.apache.org/log4j/1.2/">log4j</a> with Samza. To 
turn on log4j logging, you just need to make sure slf4j-log4j12 is in your 
SamzaContainer&rsquo;s classpath. In Maven, this can be done by adding the 
following dependency to your Samza package project.</p>
 
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span 
class="nt">&lt;dependency&gt;</span>
+<div class="highlight"><pre><code class="xml"><span 
class="nt">&lt;dependency&gt;</span>
   <span class="nt">&lt;groupId&gt;</span>org.slf4j<span 
class="nt">&lt;/groupId&gt;</span>
   <span class="nt">&lt;artifactId&gt;</span>slf4j-log4j12<span 
class="nt">&lt;/artifactId&gt;</span>
   <span class="nt">&lt;scope&gt;</span>runtime<span 
class="nt">&lt;/scope&gt;</span>
@@ -142,15 +142,15 @@
 
 <p>Samza&rsquo;s <a href="packaging.html">run-class.sh</a> script will 
automatically add the following option if log4j.xml exists in your <a 
href="packaging.html">Samza package&rsquo;s</a> lib directory.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">-Dlog4j.configuration<span class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j.xml</code></pre></div>
+<div class="highlight"><pre><code class="bash">-Dlog4j.configuration<span 
class="o">=</span>file:<span 
class="nv">$base_dir</span>/lib/log4j.xml</code></pre></div>
 
 <p>The <a href="packaging.html">run-class.sh</a> script will also set the 
following Java system properties:</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">-Dsamza.log.dir<span class="o">=</span><span 
class="nv">$SAMZA_LOG_DIR</span> -Dsamza.container.name<span 
class="o">=</span><span class="nv">$SAMZA_CONTAINER_NAME</span><span 
class="o">=</span></code></pre></div>
+<div class="highlight"><pre><code class="bash">-Dsamza.log.dir<span 
class="o">=</span><span class="nv">$SAMZA_LOG_DIR</span> 
-Dsamza.container.name<span class="o">=</span><span 
class="nv">$SAMZA_CONTAINER_NAME</span><span 
class="o">=</span></code></pre></div>
 
 <p>These settings are very useful if you&rsquo;re using a file-based appender. 
For example, you can use a daily rolling appender by configuring log4j.xml like 
this:</p>
 
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span 
class="nt">&lt;appender</span> <span class="na">name=</span><span 
class="s">&quot;RollingAppender&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.log4j.DailyRollingFileAppender&quot;</span><span 
class="nt">&gt;</span>
+<div class="highlight"><pre><code class="xml"><span 
class="nt">&lt;appender</span> <span class="na">name=</span><span 
class="s">&quot;RollingAppender&quot;</span> <span 
class="na">class=</span><span 
class="s">&quot;org.apache.log4j.DailyRollingFileAppender&quot;</span><span 
class="nt">&gt;</span>
    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;File&quot;</span> <span class="na">value=</span><span 
class="s">&quot;${samza.log.dir}/${samza.container.name}.log&quot;</span> <span 
class="nt">/&gt;</span>
    <span class="nt">&lt;param</span> <span class="na">name=</span><span 
class="s">&quot;DatePattern&quot;</span> <span class="na">value=</span><span 
class="s">&quot;&#39;.&#39;yyyy-MM-dd&quot;</span> <span class="nt">/&gt;</span>
    <span class="nt">&lt;layout</span> <span class="na">class=</span><span 
class="s">&quot;org.apache.log4j.PatternLayout&quot;</span><span 
class="nt">&gt;</span>
@@ -170,7 +170,7 @@
 
 <p>Samza will automatically set the following garbage collection logging setting, and will write the output to <code>$SAMZA_LOG_DIR/gc.log</code>.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">-XX:+PrintGCDateStamps -Xloggc:<span 
class="nv">$SAMZA_LOG_DIR</span>/gc.log</code></pre></div>
+<div class="highlight"><pre><code class="bash">-XX:+PrintGCDateStamps 
-Xloggc:<span class="nv">$SAMZA_LOG_DIR</span>/gc.log</code></pre></div>
 
 <h4 id="rotation">Rotation</h4>
 

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/reprocessing.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/reprocessing.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/reprocessing.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/reprocessing.html Thu 
Jul 24 05:05:00 2014
@@ -106,6 +106,23 @@
 
 <h2>Reprocessing previously processed data</h2>
 
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
 <p>From time to time you may want to deploy a new version of your Samza job 
that computes results differently. Perhaps you fixed a bug or introduced a new 
feature. For example, say you have a Samza job that classifies messages as spam 
or not-spam, using a machine learning model that you train offline. 
Periodically you want to deploy an updated version of your Samza job which 
includes the latest classification model.</p>
 
 <p>When you start up a new version of your job, a question arises: what do you 
want to do with messages that were previously processed with the old version of 
your job? The answer depends on the behavior you want:</p>

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/yarn-jobs.html Thu Jul 
24 05:05:00 2014
@@ -127,7 +127,7 @@
 
 <p>If you want to use YARN to run your Samza job, you&rsquo;ll also need to 
define the location of your Samza job&rsquo;s package. For example, you might 
say:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">yarn.package.path</span><span 
class="o">=</span><span 
class="s">http://my.http.server/jobs/ingraphs-package-0.0.55.tgz</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">yarn.package.path</span><span class="o">=</span><span 
class="s">http://my.http.server/jobs/ingraphs-package-0.0.55.tgz</span></code></pre></div>
 
 <p>This .tgz file follows the conventions outlined on the <a href="packaging.html">Packaging</a> page (it has bin/run-am.sh and bin/run-container.sh). YARN NodeManagers will take responsibility for downloading this .tgz file to the appropriate machines and untarring it. From there, YARN will execute run-am.sh for the Samza Application Master and run-container.sh for each SamzaContainer.</p>
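The expected package layout can be checked locally before publishing. A minimal sketch (the demo package built here is purely illustrative; in practice you would point the check at your real samza-job-package .tgz):

```shell
# Build a throwaway package with the layout YARN expects, then verify that
# bin/run-am.sh and bin/run-container.sh are present in the archive.
mkdir -p demo-pkg/bin
touch demo-pkg/bin/run-am.sh demo-pkg/bin/run-container.sh
pkg=demo.tgz
tar -czf "$pkg" -C demo-pkg bin
tar -tzf "$pkg" | grep -q 'bin/run-am.sh' && \
  tar -tzf "$pkg" | grep -q 'bin/run-container.sh' && \
  echo "package layout OK"
```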
 

Modified: incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html 
(original)
+++ incubator/samza/site/learn/documentation/0.7.0/operations/kafka.html Thu 
Jul 24 05:05:00 2014
@@ -133,7 +133,7 @@
 
 <p>Kafka brokers should be configured to automatically create topics. Without this, it&rsquo;s going to be very cumbersome to run Samza jobs, since jobs will write to arbitrary (and sometimes new) topics.</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">auto.create.topics.enable</span><span 
class="o">=</span><span class="s">true</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">auto.create.topics.enable</span><span class="o">=</span><span 
class="s">true</span></code></pre></div>
 
 
           </div>

Modified: 
incubator/samza/site/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html 
(original)
+++ incubator/samza/site/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html 
Thu Jul 24 05:05:00 2014
@@ -125,49 +125,19 @@
 
 <p>This tutorial uses <a 
href="../../../startup/hello-samza/0.7.0/">hello-samza</a> to illustrate how to 
run a Samza job if you want to publish the Samza job&rsquo;s .tar.gz package to 
HDFS.</p>
 
-<h3 id="build-a-new-samza-job-package">Build a new Samza job package</h3>
-
-<p>Build a new Samza job package to include the hadoop-hdfs-version.jar.</p>
-
-<ul>
-<li>Add dependency statement in pom.xml of samza-job-package</li>
-</ul>
-
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span 
class="nt">&lt;dependency&gt;</span>
-  <span class="nt">&lt;groupId&gt;</span>org.apache.hadoop<span 
class="nt">&lt;/groupId&gt;</span>
-  <span class="nt">&lt;artifactId&gt;</span>hadoop-hdfs<span 
class="nt">&lt;/artifactId&gt;</span>
-  <span class="nt">&lt;version&gt;</span>2.2.0<span 
class="nt">&lt;/version&gt;</span>
-<span class="nt">&lt;/dependency&gt;</span></code></pre></div>
-
-<ul>
-<li>Add the following code to src/main/assembly/src.xml in 
samza-job-package.</li>
-</ul>
-
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span 
class="nt">&lt;include&gt;</span>org.apache.hadoop:hadoop-hdfs<span 
class="nt">&lt;/include&gt;</span></code></pre></div>
-
-<ul>
-<li>Create .tar.gz package</li>
-</ul>
-
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">mvn 
clean pacakge</code></pre></div>
-
-<ul>
-<li>Make sure hadoop-common-version.jar has the same version as your 
hadoop-hdfs-version.jar. Otherwise, you may still have errors.</li>
-</ul>
-
 <h3 id="upload-the-package">Upload the package</h3>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">hadoop fs -put 
./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz 
/path/for/tgz</code></pre></div>
+<div class="highlight"><pre><code class="bash">hadoop fs -put 
./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz 
/path/for/tgz</code></pre></div>
 
 <h3 id="add-hdfs-configuration">Add HDFS configuration</h3>
 
-<p>Put the hdfs-site.xml file of your cluster into ~/.samza/conf directory. 
(The same place as the yarn-site.xml)</p>
+<p>Put your cluster&rsquo;s hdfs-site.xml file into the ~/.samza/conf directory (the same place as yarn-site.xml). If you have set HADOOP_CONF_DIR, put hdfs-site.xml into that configuration directory instead, if it is not already there.</p>
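The copy step above can be sketched as follows (the source path is a placeholder; use your cluster's actual hdfs-site.xml):

```shell
# Copy hdfs-site.xml into the directory Samza reads Hadoop configuration from:
# $HADOOP_CONF_DIR if it is set, otherwise ~/.samza/conf. The file generated
# below is a stand-in for your cluster's real hdfs-site.xml.
conf_dir="${HADOOP_CONF_DIR:-$HOME/.samza/conf}"
mkdir -p "$conf_dir"
echo '<configuration></configuration>' > hdfs-site.xml  # stand-in file
cp hdfs-site.xml "$conf_dir/"
ls "$conf_dir/hdfs-site.xml"
```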
 
 <h3 id="change-properties-file">Change properties file</h3>
 
 <p>Change the yarn.package.path in the properties file to your HDFS 
location.</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">yarn.package.path</span><span 
class="o">=</span><span class="s">hdfs://&lt;hdfs name node ip&gt;:&lt;hdfs 
name node port&gt;/path/to/tgz</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">yarn.package.path</span><span class="o">=</span><span 
class="s">hdfs://&lt;hdfs name node ip&gt;:&lt;hdfs name node 
port&gt;/path/to/tgz</span></code></pre></div>
 
 <p>Then you should be able to run the Samza job as described in <a 
href="../../../startup/hello-samza/0.7.0/">hello-samza</a>.</p>
 

Modified: incubator/samza/site/learn/tutorials/0.7.0/remote-debugging-samza.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/tutorials/0.7.0/remote-debugging-samza.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- incubator/samza/site/learn/tutorials/0.7.0/remote-debugging-samza.html 
(original)
+++ incubator/samza/site/learn/tutorials/0.7.0/remote-debugging-samza.html Thu 
Jul 24 05:05:00 2014
@@ -129,22 +129,22 @@
 
 <p>Start by checking out Samza, so we have access to the source.</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">git 
clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git</code></pre></div>
+<div class="highlight"><pre><code class="bash">git clone 
http://git-wip-us.apache.org/repos/asf/incubator-samza.git</code></pre></div>
 
 <p>Next, grab hello-samza.</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">git 
clone git://git.apache.org/incubator-samza-hello-samza.git</code></pre></div>
+<div class="highlight"><pre><code class="bash">git clone 
git://git.apache.org/incubator-samza-hello-samza.git</code></pre></div>
 
 <h3 id="setup-the-environment">Setup the Environment</h3>
 
 <p>Now, let&rsquo;s set up the Eclipse project files.</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span 
class="nb">cd </span>incubator-samza
+<div class="highlight"><pre><code class="bash"><span class="nb">cd 
</span>incubator-samza
 ./gradlew eclipse</code></pre></div>
 
 <p>Let&rsquo;s also release Samza to Maven&rsquo;s local repository, so 
hello-samza has access to the JARs that it needs.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">./gradlew -PscalaVersion<span class="o">=</span>2.9.2 clean 
publishToMavenLocal</code></pre></div>
+<div class="highlight"><pre><code class="bash">./gradlew -PscalaVersion<span 
class="o">=</span>2.9.2 clean publishToMavenLocal</code></pre></div>
 
 <p>Next, open Eclipse, and import the Samza source code into your workspace: &ldquo;File&rdquo; &gt; &ldquo;Import&rdquo; &gt; &ldquo;Existing Projects into Workspace&rdquo; &gt; &ldquo;Browse&rdquo;. Select the &lsquo;incubator-samza&rsquo; folder, and hit &lsquo;Finish&rsquo;.</p>
 
@@ -152,7 +152,7 @@
 
 <p>Now, go back to the hello-samza project, and edit 
./samza-job-package/src/main/config/wikipedia-feed.properties to add the 
following line:</p>
 
-<div class="highlight"><pre><code class="language-jproperties" 
data-lang="jproperties"><span class="na">task.opts</span><span 
class="o">=</span><span 
class="s">-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y</span></code></pre></div>
+<div class="highlight"><pre><code class="jproperties"><span 
class="na">task.opts</span><span class="o">=</span><span 
class="s">-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y</span></code></pre></div>
 
 <p>The <a 
href="../../documentation/0.7.0/jobs/configuration-table.html">task.opts</a> 
configuration parameter is a way to override Java parameters at runtime for 
your Samza containers. In this example, we&rsquo;re setting the agentlib 
parameter to enable remote debugging on localhost, port 9009. In a more 
realistic environment, you might also set Java heap settings (-Xmx, -Xms, etc), 
as well as garbage collection and logging settings.</p>
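Combining those settings, a task.opts line might look like the following (the heap and GC flags shown are illustrative values, not recommendations):

```properties
task.opts=-Xmx768m -Xms768m -XX:+PrintGCDetails -agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y
```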
 
@@ -162,15 +162,15 @@
 
 <p>Now that the Samza job has been set up to enable remote debugging when a Samza container starts, let&rsquo;s start ZooKeeper, Kafka, and YARN.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">bin/grid</code></pre></div>
+<div class="highlight"><pre><code class="bash">bin/grid</code></pre></div>
 
 <p>If you get a complaint that JAVA_HOME is not set, then you&rsquo;ll need to 
set it. This can be done on OSX by running:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span 
class="nb">export </span><span class="nv">JAVA_HOME</span><span 
class="o">=</span><span class="k">$(</span>/usr/libexec/java_home<span 
class="k">)</span></code></pre></div>
+<div class="highlight"><pre><code class="bash"><span class="nb">export 
</span><span class="nv">JAVA_HOME</span><span class="o">=</span><span 
class="k">$(</span>/usr/libexec/java_home<span 
class="k">)</span></code></pre></div>
 
 <p>Once the grid starts, you can start the wikipedia-feed Samza job.</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">mvn 
clean package
+<div class="highlight"><pre><code class="bash">mvn clean package
 mkdir -p deploy/samza
 tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C 
deploy/samza
 deploy/samza/bin/run-job.sh --config-factory<span 
class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div>

Modified: 
incubator/samza/site/learn/tutorials/0.7.0/run-hello-samza-without-internet.html
URL: 
http://svn.apache.org/viewvc/incubator/samza/site/learn/tutorials/0.7.0/run-hello-samza-without-internet.html?rev=1612998&r1=1612997&r2=1612998&view=diff
==============================================================================
--- 
incubator/samza/site/learn/tutorials/0.7.0/run-hello-samza-without-internet.html
 (original)
+++ 
incubator/samza/site/learn/tutorials/0.7.0/run-hello-samza-without-internet.html
 Thu Jul 24 05:05:00 2014
@@ -129,7 +129,7 @@
 
 <p>Check that you can reach irc.wikimedia.org. Sometimes the firewall in your company blocks this service.</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">telnet irc.wikimedia.org 6667</code></pre></div>
+<div class="highlight"><pre><code class="bash">telnet irc.wikimedia.org 
6667</code></pre></div>
 
 <p>You should see something like this:</p>
 <div class="highlight"><pre><code class="language-text" 
data-lang="text">Trying 208.80.152.178...
@@ -146,15 +146,15 @@ NOTICE AUTH :*** Found your hostname
 
 <p>We provide an alternative to get wikipedia feed data. Instead of running</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">deploy/samza/bin/run-job.sh --config-factory<span 
class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div>
+<div class="highlight"><pre><code class="bash">deploy/samza/bin/run-job.sh 
--config-factory<span 
class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div>
 
 <p>You will run</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">bin/produce-wikipedia-raw-data.sh</code></pre></div>
+<div class="highlight"><pre><code 
class="bash">bin/produce-wikipedia-raw-data.sh</code></pre></div>
 
 <p>This script will read wikipedia feed data from a local file and produce it to the Kafka broker. By default, it produces to localhost:9092 as the Kafka broker and uses localhost:2181 as the ZooKeeper address. You can override them:</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">bin/produce-wikipedia-raw-data.sh -b yourKafkaBrokerAddress -z 
yourZookeeperAddress</code></pre></div>
+<div class="highlight"><pre><code 
class="bash">bin/produce-wikipedia-raw-data.sh -b yourKafkaBrokerAddress -z 
yourZookeeperAddress</code></pre></div>
 
 <p>Now you can go back to the Generate Wikipedia Statistics section in <a href="../../../startup/hello-samza/0.7.0/">Hello Samza</a> and follow the remaining steps.</p>
 
@@ -162,7 +162,7 @@ NOTICE AUTH :*** Found your hostname
 
 <p>The goal of</p>
 
-<div class="highlight"><pre><code class="language-bash" 
data-lang="bash">deploy/samza/bin/run-job.sh --config-factory<span 
class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div>
+<div class="highlight"><pre><code class="bash">deploy/samza/bin/run-job.sh 
--config-factory<span 
class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path<span class="o">=</span>file://<span 
class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div>
 
 <p>is to deploy a Samza job which listens to the wikipedia API, receives the feed in real time, and produces the feed to the Kafka topic wikipedia-raw. The alternative in this tutorial reads a local wikipedia feed in an infinite loop and produces the data to the same topic. The follow-up job, wikipedia-parser, gets its data from the Kafka topic wikipedia-raw, so as long as we have correct data in that topic, we are fine. All Samza jobs are connected through Kafka and do not depend on each other directly.</p>
 

