Modified: incubator/samza/site/learn/tutorials/0.7.0/run-in-multi-node-yarn.html URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/tutorials/0.7.0/run-in-multi-node-yarn.html?rev=1612998&r1=1612997&r2=1612998&view=diff ============================================================================== --- incubator/samza/site/learn/tutorials/0.7.0/run-in-multi-node-yarn.html (original) +++ incubator/samza/site/learn/tutorials/0.7.0/run-in-multi-node-yarn.html Thu Jul 24 05:05:00 2014 @@ -133,24 +133,24 @@ <p>1. Download <a href="http://mirror.symnds.com/software/Apache/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz">YARN 2.3</a> to /tmp and untar it.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd</span> /tmp +<div class="highlight"><pre><code class="bash"><span class="nb">cd</span> /tmp tar -xvf hadoop-2.3.0.tar.gz <span class="nb">cd </span>hadoop-2.3.0</code></pre></div> <p>2. Set up environment variables.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">HADOOP_YARN_HOME</span><span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span> +<div class="highlight"><pre><code class="bash"><span class="nb">export </span><span class="nv">HADOOP_YARN_HOME</span><span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span> mkdir conf <span class="nb">export </span><span class="nv">HADOOP_CONF_DIR</span><span class="o">=</span><span class="nv">$HADOOP_YARN_HOME</span>/conf</code></pre></div> <p>3. 
Configure YARN setting file.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">cp ./etc/hadoop/yarn-site.xml conf +<div class="highlight"><pre><code class="bash">cp ./etc/hadoop/yarn-site.xml conf vi conf/yarn-site.xml</code></pre></div> <p>Add the following property to yarn-site.xml:</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt"><property></span> +<div class="highlight"><pre><code class="xml"><span class="nt"><property></span> <span class="nt"><name></span>yarn.resourcemanager.hostname<span class="nt"></name></span> <span class="c"><!-- hostname that is accessible from all NMs --></span> <span class="nt"><value></span>yourHostname<span class="nt"></value></span> @@ -165,23 +165,23 @@ vi conf/yarn-site.xml</code></pre></div> <p>4. Download Scala package and untar it.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd</span> /tmp +<div class="highlight"><pre><code class="bash"><span class="nb">cd</span> /tmp curl http://www.scala-lang.org/files/archive/scala-2.10.3.tgz > scala-2.10.3.tgz tar -xvf scala-2.10.3.tgz</code></pre></div> <p>5. Add Scala and its log jars.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">cp /tmp/scala-2.10.3/lib/scala-compiler.jar <span class="nv">$HADOOP_YARN_HOME</span>/share/hadoop/hdfs/lib +<div class="highlight"><pre><code class="bash">cp /tmp/scala-2.10.3/lib/scala-compiler.jar <span class="nv">$HADOOP_YARN_HOME</span>/share/hadoop/hdfs/lib cp /tmp/scala-2.10.3/lib/scala-library.jar <span class="nv">$HADOOP_YARN_HOME</span>/share/hadoop/hdfs/lib curl http://search.maven.org/remotecontent?filepath<span class="o">=</span>org/clapper/grizzled-slf4j_2.10/1.0.1/grizzled-slf4j_2.10-1.0.1.jar > <span class="nv">$HADOOP_YARN_HOME</span>/share/hadoop/hdfs/lib/grizzled-slf4j_2.10-1.0.1.jar</code></pre></div> <p>6. 
Add http configuration in core-site.xml (create the core-site.xml file and add content).</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml">vi $HADOOP_YARN_HOME/conf/core-site.xml</code></pre></div> +<div class="highlight"><pre><code class="xml">vi $HADOOP_YARN_HOME/conf/core-site.xml</code></pre></div> <p>Add the following code:</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="cp"><?xml-stylesheet type="text/xsl" href="configuration.xsl"?></span> +<div class="highlight"><pre><code class="xml"><span class="cp"><?xml-stylesheet type="text/xsl" href="configuration.xsl"?></span> <span class="nt"><configuration></span> <span class="nt"><property></span> <span class="nt"><name></span>fs.http.impl<span class="nt"></name></span> @@ -193,7 +193,7 @@ curl http://search.maven.org/remoteconte <p>7. Basically, you copy the hadoop file in your host machine to slave machines. (172.21.100.35, in my case):</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">scp -r . 172.21.100.35:/tmp/hadoop-2.3.0 +<div class="highlight"><pre><code class="bash">scp -r . 172.21.100.35:/tmp/hadoop-2.3.0 <span class="nb">echo </span>172.21.100.35 > conf/slaves sbin/start-yarn.sh</code></pre></div> @@ -209,7 +209,7 @@ sbin/start-yarn.sh</code></pre></div> <p>1. Download Samza and publish it to Maven local repository.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd</span> /tmp +<div class="highlight"><pre><code class="bash"><span class="nb">cd</span> /tmp git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git <span class="nb">cd </span>incubator-samza ./gradlew clean publishToMavenLocal @@ -217,17 +217,17 @@ git clone http://git-wip-us.apache.org/r <p>2. 
Download hello-samza project and change the job properties file.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone git://github.com/linkedin/hello-samza.git +<div class="highlight"><pre><code class="bash">git clone git://github.com/linkedin/hello-samza.git <span class="nb">cd </span>hello-samza vi samza-job-package/src/main/config/wikipedia-feed.properties</code></pre></div> <p>Change the yarn.package.path property to be:</p> -<div class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span class="na">yarn.package.path</span><span class="o">=</span><span class="s">http://yourHostname:8000/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz</span></code></pre></div> +<div class="highlight"><pre><code class="jproperties"><span class="na">yarn.package.path</span><span class="o">=</span><span class="s">http://yourHostname:8000/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz</span></code></pre></div> <p>3. 
Compile hello-samza.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">mvn clean package +<div class="highlight"><pre><code class="bash">mvn clean package mkdir -p deploy/samza tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza</code></pre></div> @@ -235,11 +235,11 @@ tar -xvf ./samza-job-package/target/samz <p>Open a new terminal, and run:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd</span> /tmp/hello-samza <span class="o">&&</span> python -m SimpleHTTPServer</code></pre></div> +<div class="highlight"><pre><code class="bash"><span class="nb">cd</span> /tmp/hello-samza <span class="o">&&</span> python -m SimpleHTTPServer</code></pre></div> <p>Go back to the original terminal (not the one running the HTTP server):</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div> +<div class="highlight"><pre><code class="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div> <p>Go to http://yourHostname:8088 and find the wikipedia-feed job. Click on the ApplicationMaster link to see that it’s running.</p>
Modified: incubator/samza/site/sitemap.xml URL: http://svn.apache.org/viewvc/incubator/samza/site/sitemap.xml?rev=1612998&r1=1612997&r2=1612998&view=diff ============================================================================== --- incubator/samza/site/sitemap.xml (original) +++ incubator/samza/site/sitemap.xml Thu Jul 24 05:05:00 2014 @@ -20,7 +20,7 @@ <url> <loc>http://samza.incubator.apache.org/</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> <changefreq>daily</changefreq> <priority>1.0</priority> </url> @@ -30,308 +30,315 @@ <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/application-master.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/architecture.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/background.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/checkpointing.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/code.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/coding-guide.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/community/committers.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/introduction/concepts.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/configuration.html</loc> - 
<lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/disclaimer.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/event-loop.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/index.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/index.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/index.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/startup/download/index.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/startup/hello-samza/0.7.0/index.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/introduction.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/community/irc.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/yarn/isolation.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/jmx.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> 
<loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/job-runner.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/kafka.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/logging.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/community/mailing-lists.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/metrics.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/mupd8.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/api/overview.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/packaging.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/projects.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/remote-debugging-samza.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/reprocessing.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/rules.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> 
<loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/run-hello-samza-without-internet.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/samza-container.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/operations/security.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/contribute/seps.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/serialization.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> + + + </url> + + <url> + <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/spark-streaming.html</loc> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/state-management.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/comparisons/storm.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/streams.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/container/windowing.html</loc> - <lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> <url> <loc>http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/yarn-jobs.html</loc> - 
<lastmod>2014-07-09</lastmod> + <lastmod>2014-07-23</lastmod> </url> Modified: incubator/samza/site/startup/download/index.html URL: http://svn.apache.org/viewvc/incubator/samza/site/startup/download/index.html?rev=1612998&r1=1612997&r2=1612998&view=diff ============================================================================== --- incubator/samza/site/startup/download/index.html (original) +++ incubator/samza/site/startup/download/index.html Thu Jul 24 05:05:00 2014 @@ -141,7 +141,7 @@ <p>A Maven-based Samza project can pull in all required dependencies Samza dependencies this XML block:</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt"><dependency></span> +<div class="highlight"><pre><code class="xml"><span class="nt"><dependency></span> <span class="nt"><groupId></span>org.apache.samza<span class="nt"></groupId></span> <span class="nt"><artifactId></span>samza-api<span class="nt"></artifactId></span> <span class="nt"><version></span>0.7.0<span class="nt"></version></span> @@ -190,14 +190,14 @@ <p>Samza is available in the Apache Maven repository.</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt"><repository></span> +<div class="highlight"><pre><code class="xml"><span class="nt"><repository></span> <span class="nt"><id></span>apache-releases<span class="nt"></id></span> <span class="nt"><url></span>https://repository.apache.org/content/groups/public<span class="nt"></url></span> <span class="nt"></repository></span></code></pre></div> <p>Snapshot builds are available in the Apache Maven snapshot repository.</p> -<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt"><repository></span> +<div class="highlight"><pre><code class="xml"><span class="nt"><repository></span> <span class="nt"><id></span>apache-snapshots<span class="nt"></id></span> <span class="nt"><url></span>https://repository.apache.org/content/groups/snapshots<span 
class="nt"></url></span> <span class="nt"></repository></span></code></pre></div> @@ -206,7 +206,7 @@ <p>If you’re interested in working on Samza, or building the JARs from scratch, then you’ll need to checkout and build the code. Samza does not have a binary release at this time. To check out and build Samza, run these commands.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git +<div class="highlight"><pre><code class="bash">git clone http://git-wip-us.apache.org/repos/asf/incubator-samza.git <span class="nb">cd </span>incubator-samza ./gradlew clean build</code></pre></div> Modified: incubator/samza/site/startup/hello-samza/0.7.0/index.html URL: http://svn.apache.org/viewvc/incubator/samza/site/startup/hello-samza/0.7.0/index.html?rev=1612998&r1=1612997&r2=1612998&view=diff ============================================================================== --- incubator/samza/site/startup/hello-samza/0.7.0/index.html (original) +++ incubator/samza/site/startup/hello-samza/0.7.0/index.html Thu Jul 24 05:05:00 2014 @@ -129,7 +129,7 @@ <p>Check out the hello-samza project:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">git clone git://git.apache.org/incubator-samza-hello-samza.git hello-samza +<div class="highlight"><pre><code class="bash">git clone git://git.apache.org/incubator-samza-hello-samza.git hello-samza <span class="nb">cd </span>hello-samza</code></pre></div> <p>This project contains everything you’ll need to run your first Samza jobs.</p> @@ -138,7 +138,7 @@ <p>A Samza grid usually comprises three different systems: <a href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a>, <a href="http://kafka.apache.org/">Kafka</a>, and <a href="http://zookeeper.apache.org/">ZooKeeper</a>. The hello-samza project comes with a script called “grid” to help you setup these systems. 
Start by running:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/grid bootstrap</code></pre></div> +<div class="highlight"><pre><code class="bash">bin/grid bootstrap</code></pre></div> <p>This command will download, install, and start ZooKeeper, Kafka, and YARN. It will also check out the latest version of Samza and build it. All package files will be put in a sub-directory called “deploy” inside hello-samza’s root folder.</p> @@ -150,7 +150,7 @@ <p>Before you can run a Samza job, you need to build a package for it. This package is what YARN uses to deploy your jobs on the grid.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">mvn clean package +<div class="highlight"><pre><code class="bash">mvn clean package mkdir -p deploy/samza tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza</code></pre></div> @@ -158,11 +158,11 @@ tar -xvf ./samza-job-package/target/samz <p>After you’ve built your Samza package, you can start a job on the grid using the run-job.sh script.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div> +<div class="highlight"><pre><code class="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-feed.properties</code></pre></div> <p>The job will consume a feed of real-time edits from Wikipedia, and produce them to a Kafka topic called “wikipedia-raw”. 
Give the job a minute to startup, and then tail the Kafka topic:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw</code></pre></div> +<div class="highlight"><pre><code class="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-raw</code></pre></div> <p>Pretty neat, right? Now, check out the YARN UI again (<a href="http://localhost:8088">http://localhost:8088</a>). This time around, you’ll see your Samza job is running!</p> @@ -172,20 +172,20 @@ tar -xvf ./samza-job-package/target/samz <p>Let’s calculate some statistics based on the messages in the wikipedia-raw topic. Start two more jobs:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-parser.properties +<div class="highlight"><pre><code class="bash">deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-parser.properties deploy/samza/bin/run-job.sh --config-factory<span class="o">=</span>org.apache.samza.config.factories.PropertiesConfigFactory --config-path<span class="o">=</span>file://<span class="nv">$PWD</span>/deploy/samza/config/wikipedia-stats.properties</code></pre></div> <p>The first job (wikipedia-parser) parses the messages in wikipedia-raw, and extracts information about the size of the edit, who made the change, etc. 
You can take a look at its output with:</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-edits</code></pre></div> +<div class="highlight"><pre><code class="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-edits</code></pre></div> <p>The last job (wikipedia-stats) reads messages from the wikipedia-edits topic, and calculates counts, every ten seconds, for all edits that were made during that window. It outputs these counts to the wikipedia-stats topic.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-stats</code></pre></div> +<div class="highlight"><pre><code class="bash">deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic wikipedia-stats</code></pre></div> <p>The messages in the stats topic look like this:</p> -<div class="highlight"><pre><code class="language-json" data-lang="json"><span class="p">{</span><span class="nt">"is-talk"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"bytes-added"</span><span class="p">:</span><span class="mi">5276</span><span class="p">,</span><span class="nt">"edits"</span><span class="p">:</span><span class="mi">13</span><span class="p">,</span><span class="nt">"unique-titles"</span><span class="p">:</span><span class="mi">13</span><span class="p">}</span> +<div class="highlight"><pre><code class="json"><span class="p">{</span><span class="nt">"is-talk"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"bytes-added"</span><span class="p">:</span><span class="mi">5276</span><span class="p">,</span><span class="nt">"edits"</span><span class="p">:</span><span class="mi">13</span><span class="p">,</span><span 
class="nt">"unique-titles"</span><span class="p">:</span><span class="mi">13</span><span class="p">}</span> <span class="p">{</span><span class="nt">"is-bot-edit"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="nt">"is-talk"</span><span class="p">:</span><span class="mi">3</span><span class="p">,</span><span class="nt">"bytes-added"</span><span class="p">:</span><span class="mi">4211</span><span class="p">,</span><span class="nt">"edits"</span><span class="p">:</span><span class="mi">30</span><span class="p">,</span><span class="nt">"unique-titles"</span><span class="p">:</span><span class="mi">30</span><span class="p">,</span><span class="nt">"is-unpatrolled"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="nt">"is-new"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"is-minor"</span><span class="p">:</span><span class="mi">7</span><span class="p">}</span> <span class="p">{</span><span class="nt">"bytes-added"</span><span class="p">:</span><span class="mi">3180</span><span class="p">,</span><span class="nt">"edits"</span><span class="p">:</span><span class="mi">19</span><span class="p">,</span><span class="nt">"unique-titles"</span><span class="p">:</span><span class="mi">19</span><span class="p">,</span><span class="nt">"is-unpatrolled"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="nt">"is-new"</span><span class="p">:</span><span class="mi">1</span><span class="p">,</span><span class="nt">"is-minor"</span><span class="p">:</span><span class="mi">3</span><span class="p">}</span> <span class="p">{</span><span class="nt">"bytes-added"</span><span class="p">:</span><span class="mi">2218</span><span class="p">,</span><span class="nt">"edits"</span><span class="p">:</span><span class="mi">18</span><span class="p">,</span><span class="nt">"unique-titles"</span><span class="p">:</span><span 
class="mi">18</span><span class="p">,</span><span class="nt">"is-unpatrolled"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"is-new"</span><span class="p">:</span><span class="mi">2</span><span class="p">,</span><span class="nt">"is-minor"</span><span class="p">:</span><span class="mi">3</span><span class="p">}</span></code></pre></div> @@ -196,7 +196,7 @@ deploy/samza/bin/run-job.sh --config-fac <p>After you’re done, you can clean everything up using the same grid script.</p> -<div class="highlight"><pre><code class="language-bash" data-lang="bash">bin/grid stop all</code></pre></div> +<div class="highlight"><pre><code class="bash">bin/grid stop all</code></pre></div> <p>Congratulations! You’ve now setup a local grid that includes YARN, Kafka, and ZooKeeper, and run a Samza job on it. Next up, check out the <a href="/learn/documentation/0.7.0/introduction/background.html">Background</a> and <a href="/learn/documentation/0.7.0/api/overview.html">API Overview</a> pages.</p>
