Author: buildbot
Date: Tue Feb 17 13:56:21 2015
New Revision: 940463
Log:
Staging update by buildbot for jena
Added:
websites/staging/jena/trunk/content/documentation/hadoop/demo.html
Removed:
websites/staging/jena/trunk/content/documentation/hadoop/demo.md
Modified:
websites/staging/jena/trunk/content/ (props changed)
Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Feb 17 13:56:21 2015
@@ -1 +1 @@
-1660378
+1660380
Added: websites/staging/jena/trunk/content/documentation/hadoop/demo.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/demo.html (added)
+++ websites/staging/jena/trunk/content/documentation/hadoop/demo.html Tue Feb
17 13:56:21 2015
@@ -0,0 +1,259 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+ <title>Apache Jena - Apache Jena Elephas - RDF Stats Demo</title>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+ <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+ <link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
+ <link href="/css/jena.css" rel="stylesheet" type="text/css">
+ <link rel="shortcut icon" href="/images/favicon.ico" />
+
+ <script src="https://code.jquery.com/jquery-2.0.3.min.js"></script>
+ <script src="/js/jena-navigation.js" type="text/javascript"></script>
+ <script src="/js/bootstrap.min.js" type="text/javascript"></script>
+ <script src="/js/breadcrumbs.js" type="text/javascript"></script>
+
+ <script src="/js/improve.js" type="text/javascript"></script>
+
+
+ <!-- Uncomment to enable code coloring <link href="/css/codehilite.css"
rel="stylesheet" type="text/css"> -->
+
+</head>
+
+<body>
+
+
+
+<nav class="navbar navbar-default" role="navigation">
+<div class="container">
+ <div class="navbar-header">
+
+ <button type="button" class="navbar-toggle" data-toggle="collapse"
data-target=".navbar-ex1-collapse">
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+ <a class="navbar-brand" href="/index.html">
+ <img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png"
alt="jena logo">Apache Jena</a>
+ </div>
+
+ <div class="collapse navbar-collapse navbar-ex1-collapse">
+ <ul class="nav navbar-nav">
+ <li id="homepage"><a href="/index.html"><span class="glyphicon
glyphicon-home"></span> Home</a></li>
+ <li id="download"><a href="/download/index.cgi"><span
class="glyphicon glyphicon-download-alt"></span> Download</a></li>
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b
class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li class="dropdown-header">Tutorials</li>
+ <li><a href="/tutorials/index.html">Overview</a></li>
+ <li><a href="/tutorials/rdf_api.html">RDF core API
tutorial</a></li>
+ <li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
+ <li><a
href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating
SPARQL using ARQ</a></li>
+ <li><a href="/tutorials/using_jena_with_eclipse.html">Using
Jena with Eclipse</a></li>
+ <li><a
href="/documentation/notes/index.html">How-To's</a></li>
+ <li class="divider"></li>
+ <li class="dropdown-header">References</li>
+ <li><a href="/documentation/index.html">Overview</a></li>
+ <li><a href="/documentation/javadoc/">Javadoc</a></li>
+ <li><a href="/documentation/rdf/index.html">RDF API</a></li>
+ <li><a href="/documentation/io/">RDF I/O</a></li>
+ <li><a href="/documentation/query/index.html">ARQ
(SPARQL)</a></li>
+ <li><a href="/documentation/query/text-query.html">Text
Search</a></li>
+ <li><a href="/documentation/tdb/index.html">TDB</a></li>
+ <li><a href="/documentation/sdb/index.html">SDB</a></li>
+ <li><a href="/documentation/jdbc/index.html">SPARQL over
JDBC</a></li>
+ <li><a
href="/documentation/security/index.html">Security</a></li>
+ <li><a
href="/documentation/serving_data/index.html">Fuseki</a></li>
+ <li><a
href="/documentation/assembler/index.html">Assembler</a></li>
+ <li><a href="/documentation/ontology/">Ontology API</a></li>
+ <li><a href="/documentation/inference/index.html">Inference
API</a></li>
+ <li><a href="/documentation/tools/index.html">Command-line
tools</a></li>
+ <li><a
href="/documentation/extras/index.html">Extras</a></li>
+ </ul>
+ </li>
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc
<b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
+ <li><a href="/documentation/javadoc/arq/">ARQ</a></li>
+ <li><a href="/documentation/javadoc/tdb/">TDB</a></li>
+ <li><a href="/documentation/javadoc/text/">Text
Search</a></li>
+ <li><a href="/documentation/javadoc/spatial/">Spatial
Search</a></li>
+ <li><a
href="/documentation/javadoc/security/">Security</a></li>
+ <li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
+ <li><a href="/documentation/javadoc/fuseki/">Fuseki</a></li>
+ </ul>
+ </li>
+
+ <li id="ask"><a href="/help_and_support/index.html"><span
class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get
involved <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+ <li><a
href="/getting_involved/index.html">Contribute</a></li>
+ <li><a
href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
+ <li class="divider"></li>
+ <li class="dropdown-header">Project</li>
+ <li><a href="/about_jena/about.html">About Jena</a></li>
+ <li><a href="/about_jena/roadmap.html">Roadmap</a></li>
+ <li><a
href="/about_jena/architecture.html">Architecture</a></li>
+ <li><a href="/about_jena/team.html">Project team</a></li>
+ <li><a href="/about_jena/contributions.html">Related
projects</a></li>
+ <li class="divider"></li>
+ <li class="dropdown-header">ASF</li>
+ <li><a href="http://www.apache.org/">Apache Software
Foundation</a></li>
+ <li><a
href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+ <li><a
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+ <li><a
href="http://www.apache.org/foundation/sponsorship.html">Become a
Sponsor</a></li>
+ <li><a
href="http://www.apache.org/security/">Security</a></li>
+ </ul>
+ </li>
+
+ <li id="edit"><a
href="javascript:improveThisPage(location.href);" title="Improve this Page (Use
username anonymous and empty password)"><span class="glyphicon
glyphicon-pencil"></span> Improve this Page</a></li>
+ </ul>
+ </div>
+</div>
+</nav>
+
+
+<div class="container">
+ <div class="row">
+ <div class="col-md-12">
+ <div id="breadcrumbs"></div>
+ <h1 class="title">Apache Jena Elephas - RDF Stats Demo</h1>
+ <p>The RDF Stats Demo is a pre-built application, available as a ready-to-run
Hadoop job JAR with all dependencies embedded within it. The demo app uses the
other Elephas libraries to calculate a number of basic statistics over any RDF
data supported by Elephas.</p>
+<p>To use it you will first need to build it from source or download the
relevant Maven artefact:</p>
+<div class="codehilite"><pre><span class="nt">&lt;dependency&gt;</span>
+ <span class="nt">&lt;groupId&gt;</span>org.apache.jena<span
class="nt">&lt;/groupId&gt;</span>
+ <span class="nt">&lt;artifactId&gt;</span>jena-elephas-stats<span
class="nt">&lt;/artifactId&gt;</span>
+ <span class="nt">&lt;version&gt;</span>x.y.z<span
class="nt">&lt;/version&gt;</span>
+ <span class="nt">&lt;classifier&gt;</span>hadoop-job<span
class="nt">&lt;/classifier&gt;</span>
+<span class="nt">&lt;/dependency&gt;</span>
+</pre></div>
+
+
+<p>Where <code>x.y.z</code> is the desired version.</p>
+<h1 id="pre-requisites">Pre-requisites</h1>
+<p>To run this demo you will need a Hadoop 2.x cluster. For simple
experimentation, a <a
href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html">single
node cluster</a> will be sufficient.</p>
+<h1 id="running">Running</h1>
+<p>Assuming your cluster is running and the <code>hadoop</code>
command is available on your path, you can run the application without any
arguments to see help:</p>
+<div class="codehilite"><pre><span class="o">></span> <span
class="n">hadoop</span> <span class="n">jar</span> <span
class="n">jena</span><span class="o">-</span><span
class="n">elephas</span><span class="o">-</span><span
class="n">stats</span><span class="o">-</span><span
class="n">VERSION</span><span class="o">-</span><span
class="n">hadoop</span><span class="o">-</span><span class="n">job</span><span
class="p">.</span><span class="n">jar</span> <span class="n">org</span><span
class="p">.</span><span class="n">apache</span><span class="p">.</span><span
class="n">jena</span><span class="p">.</span><span class="n">hadoop</span><span
class="p">.</span><span class="n">rdf</span><span class="p">.</span><span
class="n">stats</span><span class="p">.</span><span class="n">RdfStats</span>
+<span class="n">NAME</span>
+ <span class="n">hadoop</span> <span class="n">jar</span> <span
class="n">PATH_TO_JAR</span> <span class="n">org</span><span
class="p">.</span><span class="n">apache</span><span class="p">.</span><span
class="n">jena</span><span class="p">.</span><span class="n">hadoop</span><span
class="p">.</span><span class="n">rdf</span><span class="p">.</span><span
class="n">stats</span><span class="p">.</span><span class="n">RdfStats</span>
<span class="o">-</span> <span class="n">A</span>
+ <span class="n">command</span> <span class="n">which</span> <span
class="n">computes</span> <span class="n">statistics</span> <span
class="n">on</span> <span class="n">RDF</span> <span class="n">data</span>
<span class="n">using</span> <span class="n">Hadoop</span>
+
+<span class="n">SYNOPSIS</span>
+ <span class="n">hadoop</span> <span class="n">jar</span> <span
class="n">PATH_TO_JAR</span> <span class="n">org</span><span
class="p">.</span><span class="n">apache</span><span class="p">.</span><span
class="n">jena</span><span class="p">.</span><span class="n">hadoop</span><span
class="p">.</span><span class="n">rdf</span><span class="p">.</span><span
class="n">stats</span><span class="p">.</span><span class="n">RdfStats</span>
+ <span class="p">[</span> <span class="p">{</span><span
class="o">-</span><span class="n">a</span> <span class="o">|</span> <span
class="o">--</span><span class="n">all</span><span class="p">}</span> <span
class="p">]</span> <span class="p">[</span> <span class="p">{</span><span
class="o">-</span><span class="n">d</span> <span class="o">|</span> <span
class="o">--</span><span class="n">data</span><span class="o">-</span><span
class="n">types</span><span class="p">}</span> <span class="p">]</span> <span
class="p">[</span> <span class="p">{</span><span class="o">-</span><span
class="n">g</span> <span class="o">|</span> <span class="o">--</span><span
class="n">graph</span><span class="o">-</span><span class="n">sizes</span><span
class="p">}</span> <span class="p">]</span>
+ <span class="p">[</span> <span class="p">{</span><span
class="o">-</span><span class="n">h</span> <span class="o">|</span> <span
class="o">--</span><span class="n">help</span><span class="p">}</span> <span
class="p">]</span> <span class="p">[</span> <span class="o">--</span><span
class="n">input</span><span class="o">-</span><span class="n">type</span> <span
class="o"><</span><span class="n">inputType</span><span
class="o">></span> <span class="p">]</span> <span class="p">[</span> <span
class="p">{</span><span class="o">-</span><span class="n">n</span> <span
class="o">|</span> <span class="o">--</span><span class="n">node</span><span
class="o">-</span><span class="n">count</span><span class="p">}</span> <span
class="p">]</span>
+ <span class="p">[</span> <span class="o">--</span><span
class="n">namespaces</span> <span class="p">]</span> <span
class="p">{</span><span class="o">-</span><span class="n">o</span> <span
class="o">|</span> <span class="o">--</span><span class="n">output</span><span
class="p">}</span> <span class="o"><</span><span
class="n">OutputPath</span><span class="o">></span> <span class="p">[</span>
<span class="p">{</span><span class="o">-</span><span class="n">t</span> <span
class="o">|</span> <span class="o">--</span><span class="n">type</span><span
class="o">-</span><span class="n">count</span><span class="p">}</span> <span
class="p">]</span>
+ <span class="p">[</span><span class="o">--</span><span
class="p">]</span> <span class="o"><</span><span
class="n">InputPath</span><span class="o">></span><span class="p">...</span>
+
+<span class="n">OPTIONS</span>
+ <span class="o">-</span><span class="n">a</span><span class="p">,</span>
<span class="o">--</span><span class="n">all</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">all</span> <span class="n">available</span> <span
class="n">statistics</span> <span class="n">be</span> <span
class="n">calculated</span>
+
+ <span class="o">-</span><span class="n">d</span><span class="p">,</span>
<span class="o">--</span><span class="n">data</span><span
class="o">-</span><span class="n">types</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">literal</span> <span class="n">data</span> <span
class="n">type</span> <span class="n">usage</span> <span
class="n">counts</span> <span class="n">be</span> <span
class="n">calculated</span>
+
+ <span class="o">-</span><span class="n">g</span><span class="p">,</span>
<span class="o">--</span><span class="n">graph</span><span
class="o">-</span><span class="n">sizes</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">the</span> <span class="nb">size</span> <span class="n">of</span>
<span class="n">each</span> <span class="n">named</span> <span
class="n">graph</span> <span class="n">be</span> <span class="n">counted</span>
+
+ <span class="o">-</span><span class="n">h</span><span class="p">,</span>
<span class="o">--</span><span class="n">help</span>
+ <span class="n">Display</span> <span class="n">help</span> <span
class="n">information</span>
+
+ <span class="o">--</span><span class="n">input</span><span
class="o">-</span><span class="n">type</span> <span class="o"><</span><span
class="n">inputType</span><span class="o">></span>
+ <span class="n">Specifies</span> <span class="n">whether</span> <span
class="n">the</span> <span class="n">input</span> <span class="n">data</span>
<span class="n">is</span> <span class="n">a</span> <span
class="n">mixture</span> <span class="n">of</span> <span class="n">quads</span>
<span class="n">and</span> <span class="n">triples</span><span
class="p">,</span>
+ <span class="n">just</span> <span class="n">quads</span> <span
class="n">or</span> <span class="n">just</span> <span
class="n">triples</span><span class="p">.</span> <span class="n">Using</span>
<span class="n">the</span> <span class="n">most</span> <span
class="n">specific</span> <span class="n">data</span> <span
class="n">type</span> <span class="n">will</span>
+ <span class="n">yield</span> <span class="n">the</span> <span
class="n">most</span> <span class="n">accurate</span> <span
class="n">statistics</span>
+
+ <span class="n">This</span> <span class="n">options</span> <span
class="n">value</span> <span class="n">is</span> <span
class="n">restricted</span> <span class="n">to</span> <span
class="n">the</span> <span class="n">following</span> <span
class="n">value</span><span class="p">(</span><span class="n">s</span><span
class="p">):</span>
+ <span class="n">mixed</span>
+ <span class="n">quads</span>
+ <span class="n">triples</span>
+
+ <span class="o">-</span><span class="n">n</span><span class="p">,</span>
<span class="o">--</span><span class="n">node</span><span
class="o">-</span><span class="n">count</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">node</span> <span class="n">usage</span> <span
class="n">counts</span> <span class="n">be</span> <span
class="n">calculated</span>
+
+ <span class="o">--</span><span class="n">namespaces</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">namespace</span> <span class="n">usage</span> <span
class="n">counts</span> <span class="n">be</span> <span
class="n">calculated</span>
+
+ <span class="o">-</span><span class="n">o</span> <span
class="o"><</span><span class="n">OutputPath</span><span
class="o">></span><span class="p">,</span> <span class="o">--</span><span
class="n">output</span> <span class="o"><</span><span
class="n">OutputPath</span><span class="o">></span>
+ <span class="n">Sets</span> <span class="n">the</span> <span
class="n">output</span> <span class="n">path</span>
+
+ <span class="o">-</span><span class="n">t</span><span class="p">,</span>
<span class="o">--</span><span class="n">type</span><span
class="o">-</span><span class="n">count</span>
+ <span class="n">Requests</span> <span class="n">that</span> <span
class="n">rdf</span><span class="p">:</span><span class="n">type</span> <span
class="n">usage</span> <span class="n">counts</span> <span class="n">be</span>
<span class="n">calculated</span>
+
+ <span class="o">--</span>
+ <span class="n">This</span> <span class="n">option</span> <span
class="n">can</span> <span class="n">be</span> <span class="n">used</span>
<span class="n">to</span> <span class="n">separate</span> <span
class="n">command</span><span class="o">-</span><span class="n">line</span>
<span class="n">options</span> <span class="n">from</span> <span
class="n">the</span>
+ <span class="n">list</span> <span class="n">of</span> <span
class="n">argument</span><span class="p">,</span> <span class="p">(</span><span
class="n">useful</span> <span class="n">when</span> <span
class="n">arguments</span> <span class="n">might</span> <span
class="n">be</span> <span class="n">mistaken</span> <span class="k">for</span>
+ <span class="n">command</span><span class="o">-</span><span
class="n">line</span> <span class="n">options</span><span class="p">)</span>
+
+ <span class="o"><</span><span class="n">InputPath</span><span
class="o">></span>
+ <span class="n">Sets</span> <span class="n">the</span> <span
class="n">input</span> <span class="n">path</span><span class="p">(</span><span
class="n">s</span><span class="p">)</span>
+</pre></div>
+
+
+<p>If we wanted to calculate the node count on some data we could do the
following:</p>
+<div class="codehilite"><pre><span class="o">></span> <span
class="n">hadoop</span> <span class="n">jar</span> <span
class="n">jena</span><span class="o">-</span><span
class="n">elephas</span><span class="o">-</span><span
class="n">stats</span><span class="o">-</span><span
class="n">VERSION</span><span class="o">-</span><span
class="n">hadoop</span><span class="o">-</span><span class="n">job</span><span
class="p">.</span><span class="n">jar</span> <span class="n">org</span><span
class="p">.</span><span class="n">apache</span><span class="p">.</span><span
class="n">jena</span><span class="p">.</span><span class="n">hadoop</span><span
class="p">.</span><span class="n">rdf</span><span class="p">.</span><span
class="n">stats</span><span class="p">.</span><span class="n">RdfStats</span>
<span class="o">--</span><span class="n">node</span><span
class="o">-</span><span class="n">count</span> <span class="o">--</span><span
class="n">output</span> <span class="o">/</span><span
class="n">example</span><span class="o">/</span><span class="n">output</span> <span
class="o">/</span><span class="n">example</span><span class="o">/</span><span
class="n">input</span>
+</pre></div>
+
+
+<p>This calculates the node counts for the input data found in
<code>/example/input</code>, placing the generated counts in
<code>/example/output</code>.</p>
+<h2 id="specifying-inputs-and-outputs">Specifying Inputs and Outputs</h2>
+<p>Inputs are specified simply by providing one or more paths to the data you
wish to analyse. You can provide directory paths, in which case all files
within the directory will be processed.</p>
+<p>To specify the output location use the <code>-o</code> or
<code>--output</code> option followed by the desired output path.</p>
+<p>By default the demo application assumes a mixture of quads and triples.
If you know your data contains only triples or only quads, use the
<code>--input-type</code> option followed by <code>triples</code> or
<code>quads</code> to indicate the type of your data. Not doing this can skew
some statistics, because the default is to assume mixed data and so all triples
are upgraded into quads when calculating the statistics.</p>
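+<p>As a rough sketch of how the options described above fit together, the
following Python helper assembles the argument list for a run. The function name
<code>build_rdf_stats_command</code> and its parameters are hypothetical and
exist only for this illustration; the flags and the main class name are taken
from the help output shown earlier.</p>

```python
# Hypothetical helper: assembles a `hadoop jar` argument list for the demo.
# Only the flags and class name come from the help output; everything else
# (function name, parameters) is an assumption made for illustration.

MAIN_CLASS = "org.apache.jena.hadoop.rdf.stats.RdfStats"
VALID_INPUT_TYPES = ("mixed", "quads", "triples")


def build_rdf_stats_command(jar_path, output_path, input_paths,
                            stat_flags, input_type=None):
    """Return the command as a list of arguments, without running it."""
    if not input_paths:
        raise ValueError("at least one input path is required")
    if input_type is not None and input_type not in VALID_INPUT_TYPES:
        raise ValueError("input_type must be one of %s" % (VALID_INPUT_TYPES,))

    command = ["hadoop", "jar", jar_path, MAIN_CLASS]
    # Statistics to calculate, e.g. "--node-count" or "--type-count".
    command.extend(stat_flags)
    # Omitting --input-type leaves the default (mixed quads and triples).
    if input_type is not None:
        command.extend(["--input-type", input_type])
    command.extend(["--output", output_path])
    # Input paths go last; directories are processed in full.
    command.extend(input_paths)
    return command
```

+<p>The returned list could be passed to <code>subprocess.run</code> on a
machine where the <code>hadoop</code> command is on the path.</p>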
+<h2 id="available-statistics">Available Statistics</h2>
+<p>The following statistics are available and are activated by the relevant
command line option:</p>
+<table>
+ <tr><th>Command Line Option</th><th>Statistic</th><th>Description &amp;
Notes</th></tr>
+ <tr><td><code>-n</code> or <code>--node-count</code></td><td>Node Count</td><td>Counts the
occurrences of each unique RDF term i.e. node in Jena parlance</td></tr>
+ <tr><td><code>-t</code> or <code>--type-count</code></td><td>Type Count</td><td>Counts the
occurrences of each declared <code>rdf:type</code> value</td></tr>
+ <tr><td><code>-d</code> or <code>--data-types</code></td><td>Data Type Count</td><td>Counts the
occurrences of each declared literal data type</td></tr>
+ <tr><td><code>--namespaces</code></td><td>Namespace Counts</td><td>Counts the
occurrences of namespaces within the data.<br />Namespaces are determined by
splitting URIs at the <code>#</code> fragment separator if present and if not the last <code>/</code>
character</td></tr>
+ <tr><td><code>-g</code> or <code>--graph-sizes</code></td><td>Graph Sizes</td><td>Counts the sizes
of each graph declared in the data</td></tr>
+</table>
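+<p>The namespace splitting rule in the table above can be sketched roughly as
follows. This is an illustration in Python, not the Elephas implementation. It
assumes the separator character is kept as part of the namespace, that the
first <code>#</code> is the one used, and that a URI with neither separator is
counted as its own namespace; the function names are hypothetical.</p>

```python
from collections import Counter


def split_namespace(uri):
    """Return the namespace part of a URI under the rule described above."""
    # Prefer the '#' fragment separator when the URI contains one.
    hash_pos = uri.find("#")
    if hash_pos != -1:
        return uri[:hash_pos + 1]
    # Otherwise split at the last '/' character.
    slash_pos = uri.rfind("/")
    if slash_pos != -1:
        return uri[:slash_pos + 1]
    # Assumption: with no separator, the whole URI is its own namespace.
    return uri


def count_namespaces(uris):
    """Count how often each namespace occurs in an iterable of URIs."""
    return Counter(split_namespace(uri) for uri in uris)
```

+<p>For example, <code>http://xmlns.com/foaf/0.1/name</code> and
<code>http://xmlns.com/foaf/0.1/knows</code> would both be counted under
<code>http://xmlns.com/foaf/0.1/</code> in this sketch.</p>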
+
+<p>You can also use the <code>-a</code> or <code>--all</code> option if you
simply wish to calculate all statistics.</p>
+ </div>
+</div>
+
+</div><!--/.container -->
+
+ <footer class="footer">
+ <div class="container">
+ <p>Copyright © 2011–2014 The Apache Software Foundation,
Licensed under
+ the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache
License, Version 2.0</a>.
+ </p>
+ <p>
+ Apache Jena, Jena, the Apache Jena project logo,
+ Apache and the Apache feather logos are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </footer>
+
+
+</body>
+</html>