Author: buildbot
Date: Tue Feb 17 14:45:54 2015
New Revision: 940471
Log:
Staging update by buildbot for jena
Modified:
websites/staging/jena/trunk/content/ (props changed)
websites/staging/jena/trunk/content/documentation/hadoop/mapred.html
Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Feb 17 14:45:54 2015
@@ -1 +1 @@
-1660381
+1660394
Modified: websites/staging/jena/trunk/content/documentation/hadoop/mapred.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/mapred.html (original)
+++ websites/staging/jena/trunk/content/documentation/hadoop/mapred.html Tue Feb 17 14:45:54 2015
@@ -191,7 +191,7 @@
<p>Finally, you may be interested in the usage of namespaces within your data.
In this case the <code>TripleNamespaceCountMapper</code> or
<code>QuadNamespaceCountMapper</code> can be used. For this use
case you should use the <code>TextCountReducer</code> to total up the counts
for each namespace. Note that the mappers determine the namespace for a URI
simply by splitting after the last <code>#</code> or <code>/</code> in the URI;
if no such character exists then the full URI is considered to be the
namespace.</p>
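<p>The splitting rule described above can be illustrated with a minimal sketch. This is not the code used by the mappers: the class <code>NamespaceSplitSketch</code> and the method <code>namespaceOf</code> are hypothetical names, and the sketch assumes that "the last <code>#</code> or <code>/</code>" means whichever of the two characters occurs later in the URI, and that the delimiter itself is kept as part of the namespace.</p>

```java
public class NamespaceSplitSketch {

    /**
     * Hypothetical helper (not part of Jena): returns everything up to and
     * including the last '#' or '/' in the URI, or the whole URI when
     * neither character is present.
     */
    public static String namespaceOf(String uri) {
        // Assumption: use whichever delimiter occurs later in the string
        int idx = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        return idx < 0 ? uri : uri.substring(0, idx + 1);
    }

    public static void main(String[] args) {
        // Hash namespace: split after the '#'
        System.out.println(namespaceOf("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"));
        // Slash namespace: split after the last '/'
        System.out.println(namespaceOf("http://example.org/predicate"));
        // No delimiter: the full URI is treated as the namespace
        System.out.println(namespaceOf("urn:example"));
    }
}
```

<p>Under these assumptions every URI maps to exactly one namespace string, which is what allows a reducer to total the counts per namespace.</p>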
<h2 id="filtering">Filtering</h2>
<p>Filtering is another classic Map/Reduce use case: you take the
data and extract only the portions that you are interested in based on some
criteria. All our filter <code>Mapper</code> implementations also support a
job configuration option named <code>rdf.mapreduce.filter.invert</code>,
which allows their effects to be inverted if desired, e.g.</p>
-<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">setProperty</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_INVERT</span><span class="p">,</span>
<span class="n">true</span><span class="p">);</span>
+<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">setBoolean</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_INVERT</span><span class="p">,</span>
<span class="n">true</span><span class="p">);</span>
</pre></div>
@@ -208,12 +208,12 @@
<p>In some cases you may only be interested in triples/quads that are
grounded, i.e. that don't contain blank nodes, in which case the
<code>GroundTripleFilterMapper</code> and <code>GroundQuadFilterMapper</code>
can be used.</p>
<h3 id="data-with-a-specific-uri">Data with a specific URI</h3>
<p>In many cases you may want to extract only data where a specific URI
occurs in a specific position. For example, if you wanted to extract all the
<code>rdf:type</code> declarations then you might use the
<code>TripleFilterByPredicateUriMapper</code> or
<code>QuadFilterByPredicateUriMapper</code> as appropriate. The job
configuration option <code>rdf.mapreduce.filter.predicate.uris</code> is used
to provide a comma-separated list of the full URIs you want the filter to
accept, e.g.</p>
-<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">setProperty</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_PREDICATE_URIS</span><span
class="p">,</span> "<span class="n">http</span><span
class="p">:</span><span class="o">//</span><span class="n">example</span><span
class="p">.</span><span class="n">org</span><span class="o">/</span><span
class="n">predicate</span><span class="p">,</span><span
class="n">http</span><span class="p">:</span><span class="o">//</span><span
class="n">another</span><span class="p">.</span><span class="n">org</span><span
class="o">/</span><span class="n">predicate</span>"<span
class="p">);</span>
+<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">set</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_PREDICATE_URIS</span><span
class="p">,</span> "<span class="n">http</span><span
class="p">:</span><span class="o">//</span><span class="n">example</span><span
class="p">.</span><span class="n">org</span><span class="o">/</span><span
class="n">predicate</span><span class="p">,</span><span
class="n">http</span><span class="p">:</span><span class="o">//</span><span
class="n">another</span><span class="p">.</span><span class="n">org</span><span
class="o">/</span><span class="n">predicate</span>"<span
class="p">);</span>
</pre></div>
<p>Similar to the counting of node usage you can substitute
<code>Predicate</code> for <code>Subject</code>, <code>Object</code> or
<code>Graph</code> as desired. You will also need to do this in the job
configuration option, for example to filter on subject URIs in quads use the
<code>QuadFilterBySubjectUriMapper</code> and the
<code>rdf.mapreduce.filter.subject.uris</code> configuration option e.g.</p>
-<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">setProperty</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_SUBJECT_URIS</span><span
class="p">,</span> "<span class="n">http</span><span
class="p">:</span><span class="o">//</span><span class="n">example</span><span
class="p">.</span><span class="n">org</span><span class="o">/</span><span
class="n">myInstance</span>"<span class="p">);</span>
+<div class="codehilite"><pre><span class="n">config</span><span
class="p">.</span><span class="n">set</span><span
class="p">(</span><span class="n">RdfMapReduceConstants</span><span
class="p">.</span><span class="n">FILTER_SUBJECT_URIS</span><span
class="p">,</span> "<span class="n">http</span><span
class="p">:</span><span class="o">//</span><span class="n">example</span><span
class="p">.</span><span class="n">org</span><span class="o">/</span><span
class="n">myInstance</span>"<span class="p">);</span>
</pre></div>
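<p>To make the interaction between a comma-separated URI list and the invert option concrete, here is a minimal, self-contained sketch. It is not the filter mappers' implementation: <code>UriFilterSketch</code> and its methods are hypothetical names, it works on plain strings rather than RDF nodes, and trimming whitespace around each list entry is an assumption rather than documented behaviour.</p>

```java
import java.util.HashSet;
import java.util.Set;

public class UriFilterSketch {

    private final Set<String> acceptedUris = new HashSet<>();
    private final boolean invert;

    /**
     * @param commaSeparatedUris value in the style of the *.uris options
     * @param invert             value in the style of the invert option
     */
    public UriFilterSketch(String commaSeparatedUris, boolean invert) {
        for (String uri : commaSeparatedUris.split(",")) {
            String trimmed = uri.trim(); // assumption: surrounding whitespace is ignored
            if (!trimmed.isEmpty()) {
                acceptedUris.add(trimmed);
            }
        }
        this.invert = invert;
    }

    /** Returns true if a value in the filtered position should be kept. */
    public boolean accepts(String uri) {
        boolean matches = acceptedUris.contains(uri);
        // Inverting flips the decision: keep only what would have been dropped
        return invert ? !matches : matches;
    }

    public static void main(String[] args) {
        String uris = "http://example.org/predicate,http://another.org/predicate";
        UriFilterSketch normal = new UriFilterSketch(uris, false);
        UriFilterSketch inverted = new UriFilterSketch(uris, true);

        System.out.println(normal.accepts("http://example.org/predicate"));
        System.out.println(inverted.accepts("http://example.org/predicate"));
    }
}
```

<p>In this sketch the non-inverted filter keeps only the listed URIs, while the inverted filter keeps everything except them.</p>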