entitylinking.html

buildbot Sun, 09 Jun 2013 22:30:47 -0700

Author: buildbot
Date: Mon Jun 10 05:29:56 2013
New Revision: 865090

Log:
Staging update by buildbot for stanbol


Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun 10 05:29:56 2013
@@ -1 +1 @@
-1491163
+1491336

Modified: 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
--- 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
 (original)
+++ 
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
 Mon Jun 10 05:29:56 2013
@@ -102,7 +102,7 @@
 <h2 id="linking-process">Linking Process:</h2>
 <p>The Linking Process consists of three major steps: First, it consumes the 
results of the NLP processing to determine the tokens (words) that need to be 
linked with the configured vocabulary. Second, it links entities based on 
their labels with the current section of the text. Third, it writes the 
enhancement results.</p>
 <h3 id="token-types">Token Types</h3>
-<p>The KeywordLinkingEngine operates based on tokens (words). Those tokens are 
divided in the following Categories</p>
+<p>The EntityLinkingEngine operates based on tokens (words). Those tokens are 
divided in the following Categories</p>
 <ul>
 <li><strong>Linkable Tokens</strong>: These are words that are linked with the 
Vocabulary. This means that the engine will issue queries in the controlled 
vocabulary for those tokens</li>
 <li><strong>Matchable Tokens</strong>: Matchable tokens are used to refine 
queries. For the matching of entity labels with the text those words are treated 
in the same way as linkable words. So the main difference is that matchable 
words alone will not cause the engine to query for Entities in the Controlled 
Vocabulary.</li>
@@ -117,7 +117,7 @@
 <li><strong>Token Phrase</strong>: If a Token is a member of a 
<em>processable</em> Phrase. Phrases are groups of Tokens that can be detected 
by a Chunker. Typical examples are Noun Phrases.</li>
 </ul>
 <h3 id="consumed-nlp-processing-results">Consumed NLP Processing Results:</h3>
-<p>The KeywordLinkingEngine consumes NLP processing results from the 
AnalyzedText ContentPart of the processed ContentItem. The following list 
describes the consumed information and their usage in the linking process: </p>
+<p>The EntityLinkingEngine consumes NLP processing results from the 
AnalyzedText ContentPart of the processed ContentItem. The following list 
describes the consumed information and their usage in the linking process: </p>
 <ol>
 <li><strong>Language</strong> <em>(required)</em>: The Language of the Text is acquired from 
the Metadata of the ContentItem. It is required to search for labels in the 
correct language and also to correctly apply language specific configurations 
of the engine.</li>
 <li><strong>Sentences</strong> <em>(optional)</em>: Sentence annotations are 
used as segments for the matching process. In addition, for the first word of a 
Sentence the <em>Upper Case</em> feature is NOT set. In the case that no 
Sentence Annotations are present the whole text is treated as a single 
Sentence.</li>
@@ -130,15 +130,15 @@
 <p>The linking process is based on the matching of labels of entities returned as 
result for searches for entities in the configured controlled vocabulary. In 
addition the engine can be configured to consider redirects for entities 
returned by searches.</p>
 <p>Searches are issued only for <em>Linkable Tokens</em> and may include up to 
<em>Max Search Tokens</em> additional <em>Linkable-</em> or <em>Matchable 
Tokens</em>. If the <em>Linkable Token</em> is within a <em>Phrase</em> then 
only other tokens within the same phrase are considered. Otherwise any 
<em>Linkable-</em> or <em>Matchable Tokens</em> within the configured <em>Max 
Search Token Distance</em> are considered for the search.</p>
 <p>Searches to the controlled vocabulary are issued using the 
<em>EntitySearcher</em> interface and built as follows:</p>
-<div class="codehilite"><pre><span class="p">{</span><span 
class="ow">lt</span><span class="p">}</span><span class="nv">@</span><span 
class="p">{</span><span class="n">lang</span><span class="p">}</span> <span 
class="o">||</span> <span class="p">{</span><span class="ow">lt</span><span 
class="p">}</span><span class="nv">@</span><span class="p">{</span><span 
class="n">dl</span><span class="p">}</span> <span class="o">||</span> <span 
class="p">[{</span><span class="n">at</span><span class="p">}</span><span 
class="nv">@</span><span class="p">{</span><span class="n">lang</span><span 
class="p">}</span> <span class="o">||</span> <span class="p">{</span><span 
class="n">at</span><span class="p">}</span><span class="nv">@</span><span 
class="p">{</span><span class="n">dl</span><span class="p">}</span> <span 
class="o">...</span> <span class="p">]</span>
+<div class="codehilite"><pre><span class="p">{</span><span 
class="n">lt</span><span class="p">}@{</span><span class="n">lang</span><span 
class="p">}</span> <span class="o">||</span> <span class="p">{</span><span 
class="n">lt</span><span class="p">}@{</span><span class="n">dl</span><span 
class="p">}</span> <span class="o">||</span> <span class="p">[{</span><span 
class="n">at</span><span class="p">}@{</span><span class="n">lang</span><span 
class="p">}</span> <span class="o">||</span> <span class="p">{</span><span 
class="n">at</span><span class="p">}@{</span><span class="n">dl</span><span 
class="p">}</span> <span class="p">...</span> <span class="p">]</span>
 </pre></div>
 
 
 <p>where:</p>
-<div class="codehilite"><pre><span class="o">*</span> <span 
class="p">{</span><span class="ow">lt</span><span class="p">}</span> <span 
class="o">...</span> <span class="n">the</span> <span 
class="n">_Linkable</span> <span class="n">Token_</span> <span 
class="k">for</span> <span class="n">that</span> <span class="n">the</span> 
<span class="n">search</span> <span class="n">is</span> <span 
class="n">issued</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">at</span><span class="p">}</span> <span class="o">...</span> <span 
class="n">additional</span> <span class="n">_Linkable</span><span 
class="o">-</span><span class="n">_</span> <span class="ow">or</span> <span 
class="n">_Matchable</span> <span class="n">Tokens_</span> <span 
class="n">included</span> <span class="n">in</span> <span class="n">the</span> 
<span class="n">search</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">lang</span><span class="p">}</span> <span class="o">...</span> <span 
class="n">the</span> <span class="n">language</span> <span class="n">of</span> 
<span class="n">the</span> <span class="n">text</span>
-<span class="o">*</span> <span class="p">{</span><span 
class="n">dl</span><span class="p">}</span> <span class="o">...</span> <span 
class="n">the</span> <span class="n">configured</span> <span 
class="n">_Default</span> <span class="n">Matching</span> <span 
class="n">Language_</span><span class="o">.</span> <span class="n">If</span> 
<span class="p">{</span><span class="n">df</span><span class="p">}</span> <span 
class="o">==</span> <span class="p">{</span><span class="n">lang</span><span 
class="p">}</span> <span class="n">than</span> <span class="n">the</span> <span 
class="ow">or</span> <span class="n">term</span><span class="p">(</span><span 
class="n">s</span><span class="p">)</span> <span class="k">for</span> <span 
class="n">the</span> <span class="p">{</span><span class="n">dl</span><span 
class="p">}</span> <span class="n">are</span> <span class="n">omitted</span>
+<div class="codehilite"><pre><span class="o">*</span> <span 
class="p">{</span><span class="n">lt</span><span class="p">}</span> <span 
class="p">...</span> <span class="n">the</span> <span 
class="n">_Linkable</span> <span class="n">Token_</span> <span 
class="k">for</span> <span class="n">that</span> <span class="n">the</span> 
<span class="n">search</span> <span class="n">is</span> <span 
class="n">issued</span>
+<span class="o">*</span> <span class="p">{</span><span 
class="n">at</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">additional</span> <span class="n">_Linkable</span><span 
class="o">-</span><span class="n">_</span> <span class="n">or</span> <span 
class="n">_Matchable</span> <span class="n">Tokens_</span> <span 
class="n">included</span> <span class="n">in</span> <span class="n">the</span> 
<span class="n">search</span>
+<span class="o">*</span> <span class="p">{</span><span 
class="n">lang</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">the</span> <span class="n">language</span> <span class="n">of</span> 
<span class="n">the</span> <span class="n">text</span>
+<span class="o">*</span> <span class="p">{</span><span 
class="n">dl</span><span class="p">}</span> <span class="p">...</span> <span 
class="n">the</span> <span class="n">configured</span> <span 
class="n">_Default</span> <span class="n">Matching</span> <span 
class="n">Language_</span><span class="p">.</span> <span class="n">If</span> 
<span class="p">{</span><span class="n">dl</span><span class="p">}</span> <span 
class="o">==</span> <span class="p">{</span><span class="n">lang</span><span 
class="p">}</span> <span class="n">then</span> <span class="n">the</span> <span 
class="n">or</span> <span class="n">term</span><span class="p">(</span><span 
class="n">s</span><span class="p">)</span> <span class="k">for</span> <span 
class="n">the</span> <span class="p">{</span><span class="n">dl</span><span 
class="p">}</span> <span class="n">are</span> <span class="n">omitted</span>
 </pre></div>
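
The query pattern above can be sketched in a few lines of Python. This is only an illustration of how the OR terms combine, not the EntitySearcher implementation: the function name `build_search_terms` and the plain-string representation of terms are assumptions made for this example.

```python
def build_search_terms(linkable_token, additional_tokens, lang, default_lang):
    """Join the OR-ed {token}@{language} terms for one search (hypothetical helper)."""
    # The terms for the Default Matching Language are omitted
    # when it equals the language of the text.
    languages = [lang]
    if default_lang != lang:
        languages.append(default_lang)

    terms = []
    # The Linkable Token comes first, followed by the additional
    # Linkable or Matchable Tokens included in the search.
    for token in [linkable_token, *additional_tokens]:
        for language in languages:
            terms.append(f"{token}@{language}")
    return " || ".join(terms)


print(build_search_terms("Paris", ["Texas"], "de", "en"))
```

With the text language equal to the default matching language, each token contributes a single term.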
 
 
@@ -157,7 +157,7 @@
 <p>The configuration of the EntityLinkingEngine is done by passing a 
<em>TextProcessingConfig</em> and an <em>EntityLinkingConfig</em> to its 
constructor. Both configuration classes provide an API-based configuration (via 
getters and setters) as well as an OSGi Dictionary-based configuration (via a 
static method that configures a new instance from a passed configuration).</p>
 <p>The following two sections describe the "key, value" based configuration, as 
the API-based version is already described by the JavaDoc.</p>
 <h3 id="text-processing-configuration">Text Processing Configuration</h3>
-<h4 
id="proper-noun-linking-wzxhzdk14enhancerengineslinkingpropernounsstatewzxhzdk15">Proper
 Noun Linking 
<small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
+<h4 
id="proper-noun-linking-wzxhzdk16enhancerengineslinkingpropernounsstatewzxhzdk17">Proper
 Noun Linking 
<small><em>(enhancer.engines.linking.properNounsState)</em></small></h4>
 <p>This is a high level configuration option allowing users to easily specify 
if they want to do EntityLinking based on any Nouns ("Noun Linking") or only 
ProperNouns ("Proper Noun Linking").
 Configuration wise this will pre-set the defaults for the linkable 
<em>LexicalCategories</em> and <em>Pos</em> types.</p>
 <p>"Noun linking" is equivalent to the behavior of the <a 
href="keywordlinkingengine">KeywordLinkingEngine</a> while "Proper Noun 
Linking" is similar to using NER (Named Entity Recognition) with the <a 
href="namedentityextractionengine">NamedEntityLinking</a> engine. </p>
@@ -171,7 +171,7 @@ Configuration wise this will pre-set the
 </li>
 </ol>
 <p>If suitable it is strongly recommended to activate "Proper Noun Linking" as 
it greatly improves performance: in typical text only around 1/10 of 
the Nouns are marked as Proper Nouns, so the number of vocabulary 
lookups decreases accordingly.</p>
-<h4 
id="language-processing-configuration-wzxhzdk16enhancerengineslinkingprocessedlanguageswzxhzdk17">Language
 Processing configuration 
<small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
+<h4 
id="language-processing-configuration-wzxhzdk18enhancerengineslinkingprocessedlanguageswzxhzdk19">Language
 Processing configuration 
<small><em>(enhancer.engines.linking.processedLanguages)</em></small></h4>
 <p>This parameter is used for two things: (1) to specify what languages are 
processed and (2) to provide specific configurations on how languages are 
processed. For the 2nd aspect there is also a default configuration that can be 
extended with language specific setting.</p>
 <p><strong>1. Processed Languages Configuration:</strong></p>
 <p>For the configuration of the processed languages the following syntax is 
used:</p>
@@ -189,20 +189,20 @@ Configuration wise this will pre-set the
 
 <p>This specifies that all Languages other than French and Italian are 
processed by an EntityLinkingEngine instance.</p>
 <p>Values MUST BE passed as Array or Vector. This is done by using the 
["elem1","elem2",...] syntax as defined by OSGi ".config" files. The following 
example shows the two above examples combined into a single configuration.</p>
-<div class="codehilite"><pre><span class="n">org</span><span 
class="o">.</span><span class="n">apache</span><span class="o">.</span><span 
class="n">stanbol</span><span class="o">.</span><span 
class="n">enhancer</span><span class="o">.</span><span 
class="n">engines</span><span class="o">.</span><span 
class="n">keywordextraction</span><span class="o">.</span><span 
class="n">processedLanguages</span><span class="o">=</span><span 
class="p">[</span><span class="s">&quot;!fr&quot;</span><span 
class="p">,</span><span class="s">&quot;!it&quot;</span><span 
class="p">,</span><span class="s">&quot;de&quot;</span><span 
class="p">,</span><span class="s">&quot;en&quot;</span><span 
class="p">,</span><span class="s">&quot;*&quot;</span><span class="p">]</span>
+<div class="codehilite"><pre><span class="n">enhancer</span><span 
class="p">.</span><span class="n">engines</span><span class="p">.</span><span 
class="n">linking</span><span class="p">.</span><span 
class="n">processedLanguages</span><span class="p">=[</span>&quot;!<span 
class="n">fr</span>&quot;<span class="p">,</span>&quot;!<span 
class="n">it</span>&quot;<span class="p">,</span>&quot;<span 
class="n">de</span>&quot;<span class="p">,</span>&quot;<span 
class="n">en</span>&quot;<span class="p">,</span>&quot;<span 
class="o">*</span>&quot;<span class="p">]</span>
 </pre></div>
 
 
 <p><strong>2. Language specific Parameter Configuration</strong></p>
 <p>In addition to specifying the processed languages this configuration can 
also be used to pass language specific parameters. The syntax for parameters 
is as follows</p>
-<div class="codehilite"><pre><span class="p">{</span><span 
class="n">language</span><span class="p">};{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}</span><span class="o">=</span><span class="p">{</span><span 
class="n">param</span><span class="o">-</span><span class="n">value</span><span 
class="p">};{</span><span class="n">param</span><span class="o">-</span><span 
class="n">name</span><span class="p">}</span><span class="o">=</span><span 
class="p">{</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">}</span>
-<span class="o">*</span><span class="p">;{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}</span><span class="o">=</span><span class="p">{</span><span 
class="n">param</span><span class="o">-</span><span class="n">value</span><span 
class="p">};{</span><span class="n">param</span><span class="o">-</span><span 
class="n">name</span><span class="p">}</span><span class="o">=</span><span 
class="p">{</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">}</span>
-<span class="p">;{</span><span class="n">param</span><span 
class="o">-</span><span class="n">name</span><span class="p">}</span><span 
class="o">=</span><span class="p">{</span><span class="n">param</span><span 
class="o">-</span><span class="n">value</span><span class="p">};{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}</span><span class="o">=</span><span class="p">{</span><span 
class="n">param</span><span class="o">-</span><span class="n">value</span><span 
class="p">}</span>
+<div class="codehilite"><pre><span class="p">{</span><span 
class="n">language</span><span class="p">};{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}={</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">};{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}={</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">}</span>
+<span class="o">*</span><span class="p">;{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}={</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">};{</span><span 
class="n">param</span><span class="o">-</span><span class="n">name</span><span 
class="p">}={</span><span class="n">param</span><span class="o">-</span><span 
class="n">value</span><span class="p">}</span>
+<span class="p">;{</span><span class="n">param</span><span 
class="o">-</span><span class="n">name</span><span class="p">}={</span><span 
class="n">param</span><span class="o">-</span><span class="n">value</span><span 
class="p">};{</span><span class="n">param</span><span class="o">-</span><span 
class="n">name</span><span class="p">}={</span><span 
class="n">param</span><span class="o">-</span><span class="n">value</span><span 
class="p">}</span>
 </pre></div>
 
 
 <p>The first line sets the parameter for {language}. The 2nd and 3rd line show 
that either the wildcard language '*' or the empty language '' can be used to 
configure parameters that are used as defaults for all languages. </p>
-<p>The following param-names are supported by the KeywordLinkingEngine</p>
+<p>The following param-names are supported by the EntityLinkingEngine</p>
 <p><strong>Phrase level Parameters:</strong></p>
 <ul>
 <li><strong>pc</strong> {name}::LexicalCategory - The <em>Phrase 
Categories</em> processed by the Engine. Valid values include the names of 
members of the LexicalCategory enumeration (e.g. "Noun", "Verb", "Adjective", 
"Adposition", ...)</li>
@@ -220,23 +220,23 @@ Configuration wise this will pre-set the
 </ul>
 <p>NOTE that tokens are linked if any of "lc", "pos" or "tag" match the 
configuration. This means that adding "lc=Noun" will render "pos=ProperNoun" 
useless as the Pos type ProperNoun is already included in the LexicalCategory 
Noun.</p>
 <p><strong>Examples:</strong></p>
-<p>The default configuration for the KeywordLinkingEngine uses the following 
setting</p>
-<div class="codehilite"><pre><span class="o">*</span><span 
class="p">;</span><span class="n">lmmtip</span><span class="p">;</span><span 
class="nb">uc</span><span class="o">=</span><span class="n">LINK</span><span 
class="p">;</span><span class="n">prop</span><span class="o">=</span><span 
class="mf">0.75</span><span class="p">;</span><span class="n">pprob</span><span 
class="o">=</span><span class="mf">0.75</span>
-<span class="n">de</span><span class="p">;</span><span 
class="nb">uc</span><span class="o">=</span><span class="n">MATCH</span>
-<span class="n">es</span><span class="p">;</span><span 
class="nb">lc</span><span class="o">=</span><span class="n">Noun</span>
-<span class="n">nl</span><span class="p">;</span><span 
class="nb">lc</span><span class="o">=</span><span class="n">Noun</span>
+<p>The default configuration for the EntityLinkingEngine uses the following 
setting</p>
+<div class="codehilite"><pre><span class="o">*</span><span 
class="p">;</span><span class="n">lmmtip</span><span class="p">;</span><span 
class="n">uc</span><span class="p">=</span><span class="n">LINK</span><span 
class="p">;</span><span class="n">prob</span><span class="p">=</span>0<span 
class="p">.</span>75<span class="p">;</span><span class="n">pprob</span><span 
class="p">=</span>0<span class="p">.</span>75
+<span class="n">de</span><span class="p">;</span><span 
class="n">uc</span><span class="p">=</span><span class="n">MATCH</span>
+<span class="n">es</span><span class="p">;</span><span 
class="n">lc</span><span class="p">=</span><span class="n">Noun</span>
+<span class="n">nl</span><span class="p">;</span><span 
class="n">lc</span><span class="p">=</span><span class="n">Noun</span>
 </pre></div>
 
 
 <p>The first line enables <em>Link Multiple Matchable Tokens in Phrases</em> 
and linking of upper case tokens for all languages. In addition it sets the 
minimum probabilities for Pos- and Phrase annotations to 0.75 (which is 
also the default). The following three lines provide additional language 
specific defaults. For German the upper case mode is reset to MATCH as in 
German all Nouns use upper case. For Spanish and Dutch linking for the 
LexicalCategory Noun is enabled. This is because the OpenNLP POS tagger for 
those languages does not support ProperNouns and therefore the Engine would 
not link any tokens if <em>Link ProperNouns only</em> is enabled. The same 
configuration in the OSGi '.config' file syntax would look as follows</p>
-<div class="codehilite"><pre><span class="n">org</span><span 
class="o">.</span><span class="n">apache</span><span class="o">.</span><span 
class="n">stanbol</span><span class="o">.</span><span 
class="n">enhancer</span><span class="o">.</span><span 
class="n">engines</span><span class="o">.</span><span 
class="n">keywordextraction</span><span class="o">.</span><span 
class="n">processedLanguages</span><span class="o">=</span><span 
class="p">[</span><span 
class="s">&quot;*;lmmtip;uc\=LINK;prop\=0.75;pprob\=0.75&quot;</span><span 
class="p">,</span><span class="s">&quot;de;uc\=MATCH&quot;</span><span 
class="p">,</span><span class="s">&quot;es;lc\=Noun&quot;</span><span 
class="p">,</span><span class="s">&quot;nl;lc\=Noun&quot;</span><span 
class="p">]</span>
+<div class="codehilite"><pre><span class="n">enhancer</span><span 
class="p">.</span><span class="n">engines</span><span class="p">.</span><span 
class="n">linking</span><span class="p">.</span><span 
class="n">processedLanguages</span><span class="p">=[</span>&quot;<span 
class="o">*</span><span class="p">;</span><span class="n">lmmtip</span><span 
class="p">;</span><span class="n">uc</span><span class="o">\</span><span 
class="p">=</span><span class="n">LINK</span><span class="p">;</span><span 
class="n">prob</span><span class="o">\</span><span class="p">=</span>0<span 
class="p">.</span>75<span class="p">;</span><span class="n">pprob</span><span 
class="o">\</span><span class="p">=</span>0<span 
class="p">.</span>75&quot;<span class="p">,</span>&quot;<span 
class="n">de</span><span class="p">;</span><span class="n">uc</span><span 
class="o">\</span><span class="p">=</span><span 
class="n">MATCH</span>&quot;<span class="p">,</span>&quot;<span 
class="n">es</span><span class="p">;</span>
 <span class="n">lc</span><span class="o">\</span><span class="p">=</span><span 
class="n">Noun</span>&quot;<span class="p">,</span>&quot;<span 
class="n">nl</span><span class="p">;</span><span class="n">lc</span><span 
class="o">\</span><span class="p">=</span><span 
class="n">Noun</span>&quot;<span class="p">]</span>
 </pre></div>
 
 
 <p>The 2nd example shows how to define default settings without using the 
wildcard '*' that would enable processing of all languages. The following 
example shows a configuration that only enables English and ignores text in 
all other languages.</p>
-<div class="codehilite"><pre><span class="p">;</span><span 
class="n">lmmtip</span><span class="p">;</span><span class="nb">uc</span><span 
class="o">=</span><span class="n">LINK</span><span class="p">;</span><span 
class="n">prop</span><span class="o">=</span><span class="mf">0.75</span><span 
class="p">;</span><span class="n">pprob</span><span class="o">=</span><span 
class="mf">0.75</span>
+<div class="codehilite"><pre><span class="p">;</span><span 
class="n">lmmtip</span><span class="p">;</span><span class="n">uc</span><span 
class="p">=</span><span class="n">LINK</span><span class="p">;</span><span 
class="n">prob</span><span class="p">=</span>0<span class="p">.</span>75<span 
class="p">;</span><span class="n">pprob</span><span class="p">=</span>0<span 
class="p">.</span>75
 <span class="n">en</span>
-<span class="n">de</span><span class="p">;</span><span 
class="nb">uc</span><span class="o">=</span><span class="n">MATCH</span>
+<span class="n">de</span><span class="p">;</span><span 
class="n">uc</span><span class="p">=</span><span class="n">MATCH</span>
 </pre></div>
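
To make the entry syntax above concrete, here is a minimal Python sketch that splits one `processedLanguages` entry into its language part and its parameters. It is not the parser used by the engine: the function name `parse_language_entry`, the returned tuple shape, and the choice to store value-less parameters such as `lmmtip` as `True` are assumptions made for illustration.

```python
def parse_language_entry(entry):
    """Split '{language};{name}={value};...' into its parts (hypothetical helper).

    Returns (language, excluded, params). The language is '*' for the
    wildcard, '' for the empty default language, and 'excluded' is True
    for entries written with a leading '!'.
    """
    head, *raw_params = entry.split(";")
    head = head.strip()
    excluded = head.startswith("!")
    language = head[1:] if excluded else head

    params = {}
    for raw in raw_params:
        raw = raw.strip()
        if not raw:
            continue
        name, sep, value = raw.partition("=")
        # Assumption: parameters without '=' act as flags and map to True.
        params[name.strip()] = value.strip() if sep else True
    return language, excluded, params


for line in ["*;lmmtip;uc=LINK;prob=0.75;pprob=0.75", "de;uc=MATCH", "!fr"]:
    print(parse_language_entry(line))
```

Values are kept as strings here; converting `prob` and `pprob` to numbers is left to the caller.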
 
 
@@ -246,7 +246,7 @@ Configuration wise this will pre-set the
 <li><strong>Label Field</strong> 
<em>(enhancer.engines.linking.labelField)</em>: The name of the field/property 
used to link (search and match) Entities. Only a single field is supported for 
performance reasons.</li>
 <li><strong>Case Sensitivity</strong> 
<em>(enhancer.engines.linking.caseSensitive)</em>: Boolean switch that allows 
to activate/deactivate case sensitive matching. It is important to understand 
that even with case sensitivity activated an Entity with the label such as 
"Anaconda" will be suggested for the mention of "anaconda" in the text. The 
main difference will be the confidence value of such a suggestion as with case 
sensitivity activated the starting letters "A" and "a" are NOT considered to be 
matching. See the second technical part for details about the matching process. 
Case Sensitivity is deactivated by default. It is recommended to be activated 
if controlled vocabularies contain abbreviations similar to commonly used words 
e.g. CAN for Canada.</li>
 <li><strong>Type Field</strong> <em>(enhancer.engines.linking.typeField)</em>: 
Values of this field are used as values of the "fise:entity-types" property of 
created "<a 
href="../enhancementstructure.html#fiseentityannotation">fise:EntityAnnotation</a>"s.
 The default is "rdf:type". <em>NOTE</em> that in contrast to the <a 
href="namedentityextractionengine">NamedEntityLinking</a> the types are not 
used for the linking process. They are only used while writing the 
'fise:EntityAnnotation's and to determine the 'dc:type' values of 
'fise:TextAnnotation's.</li>
-<li><strong>Type Mappings</strong> 
<em>(enhancer.engines.linking.typeMappings)</em>: The FISE enhancement 
structure (as used by the Stanbol Enhancer) distinguishes <a 
href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a> and 
<a 
href="../enhancementstructure.html#fiseentityannotation">EntityAnnotation</a>s. 
The Keyword linking engine needs to create both types of Annotations: 
TextAnnotations selecting the words that match some Entities in the Controlled 
Vocabulary and EntityAnnotations that represent an Entity suggested for a 
TextAnnotation. The Type Mappings are used to determine the "dc:type" of the 
TextAnnotation based on the types of the suggested Entity. The default 
configuration comes with mappings for Persons, Organizations, Places and 
Concepts but this fields allows to define additional mappings. For details 
about the syntax see the sub-section "Type Mapping Syntax" below.</li>
+<li><strong>Type Mappings</strong> 
<em>(enhancer.engines.linking.typeMappings)</em>: The FISE enhancement 
structure (as used by the Stanbol Enhancer) distinguishes <a 
href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a> and 
<a 
href="../enhancementstructure.html#fiseentityannotation">EntityAnnotation</a>s. 
The EntityLinkingEngine needs to create both types of Annotations: 
TextAnnotations selecting the words that match some Entities in the Controlled 
Vocabulary and EntityAnnotations that represent an Entity suggested for a 
TextAnnotation. The Type Mappings are used to determine the "dc:type" of the 
TextAnnotation based on the types of the suggested Entity. The default 
configuration comes with mappings for Persons, Organizations, Places and 
Concepts but this field allows defining additional mappings. For details 
about the syntax see the sub-section "Type Mapping Syntax" below.</li>
 <li><strong>Redirect Field</strong> 
<em>(enhancer.engines.linking.redirectField)</em> and <strong>Redirect 
Mode</strong> <em>(enhancer.engines.linking.redirectMode)</em>: Redirects allow 
to follow links to other entities defined in the vocabulary linked against. 
This is useful in cases where matched Entities are not equal to the Entities 
that users want to suggest. A good example is <a 
href="http://dbpedia.org">DBpedia</a> where the Entity 'dbpedia:USA' defines 
only the label "USA" and a redirect to the Entity 'dbpedia:United_States' with 
all the information. The <em>Redirect Mode</em> can now be used to define if 
redirects should be "IGNORE"; "ADD_VALUES" causes information of the redirected 
entity ('dbpedia:United_States') to be added to the matched one 
('dbpedia:USA'); "FOLLOW" will suggest the redirected Entity 
('dbpedia:United_States') instead of the matched one ('dbpedia:USA'). The 
<em>Redirect Field</em> defines the field/property used for redirects.</li>
 <li><strong>Suggestions</strong> 
<em>(enhancer.engines.linking.suggestions)</em>: The maximum number of 
suggestions. The default value for this is '3'. If the engine is used in 
combination with a post-processing engine (e.g. disambiguation) then users 
might want to increase this value.</li>
 </ul>
@@ -276,40 +276,39 @@ Configuration wise this will pre-set the
 <p><strong>Min Text Score</strong> 
<em>(enhancer.engines.linking.minTextScore)</em> [0..1]::double: The "Text 
Score" [0..1] represents how well the Label of an Entity matches to the 
selected Span in the Text. It compares the number of matched Tokens from 
the label with the number of Tokens enclosed by the Span in the Text an Entity 
is suggested for. Inexact matches for Tokens, or Tokens within the label that 
appear in a different order than in the text, also reduce this score. 
Entities are only considered if at least one of their labels scores higher than 
the minimum for all three of <em>Min Label Score</em>, <em>Min Text Match 
Score</em> and <em>Min Match Score</em>.</p>
 </li>
 <li><strong>Min Match Score</strong> 
<em>(enhancer.engines.linking.minMatchScore)</em> [0..1]::double: Defined as 
the product of the "Text Score" with the "Label Score" - meaning that this 
value represents both how well the label matches the text and how much of the 
label is matched with the text. Entities are only considered if at least one of 
their labels scores higher than the minimum for all three of <em>Min Label 
Score</em>, <em>Min Text Match Score</em> and <em>Min Match Score</em>. </li>
-<li><strong>Use EntityRankings</strong> 
<em>(enhancer.engines.linking.useEntityRankings)</em> ::boolean (default=true): 
Entity Rankings can be used to define the ranking (popularity, importance, 
connectivity, ...) of an entity relative to other within the knowledge base. 
While fise:confidence values calculated by the EntityLinkingEngie do only 
represent how well a label of the entity do match with the given section in the 
processed text it does make sense for manny use cases to sort Entities with the 
same score based on their entity rankings (e.g. users would expect to get 
"Paris (France)" suggested before "Paris (Texas)" for Paris appearing in a 
text. Enabling this feature will slightly (&lt; 0.1) change the score of 
suggestions to ensure such a ordering.   <br />
-</li>
+<li><strong>Use EntityRankings</strong> 
<em>(enhancer.engines.linking.useEntityRankings)</em> ::boolean (default=true): 
Entity Rankings can be used to define the ranking (popularity, importance, 
connectivity, ...) of an entity relative to others within the knowledge base. 
The fise:confidence values calculated by the EntityLinkingEngine only 
represent how well a label of the entity matches the given section of the 
processed text, so for many use cases it makes sense to sort Entities with the 
same score based on their entity rankings (e.g. users would expect to get 
"Paris (France)" suggested before "Paris (Texas)" for Paris appearing in a 
text). Enabling this feature will slightly (&lt; 0.1) change the score of 
suggestions to ensure such an ordering.     </li>
 </ul>
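How the three minimum scores described above interact can be illustrated with a small Java sketch. The formulas for the "Text Score" and "Label Score" (matched tokens divided by span tokens and by label tokens), the inclusive comparison, and the class and method names are assumptions made for illustration; the engine's own scoring also accounts for inexact matches and token order.

```java
// Sketch only: these formulas are assumptions chosen to illustrate the three
// thresholds. They are not the EntityLinkingEngine's actual scoring code.
class MatchScoreSketch {

    // Assumed: share of the tokens in the selected span matched by the label.
    static double textScore(int matchedTokens, int spanTokens) {
        return spanTokens == 0 ? 0.0 : (double) matchedTokens / spanTokens;
    }

    // Assumed: share of the label's own tokens that were matched.
    static double labelScore(int matchedTokens, int labelTokens) {
        return labelTokens == 0 ? 0.0 : (double) matchedTokens / labelTokens;
    }

    // "Min Match Score" is described as the product of text and label score.
    static double matchScore(int matchedTokens, int spanTokens, int labelTokens) {
        return textScore(matchedTokens, spanTokens) * labelScore(matchedTokens, labelTokens);
    }

    // A label is accepted only if all three scores reach their minimum.
    // Whether the comparison is inclusive is an assumption of this sketch.
    static boolean accept(int matchedTokens, int spanTokens, int labelTokens,
                          double minLabelScore, double minTextScore, double minMatchScore) {
        return labelScore(matchedTokens, labelTokens) >= minLabelScore
            && textScore(matchedTokens, spanTokens) >= minTextScore
            && matchScore(matchedTokens, spanTokens, labelTokens) >= minMatchScore;
    }

    public static void main(String[] args) {
        // Two of the three tokens of a label match a two-token span.
        System.out.println(matchScore(2, 2, 3));
        System.out.println(accept(2, 2, 3, 0.75, 0.4, 0.3));
        System.out.println(accept(2, 2, 3, 0.5, 0.4, 0.3));
    }
}
```

Because the match score is a product of two values in [0..1], it can never exceed either of them, so a high minimum match score implicitly tightens the other two thresholds as well.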
 <h4 id="type-mappings-syntax">Type Mappings Syntax</h4>
-<p>The Type Mappings are used to determine the "dc:type" of the <a 
href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a> based 
on the types of the suggested Entity. The field "Type Mappings" (property: 
<em>org.apache.stanbol.enhancer.engines.keywordextraction.typeMappings</em>) 
can be used to customize such mappings.</p>
+<p>The Type Mappings are used to determine the "dc:type" of the <a 
href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a> based 
on the types of the suggested Entity. The field "Type Mappings" (property: 
<em>enhancer.engines.linking.typeMappings</em>) can be used to customize such 
mappings.</p>
 <p>This field uses the following syntax</p>
 <div class="codehilite"><pre><span class="p">{</span><span 
class="n">uri</span><span class="p">}</span>
 <span class="p">{</span><span class="n">source</span><span class="p">}</span> 
<span class="o">&gt;</span> <span class="p">{</span><span 
class="n">target</span><span class="p">}</span>
-<span class="p">{</span><span class="n">source1</span><span 
class="p">};</span> <span class="p">{</span><span class="n">source2</span><span 
class="p">};</span> <span class="o">...</span> <span class="p">{</span><span 
class="n">sourceN</span><span class="p">}</span> <span class="o">&gt;</span> 
<span class="p">{</span><span class="n">target</span><span class="p">}</span>
+<span class="p">{</span><span class="n">source1</span><span 
class="p">};</span> <span class="p">{</span><span class="n">source2</span><span 
class="p">};</span> <span class="p">...</span> <span class="p">{</span><span 
class="n">sourceN</span><span class="p">}</span> <span class="o">&gt;</span> 
<span class="p">{</span><span class="n">target</span><span class="p">}</span>
 </pre></div>
 
 
 <p>The first variant is a shorthand for {uri} &gt; {uri} and therefore 
specifies that the {uri} should be used as 'dc:type' for <a 
href="../enhancementstructure.html#fisetextannotation">TextAnnotation</a>s if 
the matched entity is of type {uri}. The second variant matches a {source} URI 
to a {target}. Variant three shows the possibility to match multiple URIs to 
the same target in a single configuration line.</p>
 <p>Both 'ns:localName' and fully qualified URIs are supported. For supported 
namespaces see the <a 
href="http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java">NamespaceEnum</a>.
 Information about accepted (INFO) and ignored (WARN) type mappings is 
available in the logs.</p>
 <p>Some Examples of additional Mappings for the e-health domain:</p>
-<div class="codehilite"><pre><span class="err">drugbank:drugs;</span> <span 
class="err">dbp-ont:Drug;</span> <span class="err">dailymed:drugs;</span> <span 
class="err">sider:drugs;</span> <span class="err">tcm:Medicine</span> <span 
class="err">&gt;</span> <span class="err">drugbank:drugs</span>
-<span class="err">diseasome:diseases;</span> <span 
class="err">linkedct:condition;</span> <span class="err">tcm:Disease</span> 
<span class="err">&gt;</span> <span class="err">diseasome:diseases</span> 
-<span class="err">sider:side_effects</span>
-<span class="err">dailymed:ingredients</span>
-<span class="err">dailymed:organization</span> <span class="err">&gt;</span> 
<span class="err">dbp-ont:Organisation</span>
+<div class="codehilite"><pre><span class="n">drugbank</span><span 
class="o">:</span><span class="n">drugs</span><span class="o">;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont</span><span 
class="o">:</span><span class="n">Drug</span><span class="o">;</span> <span 
class="n">dailymed</span><span class="o">:</span><span 
class="n">drugs</span><span class="o">;</span> <span 
class="n">sider</span><span class="o">:</span><span class="n">drugs</span><span 
class="o">;</span> <span class="n">tcm</span><span class="o">:</span><span 
class="n">Medicine</span> <span class="o">&gt;</span> <span 
class="n">drugbank</span><span class="o">:</span><span class="n">drugs</span>
+<span class="n">diseasome</span><span class="o">:</span><span 
class="n">diseases</span><span class="o">;</span> <span 
class="n">linkedct</span><span class="o">:</span><span 
class="n">condition</span><span class="o">;</span> <span 
class="n">tcm</span><span class="o">:</span><span class="n">Disease</span> 
<span class="o">&gt;</span> <span class="n">diseasome</span><span 
class="o">:</span><span class="n">diseases</span> 
+<span class="n">sider</span><span class="o">:</span><span 
class="n">side_effects</span>
+<span class="n">dailymed</span><span class="o">:</span><span 
class="n">ingredients</span>
+<span class="n">dailymed</span><span class="o">:</span><span 
class="n">organization</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont</span><span 
class="o">:</span><span class="n">Organisation</span>
 </pre></div>
 
 
 <p>The first two lines map some well-known classes that represent drugs and 
diseases to 'drugbank:drugs' and 'diseasome:diseases'. The third and fourth 
lines define 1:1 mappings for side effects and ingredients, and the last line 
adds 'dailymed:organization' as an additional mapping to DBpedia Ontology 
Organisation.</p>
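The three syntax variants can be read mechanically, as the following Java sketch shows. It is a hypothetical illustration and not the engine's parser: the class and method names are invented, namespace prefixes are kept as written instead of being expanded, and it assumes that neither ';' nor '>' occurs inside a URI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the Type Mappings syntax; not Stanbol code.
class TypeMappingSketch {

    // Parses one configuration line into source -> target entries.
    // "{uri}" maps the URI to itself; "{s1}; {s2} > {target}" maps every source
    // to the target. Prefixes such as "dbp-ont:" are not expanded here.
    static Map<String, String> parseLine(String line) {
        Map<String, String> mappings = new LinkedHashMap<>();
        String[] parts = line.split(">", 2);
        String target = parts.length == 2 ? parts[1].trim() : null;
        for (String source : parts[0].split(";")) {
            String s = source.trim();
            if (s.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            mappings.put(s, target == null ? s : target);
        }
        return mappings;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("sider:side_effects"));
        System.out.println(parseLine("dailymed:organization > dbp-ont:Organisation"));
        System.out.println(parseLine(
            "diseasome:diseases; linkedct:condition; tcm:Disease > diseasome:diseases"));
    }
}
```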
-<p>The following mappings are predefined by the KeywordLinkingEngine.</p>
-<div class="codehilite"><pre><span class="n">dbp</span><span 
class="o">-</span><span class="n">ont:Person</span><span class="p">;</span> 
<span class="n">foaf:Person</span><span class="p">;</span> <span 
class="n">schema:Person</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont:Person</span>
-<span class="n">dbp</span><span class="o">-</span><span 
class="n">ont:Organisation</span><span class="p">;</span> <span 
class="n">dbp</span><span class="o">-</span><span 
class="n">ont:Newspaper</span><span class="p">;</span> <span 
class="n">schema:Organization</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span 
class="n">ont:Organisation</span>
-<span class="n">dbp</span><span class="o">-</span><span 
class="n">ont:Place</span><span class="p">;</span> <span 
class="n">schema:Place</span><span class="p">;</span> <span 
class="n">gml:_Feature</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont:Place</span>
-<span class="n">skos:Concept</span>
+<p>The following mappings are predefined by the EntityLinkingEngine.</p>
+<div class="codehilite"><pre><span class="n">dbp</span><span 
class="o">-</span><span class="n">ont</span><span class="p">:</span><span 
class="n">Person</span><span class="p">;</span> <span 
class="n">foaf</span><span class="p">:</span><span class="n">Person</span><span 
class="p">;</span> <span class="n">schema</span><span class="p">:</span><span 
class="n">Person</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont</span><span 
class="p">:</span><span class="n">Person</span>
+<span class="n">dbp</span><span class="o">-</span><span 
class="n">ont</span><span class="p">:</span><span 
class="n">Organisation</span><span class="p">;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont</span><span 
class="p">:</span><span class="n">Newspaper</span><span class="p">;</span> 
<span class="n">schema</span><span class="p">:</span><span 
class="n">Organization</span> <span class="o">&gt;</span> <span 
class="n">dbp</span><span class="o">-</span><span class="n">ont</span><span 
class="p">:</span><span class="n">Organisation</span>
+<span class="n">dbp</span><span class="o">-</span><span 
class="n">ont</span><span class="p">:</span><span class="n">Place</span><span 
class="p">;</span> <span class="n">schema</span><span class="p">:</span><span 
class="n">Place</span><span class="p">;</span> <span class="n">gml</span><span 
class="p">:</span><span class="n">_Feature</span> <span class="o">&gt;</span> 
<span class="n">dbp</span><span class="o">-</span><span 
class="n">ont</span><span class="p">:</span><span class="n">Place</span>
+<span class="n">skos</span><span class="p">:</span><span 
class="n">Concept</span>
 </pre></div>
 
 
 <h2 id="extension-points">Extension Points</h2>
-<p>This section describes Interfaces that are used as Extension Points by the 
KeywordLinkingEngine</p>
+<p>This section describes Interfaces that are used as Extension Points by the 
EntityLinkingEngine</p>
 <h3 id="entitysearcher">EntitySearcher</h3>
 <p>The EntitySearcher interface is used by the EntityLinkingEngine to search for 
Entities in the linked Vocabulary. An EntitySearcher instance is passed to the 
constructor of the EntityLinkingEngine.</p>
 <p>This interface supports two main functionalities, search and dereference, 
but also provides some additional metadata. The following list 
provides a short overview of the methods.</p>
@@ -327,9 +326,9 @@ Configuration wise this will pre-set the
 <li><strong>Origin Information</strong> 
<em>getOriginInformation()::Map&lt;UriRef,Collection&lt;Resource&gt;&gt;</em> : 
This method returns information about the origin that is added to 
every 'fise:EntityAnnotation' created by the EntityLinkingEngine. This is e.g. 
used by the Entityhub based implementation to provide the 'id' of the Entityhub 
Site the Entities were retrieved from. </li>
 </ul>
 <p>The <a href="entityhublinking">EntityhubLinkingEngine</a> includes 
EntitySearcher implementations based on the FieldQuery search interface 
implemented by the Stanbol Entityhub.</p>
-<p>Currently the StanbolEntityhub based implementations are instantiated based 
on the value of the 
<em>'org.apache.stanbol.enhancer.engines.keywordextraction.referencedSiteId'</em>.
 Users that want to use a different implementation of this Interface to be used 
for linking will need to extend the KeywordLinkingEngine and override the 
#activateEntitySearcher(ComponentContext context, Dictionary<String,Object> 
configuration) and #deactivateEntitySearcher(). Those methods are called during 
activation/deactivation of the KeywordLinkingEngine and are expected to 
set/unset the #entitySearcher field.</p>
+<p>Currently the Stanbol Entityhub based implementations are instantiated based 
on the value of the <em>'enhancer.engines.linking.entityhub.siteId'</em> property. Users 
who want a different implementation of this interface to be used for 
linking will need to extend the EntityLinkingEngine and override the 
#activateEntitySearcher(ComponentContext context, Dictionary&lt;String,Object&gt; 
configuration) and #deactivateEntitySearcher() methods. Those methods are called during 
activation/deactivation of the EntityLinkingEngine and are expected to 
set/unset the #entitySearcher field.</p>
 <h3 id="labeltokenizer">LabelTokenizer</h3>
-<p>The LabelTokenizer interface is used to tokenize labels of Entity 
suggestions as returned by the <a href="#entitysearcher">EntitySearcer</a>. As 
the matching process of the KeywordLinkingEngine is based on Tokens (words) 
multi-word labels (e.g. Univerity of Munich) need to be tokenized before they 
can be matched against the current context in the Text.</p>
+<p>The LabelTokenizer interface is used to tokenize labels of Entity 
suggestions as returned by the <a href="#entitysearcher">EntitySearcher</a>. As 
the matching process of the EntityLinkingEngine is based on Tokens (words), 
multi-word labels (e.g. University of Munich) need to be tokenized before they 
can be matched against the current context in the Text.</p>
 <p>The <em>LabelTokenizer</em> interface defines only the single 
<em>tokenize(String label, String language)::String[]</em> method that gets the 
label and the language as parameters and returns the tokens as a String array. 
If the tokenizer is not able to tokenize the label (e.g. because it does not 
support the language) it MUST return NULL. In this case the NamedEntityLinking 
engine will try to match the label as a single token.</p>
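The contract described above, including the requirement to return null for unsupported languages, could be met by a tokenizer along the following lines. This is a minimal sketch: the <code>LabelTokenizer</code> interface is redeclared locally as a stand-in with the method signature given in the text, and <code>WhitespaceLabelTokenizer</code> with its whitespace splitting is a hypothetical example, not an implementation shipped with Stanbol.

```java
import java.util.Set;

// Local stand-in mirroring the single method described in the text.
interface LabelTokenizer {
    String[] tokenize(String label, String language);
}

// Hypothetical tokenizer: splits on whitespace for a fixed set of languages.
class WhitespaceLabelTokenizer implements LabelTokenizer {

    private final Set<String> supportedLanguages;

    WhitespaceLabelTokenizer(Set<String> supportedLanguages) {
        this.supportedLanguages = supportedLanguages;
    }

    @Override
    public String[] tokenize(String label, String language) {
        // Returning null signals "unable to tokenize"; the caller is then
        // expected to fall back to treating the label as a single token.
        if (label == null || language == null || !supportedLanguages.contains(language)) {
            return null;
        }
        String trimmed = label.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

Returning null instead of a single-element array keeps the decision about the fallback with the caller, which is what the interface description asks for.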
 <h4 id="mainlabeltokenizer">MainLabelTokenizer</h4>
 <p>As users will very likely want to use multiple 
LabelTokenizers for different languages, the EntityLinkingEngine comes with a 
MainLabelTokenizer implementation. It registers itself as LabelTokenizer with 
the highest possible OSGI 'service.ranking' and tracks all other registered 
<em>LabelTokenizers</em>.</p>
@@ -362,6 +361,67 @@ Configuration wise this will pre-set the
 <p>The EntityLinkingEngine also contains an <a 
href="http://opennlp.apache.org">OpenNLP</a> tokenizer API based 
implementation. As the dependencies on OpenNLP and the Stanbol Commons OpenNLP 
module are optional, this implementation will only be active if the 
<code>org.apache.stanbol:org.apache.stanbol.commons.opennlp</code> bundle with 
a version starting from <code>0.10.0</code> is active.</p>
 <p>This <em>LabelTokenizer</em> supports the configuration of custom OpenNLP 
tokenizer models for specific languages e.g. 
"de;model=my-de-tokenizermodel.zip;*" would use a custom model for German and 
the default models for all other languages.</p>
 <p>Internally the OpenNLP service is used to load tokenizer models for languages. That 
means that tokenizer models are loaded via the DataFileProvider infrastructure. 
For users this means that custom tokenizer models are loaded from the Stanbol 
Datafiles directory ({stanbol-working-dir}/stanbol/datafiles).</p>
+<h3 id="linkingstateaware">LinkingStateAware</h3>
+<p>Added with <a 
href="https://issues.apache.org/jira/browse/STANBOL-1070">STANBOL-1070</a>, this 
interface allows receiving callbacks about the processing state of the entity 
linking process. This interface defines methods for start/end section as well as 
start/end token. Both the start and the end methods are passed the active Span as 
parameter. An instance of this interface can be passed to the constructor of 
the EntityLinker implementation.</p>
+<p>The typical usage of this extension point is as follows:</p>
+<div class="codehilite"><pre><span class="nd">@Reference</span> 
+<span class="kd">protected</span> <span class="n">LabelTokenizer</span> <span 
class="n">labelTokenizer</span><span class="o">;</span>
+
+<span class="kd">private</span> <span class="n">TextProcessingConfig</span> 
<span class="n">textProcessingConfig</span><span class="o">;</span>
+<span class="kd">private</span> <span class="n">EntityLinkerConfig</span> 
<span class="n">linkerConfig</span><span class="o">;</span>
+
+<span class="kd">private</span> <span class="n">EntitySearcher</span> <span 
class="n">entitySearcher</span><span class="o">;</span>
+
+<span class="nd">@Activate</span>
+<span class="nd">@SuppressWarnings</span><span class="o">(</span><span 
class="s">&quot;unchecked&quot;</span><span class="o">)</span>
+<span class="kd">protected</span> <span class="kt">void</span> <span 
class="nf">activate</span><span class="o">(</span><span 
class="n">ComponentContext</span> <span class="n">ctx</span><span 
class="o">)</span> <span class="kd">throws</span> <span 
class="n">ConfigurationException</span> <span class="o">{</span>
+    <span class="kd">super</span><span class="o">.</span><span 
class="na">activate</span><span class="o">(</span><span 
class="n">ctx</span><span class="o">);</span>
+    <span class="n">Dictionary</span><span class="o">&lt;</span><span 
class="n">String</span><span class="o">,</span><span 
class="n">Object</span><span class="o">&gt;</span> <span 
class="n">properties</span> <span class="o">=</span> <span 
class="n">ctx</span><span class="o">.</span><span 
class="na">getProperties</span><span class="o">();</span>
+    <span class="c1">//extract TextProcessing and EntityLinking config from the 
provided properties</span>
+    <span class="n">textProcessingConfig</span> <span class="o">=</span> <span 
class="n">TextProcessingConfig</span><span class="o">.</span><span 
class="na">createInstance</span><span class="o">(</span><span 
class="n">properties</span><span class="o">);</span>
+    <span class="n">linkerConfig</span> <span class="o">=</span> <span 
class="n">EntityLinkerConfig</span><span class="o">.</span><span 
class="na">createInstance</span><span class="o">(</span><span 
class="n">properties</span><span class="o">,</span><span 
class="n">prefixService</span><span class="o">);</span>
+
+    <span class="c1">//create/init the entitySearcher</span>
+    <span class="n">entitySearcher</span> <span class="o">=</span> <span 
class="k">new</span> <span class="n">MyEntitySearcher</span><span 
class="o">();</span>
+
+    <span class="c1">//parse additional properties</span>
+<span class="o">}</span>
+
+<span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">computeEnhancements</span><span class="o">(</span><span 
class="n">ContentItem</span> <span class="n">ci</span><span class="o">)</span> 
<span class="kd">throws</span> <span class="n">EngineException</span> <span 
class="o">{</span>
+    <span class="n">AnalysedText</span> <span class="n">at</span> <span 
class="o">=</span> <span class="n">NlpEngineHelper</span><span 
class="o">.</span><span class="na">getAnalysedText</span><span 
class="o">(</span><span class="k">this</span><span class="o">,</span> <span 
class="n">ci</span><span class="o">,</span> <span class="kc">true</span><span 
class="o">);</span>
+    <span class="n">String</span> <span class="n">language</span> <span 
class="o">=</span> <span class="n">NlpEngineHelper</span><span 
class="o">.</span><span class="na">getLanguage</span><span 
class="o">(</span><span class="k">this</span><span class="o">,</span> <span 
class="n">ci</span><span class="o">,</span> <span class="kc">true</span><span 
class="o">);</span>
+
+    <span class="c1">//create an instance of your LinkingStateAware 
implementation</span>
+    <span class="n">LinkingStateAware</span> <span 
class="n">linkingStateAware</span><span class="o">;</span> <span class="c1">//= 
new YourImpl(..);</span>
+
+    <span class="c1">//create one EntityLinker instance per enhancement 
request</span>
+    <span class="n">EntityLinker</span> <span class="n">entityLinker</span> 
<span class="o">=</span> <span class="k">new</span> <span 
class="n">EntityLinker</span><span class="o">(</span><span 
class="n">at</span><span class="o">,</span><span class="n">language</span><span 
class="o">,</span> 
+        <span class="n">languageConfig</span><span class="o">,</span> <span 
class="n">entitySearcher</span><span class="o">,</span> <span 
class="n">linkerConfig</span><span class="o">,</span> 
+        <span class="n">labelTokenizer</span><span class="o">,</span> <span 
class="n">linkingStateAware</span><span class="o">);</span>
+
+    <span class="c1">//during processing we will receive callbacks to the 
</span>
+    <span class="c1">//linkingStateAware instance</span>
+    <span class="k">try</span> <span class="o">{</span>
+        <span class="n">entityLinker</span><span class="o">.</span><span 
class="na">process</span><span class="o">();</span>
+    <span class="o">}</span> <span class="k">catch</span> <span 
class="o">(</span><span class="n">EntitySearcherException</span> <span 
class="n">e</span><span class="o">)</span> <span class="o">{</span>
+        <span class="n">log</span><span class="o">.</span><span 
class="na">error</span><span class="o">(</span><span class="s">&quot;Unable to 
link Entities with &quot;</span><span class="o">+</span><span 
class="n">entityLinker</span><span class="o">,</span><span 
class="n">e</span><span class="o">);</span>
+        <span class="k">throw</span> <span class="k">new</span> <span 
class="nf">EngineException</span><span class="o">(</span><span 
class="k">this</span><span class="o">,</span> <span class="n">ci</span><span 
class="o">,</span> <span class="s">&quot;Unable to link Entities with 
&quot;</span><span class="o">+</span><span class="n">entityLinker</span><span 
class="o">,</span> <span class="n">e</span><span class="o">);</span>
+    <span class="o">}</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>Note that it is also possible to use a single 
EntityLinker/LinkingStateAware pair to process multiple ContentItems. However, 
in this case received callbacks need to be filtered based on the AnalysedText 
being the context of the Span instances passed to the callback methods.</p>
+<div class="codehilite"><pre><span class="nd">@Override</span>
+<span class="kd">public</span> <span class="kt">void</span> <span 
class="nf">startToken</span><span class="o">(</span><span 
class="n">Token</span> <span class="n">token</span><span class="o">)</span> 
<span class="o">{</span>
+    <span class="c1">//process based on the context</span>
+    <span class="n">AnalysedText</span> <span class="n">at</span> <span 
class="o">=</span> <span class="n">token</span><span class="o">.</span><span 
class="na">getContext</span><span class="o">();</span>
+    <span class="c1">// …</span>
+<span class="o">}</span>
+</pre></div>
+
+
+<p>In addition, such a usage would require the LinkingStateAware implementation 
to be thread-safe.</p>
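One way to keep shared callback state thread-safe while separating it by context is sketched below. The class is hypothetical and deliberately does not implement the real LinkingStateAware interface: the context is typed as a plain Object standing in for the AnalysedText returned by <code>token.getContext()</code>, and it assumes the context object is usable as a map key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: counts token callbacks per context object so that one
// instance can safely serve several concurrently processed ContentItems.
class PerContextTokenCounter {

    // One counter per context; both the map and the counters are thread-safe.
    private final ConcurrentMap<Object, LongAdder> counts = new ConcurrentHashMap<>();

    // Would be called from a startToken(..) callback with token.getContext().
    void startToken(Object context) {
        counts.computeIfAbsent(context, key -> new LongAdder()).increment();
    }

    // Number of tokens seen so far for the given context.
    long count(Object context) {
        LongAdder adder = counts.get(context);
        return adder == null ? 0L : adder.sum();
    }

    // Should be called when a context is finished, so the map does not grow.
    void release(Object context) {
        counts.remove(context);
    }
}
```

Keying all state by context avoids a single shared mutable field, which is the part that would otherwise break when callbacks for different ContentItems interleave.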
   </div>
   
   <div id="footer">

svn commit: r865090 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/entitylinking.html
