entitylinking.html

buildbot Sun, 09 Jun 2013 22:40:49 -0700

Author: buildbot
Date: Mon Jun 10 05:40:11 2013
New Revision: 865092

Log:
Staging update by buildbot for stanbol


Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun 10 05:40:11 2013
@@ -1 +1 @@
-1491336
+1491339

Modified: websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
--- websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html (original)
+++ websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html Mon Jun 10 05:40:11 2013
@@ -214,8 +214,8 @@ Configuration wise this will pre-set the
 <ul>
 <li><strong>lc</strong> {name}::LexicalCategory - The linked <em>Token 
Categories</em>. Valid values include the names of members of the 
LexicalCategory enumeration (e.g. "Noun", "Verb", "Adjective", "Adposition", 
…). Typical configurations include "lc=Noun" or an empty list ("lc" or "lc=") 
to deactivate all categories and provide a more fine-grained Pos or Tag level 
configuration.</li>
 <li><strong>pos</strong> {name}::Pos - The linked <em>Pos Types</em>. Valid 
values include the names of members of the Pos enumeration (e.g. "ProperNoun", 
"CommonNoun", "Infinitive", "Gerund", "PresentParticiple" and ~150 others). 
This parameter allows a very fine-grained configuration. It is, e.g., used by 
the <em>Link ProperNouns only</em> setting to define that only tokens with 
"pos=ProperNoun" are linked.</li>
-<li><strong>tag</strong> {tag}::String - The linked <em>Pos Tags</em>. This 
parameter allows configuring POS tags as used by the POS tagger. This is 
useful if those tags are not mapped to LexicalCategories or Pos types.
-*<strong>prob</strong> [0..1)::double - the <em>Min PosTag Probability</em>. 
This parameter replaces the formerly used <em>Min POS tag probability</em> 
<em>(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)</em>
 property. It defines the minimum confidence a POS annotation must have to be 
accepted for linkable and matchable tokens (for rejecting tokens as 
non-linkable/non-matchable, a confidence of 'value/2' is sufficient).</li>
+<li><strong>tag</strong> {tag}::String - The linked <em>Pos Tags</em>. This 
parameter allows configuring POS tags as used by the POS tagger. This is 
useful if those tags are not mapped to LexicalCategories or Pos types.</li>
+<li><strong>prob</strong> [0..1)::double - the <em>Min PosTag 
Probability</em>. This parameter replaces the formerly used <em>Min POS tag 
probability</em> 
<em>(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)</em>
 property. It defines the minimum confidence a POS annotation must have to be 
accepted for linkable and matchable tokens (for rejecting tokens as 
non-linkable/non-matchable, a confidence of 'value/2' is sufficient).</li>
 <li><strong>uc</strong> {NONE/MATCH/LINK}::string - the <em>Upper Case Token 
Mode</em> configures how upper case words are treated. There are three 
possible modes: (1) NONE: they are not treated specially; (2) MATCH: they are 
considered matchable tokens (independent of the POS tag or the token length); 
(3) LINK: they are always linked with the vocabulary. The default is "LINK", 
as upper case words often represent named entities, with the exception of 
German ('de'), where the mode is set to MATCH because all nouns in German are 
capitalized.</li>
 </ul>
 <p>NOTE: tokens are linked if any of "lc", "pos" or "tag" match the 
configuration. This means that adding "lc=Noun" renders "pos=ProperNoun" 
redundant, as the Pos type ProperNoun is already included in the 
LexicalCategory Noun.</p>
@@ -228,8 +228,9 @@ Configuration wise this will pre-set the
 </pre></div>
 
 
-<p>The first line enables <em>Link Multiple Matchable Tokens in Phrases</em> 
and linking of upper case tokens for all languages. In addition, it sets the 
minimum probabilities for Pos and Phrase annotations to 0.75 (which is also 
the default). The following three lines provide additional language-specific 
defaults. For German, the upper case mode is reset to MATCH, as in German all 
nouns are capitalized. For Spanish and Dutch, linking for the LexicalCategory 
Noun is enabled. This is because the OpenNLP POS taggers for those languages 
do not support the ProperNoun Pos type, and therefore the Engine would not 
link any tokens if <em>Link ProperNouns only</em> is enabled. The same 
configuration in the OSGi '.config' file syntax looks as follows:</p>
-<div class="codehilite"><pre><span class="n">enhancer</span><span 
class="p">.</span><span class="n">engines</span><span class="p">.</span><span 
class="n">linking</span><span class="p">.</span><span 
class="n">processedLanguages</span><span class="p">=[</span>&quot;<span 
class="o">*</span><span class="p">;</span><span class="n">lmmtip</span><span 
class="p">;</span><span class="n">uc</span><span class="o">\</span><span 
class="p">=</span><span class="n">LINK</span><span class="p">;</span><span 
class="n">prob</span><span class="o">\</span><span class="p">=</span>0<span 
class="p">.</span>75<span class="p">;</span><span class="n">pprob</span><span 
class="o">\</span><span class="p">=</span>0<span 
class="p">.</span>75&quot;<span class="p">,</span>&quot;<span 
class="n">de</span><span class="p">;</span><span class="n">uc</span><span 
class="o">\</span><span class="p">=</span><span 
class="n">MATCH</span>&quot;<span class="p">,</span>&quot;<span 
class="n">es</span><span class="p">;</span>
<span class="n">lc</span><span class="o">\</span><span class="p">=</span><span 
class="n">Noun</span>&quot;<span class="p">,</span>&quot;<span 
class="n">nl</span><span class="p">;</span><span class="n">lc</span><span 
class="o">\</span><span class="p">=</span><span 
class="n">Noun</span>&quot;<span class="p">]</span>
+<p>The first line enables <em>Link Multiple Matchable Tokens in Phrases</em> 
and linking of upper case tokens for all languages. In addition, it sets the 
minimum probabilities for Pos and Phrase annotations to 0.75 (which is also 
the default). The following three lines provide additional language-specific 
defaults. For German, the upper case mode is reset to MATCH, as in German all 
nouns are capitalized. For Spanish and Dutch, linking for the LexicalCategory 
Noun is enabled. This is because the OpenNLP POS taggers for those languages 
do not support the ProperNoun Pos type, and therefore the Engine would not 
link any tokens if <em>Link ProperNouns only</em> is enabled. The same 
configuration in the OSGi '.config' file syntax looks as follows 
<em>(NOTE: remove the line break, which is only used here for better 
formatting)</em>:</p>
+<div class="codehilite"><pre><span class="n">enhancer</span><span 
class="p">.</span><span class="n">engines</span><span class="p">.</span><span 
class="n">linking</span><span class="p">.</span><span 
class="n">processedLanguages</span><span class="p">=</span>
+    <span class="p">[</span>&quot;<span class="o">*</span><span 
class="p">;</span><span class="n">lmmtip</span><span class="p">;</span><span 
class="n">uc</span><span class="o">\</span><span class="p">=</span><span 
class="n">LINK</span><span class="p">;</span><span class="n">prob</span><span 
class="o">\</span><span class="p">=</span>0<span class="p">.</span>75<span 
class="p">;</span><span class="n">pprob</span><span class="o">\</span><span 
class="p">=</span>0<span class="p">.</span>75&quot;<span 
class="p">,</span>&quot;<span class="n">de</span><span class="p">;</span><span 
class="n">uc</span><span class="o">\</span><span class="p">=</span><span 
class="n">MATCH</span>&quot;<span class="p">,</span>&quot;<span 
class="n">es</span><span class="p">;</span><span class="n">lc</span><span 
class="o">\</span><span class="p">=</span><span 
class="n">Noun</span>&quot;<span class="p">,</span>&quot;<span 
class="n">nl</span><span class="p">;</span><span class="n">lc</span><span 
class="o">\</span><span class="p">=</span><span class="n">Noun</span>&quot;<span 
class="p">]</span>
 </pre></div>
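The processedLanguages value above combines a "*" default entry with language-specific entries such as "de;uc\=MATCH". The following Python sketch shows one way to read such entries and compute the effective parameters for a language. It is not Stanbol code: the helper names parse_entry and effective_config are hypothetical, and it assumes that a language entry overrides only the keys it sets while inheriting the rest from "*" (as the text describes for German, where only the upper case mode is reset).

```python
# Hypothetical sketch, not Stanbol code: reads "processedLanguages" entries
# and merges the "*" defaults with a language-specific entry.

def parse_entry(entry):
    """Split 'lang;key=value;flag;...' into (lang, {key: value})."""
    # The OSGi '.config' syntax escapes '=' inside strings as '\='.
    parts = entry.replace("\\=", "=").split(";")
    lang = parts[0].strip()
    params = {}
    for part in parts[1:]:
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        # Keys without '=' (e.g. 'lmmtip', or 'lc' meaning an empty list)
        # are stored with an empty string; their meaning depends on the key.
        params[key] = value
    return lang, params


def effective_config(entries, language):
    """Return '*' defaults overridden by the entry for `language` (assumed semantics)."""
    parsed = dict(parse_entry(e) for e in entries)
    merged = dict(parsed.get("*", {}))
    merged.update(parsed.get(language, {}))
    return merged


entries = [
    r"*;lmmtip;uc\=LINK;prob\=0.75;pprob\=0.75",
    r"de;uc\=MATCH",
    r"es;lc\=Noun",
    r"nl;lc\=Noun",
]

if __name__ == "__main__":
    for lang in ("en", "de", "es"):
        print(lang, effective_config(entries, lang))
```

Under these assumptions, "de" ends up with uc=MATCH while keeping lmmtip, prob and pprob from the "*" entry, and "es" additionally gets lc=Noun.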

svn commit: r865092 - in /websites/staging/stanbol/trunk/content: ./ docs/trunk/components/enhancer/engines/entitylinking.html
