Author: buildbot
Date: Mon Jun 10 05:40:11 2013
New Revision: 865092
Log:
Staging update by buildbot for stanbol
Modified:
websites/staging/stanbol/trunk/content/ (props changed)
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun 10 05:40:11 2013
@@ -1 +1 @@
-1491336
+1491339
Modified:
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
==============================================================================
---
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
(original)
+++
websites/staging/stanbol/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.html
Mon Jun 10 05:40:11 2013
@@ -214,8 +214,8 @@ Configuration wise this will pre-set the
<ul>
<li><strong>lc</strong> {name}::LexicalCategory - The linked <em>Token
Categories</em>. Valid values include the names of members of the
LexicalCategory enumeration (e.g. "Noun", "Verb", "Adjective", "Adposition",
…). Typical configurations include "lc=Noun" or an empty list ("lc" or "lc=")
to deactivate all categories and provide a more fine-grained Pos or Tag level
configuration.</li>
<li><strong>pos</strong> {name}::Pos - The linked <em>Pos Types</em>. Valid
values include the names of members of the Pos enumeration (e.g. "ProperNoun",
"CommonNoun", "Infinitive", "Gerund", "PresentParticiple" and ~150 others).
This parameter can be used to provide a very fine-grained configuration. It is
used, for example, by the <em>Link ProperNouns only</em> setting to define that
only tokens with "pos=ProperNoun" are linked.</li>
-<li><strong>tag</strong> {tag}::String - The linked <em>Pos Tags</em>. This
parameter allows configuring POS tags as used by the POS tagger. This is
useful if those Tags are not mapped to LexicalCategories or Pos types.
-*<strong>prob</strong> [0..1)::double - the <em>Min PosTag Probability</em>.
This parameter replaces the formerly used <em>Min POS tag probability</em>
<em>(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)</em>
property. It defines the minimum confidence so that a POS annotation is
accepted for linkable and matchable tokens ('value/2' is sufficient for
rejecting non-linked/non-matched tokens).</li>
+<li><strong>tag</strong> {tag}::String - The linked <em>Pos Tags</em>. This
parameter allows configuring POS tags as used by the POS tagger. This is
useful if those Tags are not mapped to LexicalCategories or Pos types.</li>
+<li><strong>prob</strong> [0..1)::double - the <em>Min PosTag
Probability</em>. This parameter replaces the formerly used <em>Min POS tag
probability</em>
<em>(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)</em>
property. It defines the minimum confidence so that a POS annotation is
accepted for linkable and matchable tokens ('value/2' is sufficient for
rejecting non-linked/non-matched tokens).</li>
<li><strong>uc</strong> {NONE/MATCH/LINK}::string - the <em>Upper Case Token
Mode</em> configures how upper case words are treated. There are three
possible modes: (1) NONE: defines that they are not specially treated; (2)
MATCH: defines that they are considered as matchable tokens (independent of the
POS tag or the token length); (3) LINK: defines that they are in any case linked
with the vocabulary. The default is "LINK" - as upper case words often
represent named entities - with the exception of German ('de') where the mode
is set to MATCH - as all Nouns in German are upper case.</li>
</ul>
<p>NOTE: tokens are linked if any of "lc", "pos" or "tag" matches the
configuration. This means that adding "lc=Noun" will render "pos=ProperNoun"
useless as the Pos type ProperNoun is already included in the LexicalCategory
Noun.</p>
@@ -228,8 +228,9 @@ Configuration wise this will pre-set the
</pre></div>
-<p>The first line enables <em>Link Multiple Matchable Tokens in Phrases</em>
and linking of upper case tokens for all languages. In addition it sets the
minimum probabilities for Pos- and Phrase annotations to 0.75 (which would
also be the default). The following three lines provide additional language
specific defaults. For German the upper case mode is reset to MATCH as in
German all Nouns use upper case. For Spanish and Dutch linking for the
LexicalCategory Noun is enabled. This is because the OpenNLP POS tagger for
those languages does not support ProperNouns and therefore the Engine would
not link any tokens if <em>Link ProperNouns only</em> is enabled. The same
configuration in the OSGI '.config' file syntax would look as follows</p>
-<div class="codehilite"><pre><span class="n">enhancer</span><span
class="p">.</span><span class="n">engines</span><span class="p">.</span><span
class="n">linking</span><span class="p">.</span><span
class="n">processedLanguages</span><span class="p">=[</span>"<span
class="o">*</span><span class="p">;</span><span class="n">lmmtip</span><span
class="p">;</span><span class="n">uc</span><span class="o">\</span><span
class="p">=</span><span class="n">LINK</span><span class="p">;</span><span
class="n">prop</span><span class="o">\</span><span class="p">=</span>0<span
class="p">.</span>75<span class="p">;</span><span class="n">pprob</span><span
class="o">\</span><span class="p">=</span>0<span
class="p">.</span>75"<span class="p">,</span>"<span
class="n">de</span><span class="p">;</span><span class="n">uc</span><span
class="o">\</span><span class="p">=</span><span
class="n">MATCH</span>"<span class="p">,</span>"<span
class="n">es</span><span class="p">;</span>
<span class="n">lc</span><span class="o">\</span><span class="p">=</span><span
class="n">Noun</span>"<span class="p">,</span>"<span
class="n">nl</span><span class="p">;</span><span class="n">lc</span><span
class="o">\</span><span class="p">=</span><span
class="n">Noun</span>"<span class="p">]</span>
+<p>The first line enables <em>Link Multiple Matchable Tokens in Phrases</em>
and linking of upper case tokens for all languages. In addition it sets the
minimum probabilities for Pos- and Phrase annotations to 0.75 (which would
also be the default). The following three lines provide additional language
specific defaults. For German the upper case mode is reset to MATCH as in
German all Nouns use upper case. For Spanish and Dutch linking for the
LexicalCategory Noun is enabled. This is because the OpenNLP POS tagger for
those languages does not support ProperNouns and therefore the Engine would
not link any tokens if <em>Link ProperNouns only</em> is enabled. The same
configuration in the OSGI '.config' file syntax would look as follows
<em>(NOTE: please exclude the line break used here for better
formatting)</em></p>
+<div class="codehilite"><pre><span class="n">enhancer</span><span
class="p">.</span><span class="n">engines</span><span class="p">.</span><span
class="n">linking</span><span class="p">.</span><span
class="n">processedLanguages</span><span class="p">=</span>
+ <span class="p">[</span>"<span class="o">*</span><span
class="p">;</span><span class="n">lmmtip</span><span class="p">;</span><span
class="n">uc</span><span class="o">\</span><span class="p">=</span><span
class="n">LINK</span><span class="p">;</span><span class="n">prop</span><span
class="o">\</span><span class="p">=</span>0<span class="p">.</span>75<span
class="p">;</span><span class="n">pprob</span><span class="o">\</span><span
class="p">=</span>0<span class="p">.</span>75"<span
class="p">,</span>"<span class="n">de</span><span class="p">;</span><span
class="n">uc</span><span class="o">\</span><span class="p">=</span><span
class="n">MATCH</span>"<span class="p">,</span>"<span
class="n">es</span><span class="p">;</span><span class="n">lc</span><span
class="o">\</span><span class="p">=</span><span
class="n">Noun</span>"<span class="p">,</span>"<span
class="n">nl</span><span class="p">;</span><span class="n">lc</span><span
class="o">\</span><span class="p">=</span><span class="n">Noun</span>"<span
class="p">]</span>
</pre></div>
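The per-language entries described above follow a "language;param;key=value" pattern, which can be illustrated with a small parser. This is only a minimal sketch and not the engine's actual implementation: the function names parse_entry and effective_config are hypothetical, the entries are assumed to be in the plain (unescaped) form without the '\=' used in the '.config' syntax, and the rule that a language-specific entry overrides the '*' entry is an assumption drawn from the description of '*' as the default for all languages. The sample uses the 'prob' key as documented in the parameter list.

```python
def parse_entry(entry):
    """Split one 'lang;key=value;flag' entry into (language, params)."""
    language, *parts = entry.split(";")
    params = {}
    for part in parts:
        key, sep, value = part.partition("=")
        # Bare keys such as 'lmmtip' or 'lc' carry no value; they are stored
        # as None and their interpretation is left to the caller.
        params[key.strip()] = value.strip() if sep else None
    return language.strip(), params


def effective_config(entries, language):
    """Merge the '*' defaults with the entry for one language, if present.

    Assumption: language-specific parameters override the '*' parameters.
    """
    parsed = dict(parse_entry(entry) for entry in entries)
    merged = dict(parsed.get("*", {}))
    merged.update(parsed.get(language, {}))
    return merged


entries = [
    "*;lmmtip;uc=LINK;prob=0.75;pprob=0.75",
    "de;uc=MATCH",
    "es;lc=Noun",
    "nl;lc=Noun",
]

for lang in ("de", "es", "en"):
    print(lang, effective_config(entries, lang))
```

Under these assumptions German resolves to uc=MATCH, Spanish keeps uc=LINK and gains lc=Noun, and a language without its own entry simply receives the '*' values.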