Author: rwesten
Date: Mon Jun 10 05:38:00 2013
New Revision: 1491339
URL: http://svn.apache.org/r1491339
Log:
minor: formatting
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1491339&r1=1491338&r2=1491339&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
Mon Jun 10 05:38:00 2013
@@ -155,7 +155,7 @@ __Token level Parameters:__
* __lc__ {name}::LexicalCategory - The linked _Token Categories_. Valid values
include the name's of members of the LexicalCategory enumeration (e.g. "Noun",
"Verb", "Adjective", "Adposition", …). Typical configurations include
"lc=Noun" or an empty list ("lc" or "lc=") to deactivate all categories and
provide more fine granular Pos or Tag level configuration.
* __pos__ {name}::Pos - This linked _Pos Types_. Valid values include the
name's of members of the Pos enumeration (e.g. "ProperNoun", "CommonNoun",
"Infinitive", "Gerund", "PresentParticiple" and ~150 others). This parameter
can be used to provide a very fine granular configuration. It is e.g. used by
the _Link ProperNouns only_ setting to define that only "pos=ProperNoun" are
linked.
* __tag__ {tag}::String - The linked _Pos Tags_. This parameter allows to
configure POS tags as used by the POS tagger. This is useful if those Tags are
not mapped to LexicalCategories or Pos types.
-*__prob__ [0..1)::double - the _Min PosTag Probability_. This parameter
replaces the formally used _Min POS tag probability_
_(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_
property. It defines the minimum confidence so that a POS annotation is
accepted for linkable and matchable tokens ('value/2' is sufficient for
rejecting none linked/matched tokens).
+* __prob__ [0..1)::double - the _Min PosTag Probability_. This parameter
replaces the formally used _Min POS tag probability_
_(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_
property. It defines the minimum confidence so that a POS annotation is
accepted for linkable and matchable tokens ('value/2' is sufficient for
rejecting none linked/matched tokens).
* __uc__ {NONE/MATCH/LINK}::string - the _Upper Case Token Mode_ allows to
configure how upper case words are treated. There are three possible modes: (1)
NONE: defines that they are not specially treated; (2) MATCH defines that they
are considered as matchable tokens (independent of the POS tag or the token
length; (3) LINK: defines that they are in any case linked with the vocabulary.
The default is "LINK" - as upper case words often represent named entities -
with the exception of German ('de') where the mode is set to MATCH - as all
Nouns in German are upper case.
NOTE: that tokens are linked if any of "lc", "pos" or "tag" match the
configuration. This means that adding "lc=Noun" will render "pos=ProperNoun"
useless as the Pos type ProperNoun is already included in the LexicalCategory
Noun.
@@ -169,9 +169,10 @@ The default configuration for the Entity
es;lc=Noun
nl;lc=Noun
-The first line enable _Link Multiple Matchable Tokens in Phrases_ and linking
of upper case tokens for all languages. In addition it sets the minimum
probabilities for Pos- and Phrase annotations to 0.75 (what would be also the
default). The following three lines provide additional language specific
defaults. For German the upper case mode is reset to MATCH as in German all
Nouns use upper case. For Spain and Dutch linking for the LexicalCategory Noun
is enabled. This is because the OpenNLP POS tagger for those languages does not
support ProperNoun's and therefore the Engine would not link any tokens if
_Link ProperNouns only_ is enabled. The same configuration in the OSGI
'.config' file syntax would look like follows
+The first line enable _Link Multiple Matchable Tokens in Phrases_ and linking
of upper case tokens for all languages. In addition it sets the minimum
probabilities for Pos- and Phrase annotations to 0.75 (what would be also the
default). The following three lines provide additional language specific
defaults. For German the upper case mode is reset to MATCH as in German all
Nouns use upper case. For Spain and Dutch linking for the LexicalCategory Noun
is enabled. This is because the OpenNLP POS tagger for those languages does not
support ProperNoun's and therefore the Engine would not link any tokens if
_Link ProperNouns only_ is enabled. The same configuration in the OSGI
'.config' file syntax would look like follows _(NOTE: please exclude the line
break used here for better formatting)_
-
enhancer.engines.linking.processedLanguages=["*;lmmtip;uc\=LINK;prop\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
+ enhancer.engines.linking.processedLanguages=
+
["*;lmmtip;uc\=LINK;prop\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
The 2nd example shows how to define default settings without using the
wildcard '*' that would enable processing of all languages. The following
example shows an configuration that only enables English and ignores text in
all other languages.