entitylinking.mdtext

rwesten Sun, 09 Jun 2013 22:38:47 -0700

Author: rwesten
Date: Mon Jun 10 05:38:00 2013
New Revision: 1491339

URL: http://svn.apache.org/r1491339
Log:
minor: formatting


Modified:
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext

Modified: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1491339&r1=1491338&r2=1491339&view=diff
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
 (original)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
 Mon Jun 10 05:38:00 2013
@@ -155,7 +155,7 @@ __Token level Parameters:__
 * __lc__ {name}::LexicalCategory - The linked _Token Categories_. Valid values 
include the name's of members of the LexicalCategory enumeration (e.g. "Noun", 
"Verb", "Adjective", "Adposition", …). Typical configurations include 
"lc=Noun" or an empty list ("lc" or "lc=") to deactivate all categories and 
provide more fine granular Pos or Tag level configuration.
 * __pos__ {name}::Pos - This linked _Pos Types_. Valid values include the 
name's of members of the Pos enumeration (e.g. "ProperNoun", "CommonNoun", 
"Infinitive", "Gerund", "PresentParticiple" and ~150 others). This parameter 
can be used to provide a very fine granular configuration. It is e.g. used by 
the _Link ProperNouns only_ setting to define that only "pos=ProperNoun" are 
linked.
 * __tag__ {tag}::String - The linked _Pos Tags_. This parameter allows to 
configure POS tags as used by the POS tagger. This is useful if those Tags are 
not mapped to LexicalCategories or Pos types.
-*__prob__ [0..1)::double - the _Min PosTag Probability_. This parameter 
replaces the formally used _Min POS tag probability_ 
_(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_ 
property. It defines the minimum confidence so that a POS annotation is 
accepted for linkable and matchable tokens ('value/2' is sufficient for 
rejecting none linked/matched tokens).
+* __prob__ [0..1)::double - the _Min PosTag Probability_. This parameter 
replaces the formally used _Min POS tag probability_ 
_(org.apache.stanbol.enhancer.engines.keywordextraction.minPosTagProbability)_ 
property. It defines the minimum confidence so that a POS annotation is 
accepted for linkable and matchable tokens ('value/2' is sufficient for 
rejecting none linked/matched tokens).
 * __uc__ {NONE/MATCH/LINK}::string - the _Upper Case Token Mode_ allows to 
configure how upper case words are treated. There are three possible modes: (1) 
NONE: defines that they are not specially treated; (2) MATCH defines that they 
are considered as matchable tokens (independent of the POS tag or the token 
length; (3) LINK: defines that they are in any case linked with the vocabulary. 
The default is "LINK" - as upper case words often represent named entities - 
with the exception of German ('de') where the mode is set to MATCH - as all 
Nouns in German are upper case.
 
 NOTE: that tokens are linked if any of "lc", "pos" or "tag" match the 
configuration. This means that adding "lc=Noun" will render "pos=ProperNoun" 
useless as the Pos type ProperNoun is already included in the LexicalCategory 
Noun.
@@ -169,9 +169,10 @@ The default configuration for the Entity
     es;lc=Noun
     nl;lc=Noun
 
-The first line enable _Link Multiple Matchable Tokens in Phrases_ and linking 
of upper case tokens for all languages. In addition it sets the minimum 
probabilities for Pos- and Phrase annotations to 0.75 (what would be also the 
default). The following three lines provide additional language specific 
defaults. For German the upper case mode is reset to MATCH as in German all 
Nouns use upper case. For Spain and Dutch linking for the LexicalCategory Noun 
is enabled. This is because the OpenNLP POS tagger for those languages does not 
support ProperNoun's and therefore the Engine would not link any tokens if 
_Link ProperNouns only_ is enabled. The same configuration in the OSGI 
'.config' file syntax would look like follows
+The first line enable _Link Multiple Matchable Tokens in Phrases_ and linking 
of upper case tokens for all languages. In addition it sets the minimum 
probabilities for Pos- and Phrase annotations to 0.75 (what would be also the 
default). The following three lines provide additional language specific 
defaults. For German the upper case mode is reset to MATCH as in German all 
Nouns use upper case. For Spain and Dutch linking for the LexicalCategory Noun 
is enabled. This is because the OpenNLP POS tagger for those languages does not 
support ProperNoun's and therefore the Engine would not link any tokens if 
_Link ProperNouns only_ is enabled. The same configuration in the OSGI 
'.config' file syntax would look like follows _(NOTE: please exclude the line 
break used here for better formatting)_
 
-    
enhancer.engines.linking.processedLanguages=["*;lmmtip;uc\=LINK;prop\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
+    enhancer.engines.linking.processedLanguages=
+        
["*;lmmtip;uc\=LINK;prop\=0.75;pprob\=0.75","de;uc\=MATCH","es;lc\=Noun","nl;lc\=Noun"]
 
 The 2nd example shows how to define default settings without using the 
wildcard '*' that would enable processing of all languages. The following 
example shows an configuration that only enables English and ignores text in 
all other languages.
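
For readers of this diff, the language configuration syntax described above can be illustrated with a small parser. This is only a sketch under stated assumptions, not the engine's actual implementation: the function name `parse_language_config` is hypothetical, bare keys such as `lmmtip` or `lc` are stored with an empty string value, and parameter values are kept as plain strings without validation. The entries are copied verbatim from the '.config' example in the diff.

```python
def parse_language_config(entry):
    """Split one entry such as 'de;uc=MATCH' into (language, params).

    Hypothetical helper for illustration; not part of Apache Stanbol.
    """
    # The OSGI '.config' syntax escapes '=' as '\='; undo that first.
    entry = entry.replace("\\=", "=")
    language, *raw_params = [part.strip() for part in entry.split(";")]
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing ';'
        key, _, value = raw.partition("=")
        # Assumption: a bare key ('lmmtip', 'lc') and 'key=' both map to ''.
        params[key] = value
    return language, params


# Entries copied verbatim from the '.config' example in the diff above.
entries = [
    "*;lmmtip;uc\\=LINK;prop\\=0.75;pprob\\=0.75",
    "de;uc\\=MATCH",
    "es;lc\\=Noun",
    "nl;lc\\=Noun",
]
config = dict(parse_language_config(e) for e in entries)
```

Here `config["de"]` holds only the German override (`uc=MATCH`), while `config["*"]` holds the defaults for all languages. How the engine merges the two is not shown in this diff, so the sketch leaves them separate.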

svn commit: r1491339 - /stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
