engines: comention.mdtext entitylinking.mdtext list.mdtext

rwesten Thu, 17 Oct 2013 03:51:21 -0700

Author: rwesten
Date: Thu Oct 17 10:50:09 2013
New Revision: 1533039

URL: http://svn.apache.org/r1533039
Log:
STANBOL-1070: added documentation for the Entity Co-Mention Engine


Added:
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
Modified:
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
    
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext

Added: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext?rev=1533039&view=auto
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
 (added)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
 Thu Oct 17 10:50:09 2013
@@ -0,0 +1,40 @@
+Title: Co-Mention Engine
+
+The Co-Mention engine aims to link initial mentions of Entities with later 
references in the Text.
+
+A typical example is a person who is only mentioned by their family name after an initial mention with their full name, e.g.
+
+    ... Barack Obama gave a talk to members of the Labor Union ... Obama specially mentioned ...
+
+__NOTE:__ This Engine does _NOT_ provide or use NLP co-reference resolution (e.g. linking a Pronoun with the Entity it stands for). Its purpose is to (1) link follow-up mentions of Entities with the initial one and (2) add the suggestions of the initial mention to the follow-up mentions.
+
+## Configuration
+
+As this engine uses the entity linking functionality of the [EntityLinkingEngine](entitylinking), its configuration uses properties defined by the [Entity Linker Configuration](entitylinking#entity-linker-configuration).
+
+
+* __Name__ _(stanbol.enhancer.engine.name)_: The name of the Enhancement Engine. This name is used to refer to an [EnhancementEngine](index.html) in [EnhancementChain](../chains)s.
+* __Service Ranking__ _(service.ranking)_: If multiple enhancement engines use the same name, only the one with the highest ranking will be used.
+* __Case Sensitivity__ _(enhancer.engines.linking.caseSensitive)_: Boolean switch that activates/deactivates case-sensitive matching. Note that even with case sensitivity activated, an Entity with a label such as "Anaconda" will still be suggested for the mention "anaconda" in the text. The main difference is the confidence value of such a suggestion, because with case sensitivity activated the starting letters "A" and "a" are NOT considered to match. See the technical part of the [EntityLinkingEngine](entitylinking) documentation for details about the matching process. Case sensitivity is deactivated by default. Activating it is recommended if controlled vocabularies contain abbreviations similar to commonly used words, e.g. CAN for Canada.
+* __Proper Noun Linking__ _(enhancer.engines.linking.properNounsState)_: Enables/disables proper noun linking when searching for co-mentions. It is disabled by default so that Common Nouns are also considered when searching for co-mentions. However, for Vocabularies that only contain Proper Nouns (Persons, Organizations, ...) enabling it might be useful. For the full documentation of this feature see the [Text Processing Configuration](entitylinking#text-processing-configuration) section of the EntityLinking engine.
+* __Processed Languages__ _(enhancer.engines.linking.processedLanguages)_: Allows detailed configuration of how NLP processing results are consumed by the Co-Mention engine. For the full documentation of this feature see the [Text Processing Configuration](entitylinking#text-processing-configuration) section of the EntityLinking engine.
+
+The following properties are also supported but are not included in the Felix Web Console configuration dialog. They can only be set via OSGi configuration files. See the [Entity Linking Engine](entitylinking) configuration for the full documentation of those properties.
+
+* __Min Search Token Length__ _(enhancer.engines.linking.minSearchTokenLength)_
+* __Minimum Token Match Score__ _(enhancer.engines.linking.minTokenScore)_
+* __Lemma based Matching__ _(enhancer.engines.linking.lemmaMatching)_
+* __Max Search Token Distance__ _(enhancer.engines.linking.maxSearchTokenDistance)_
+* __Max Search Tokens__ _(enhancer.engines.linking.maxSearchTokens)_
+
+The following properties of the EntityLinking engine are ignored:
+
+* __Type Mappings__ _(enhancer.engines.linking.typeMappings)_: The Co-Mention engine uses the dc:types of the initial mention. Therefore dc:type mappings need not be specified.
+* __Default Matching Language__ _(enhancer.engines.linking.defaultMatchingLanguage)_: The engine uses the language detected for the processed document for matching.
+* __Redirect Field__ _(enhancer.engines.linking.redirectField)_ and __Redirect Mode__ _(enhancer.engines.linking.redirectMode)_: The engine uses the suggestions of the initial mention. Redirects were already processed for those suggestions, so the Co-Mention engine does not need to deal with redirects.
+* __Label Field__ _(enhancer.engines.linking.labelField)_: The engine uses the initial mention as label to search for co-mentions. Because of that, no label field needs to be configured.
+* __Type Field__ _(enhancer.engines.linking.typeField)_: The engine uses the 
types of the suggestions for the initial mentions.
+* __Suggestions__ _(enhancer.engines.linking.suggestions)_: The Co-Mention engine adds all suggestions of the initial mention to co-mentions.
+* __Min Matched Tokens__ _(enhancer.engines.linking.minFoundTokens)_ is set to 
'1' meaning that at least a single token of the initial mention needs to match 
co-mentions.
+* __Min Label Score__ _(enhancer.engines.linking.minLabelScore)_ is set to 
'1/4' meaning that at least 1/4 of the tokens for the initial mention need to 
be present in co-mentions.
+* __Min Match Score__ _(enhancer.engines.linking.minMatchScore)_ is set to a value so that it does not filter any results.
\ No newline at end of file
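To illustrate how the fixed _Min Matched Tokens_ ('1') and _Min Label Score_ ('1/4') values of the Co-Mention engine interact, here is a minimal Java sketch. It is not the engine's implementation: the class and method names are hypothetical, tokens are approximated by splitting on whitespace and compared case-insensitively (the real engine consumes the NLP tokens of the analysed text and applies the configured case sensitivity), and only these two thresholds are checked.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the two fixed thresholds; not Stanbol code.
final class CoMentionThresholds {

    // Values the Co-Mention engine is documented to use
    static final int MIN_FOUND_TOKENS = 1;       // Min Matched Tokens
    static final double MIN_LABEL_SCORE = 0.25;  // Min Label Score (1/4)

    // Assumption: whitespace tokenization with case-insensitive comparison
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Number of initial-mention tokens that also occur in the candidate
    static long matchedTokens(String initialMention, String candidate) {
        Set<String> candidateTokens = new HashSet<>(tokens(candidate));
        return tokens(initialMention).stream().filter(candidateTokens::contains).count();
    }

    // True if the candidate passes both thresholds
    static boolean isCoMention(String initialMention, String candidate) {
        long matched = matchedTokens(initialMention, candidate);
        double labelScore = (double) matched / tokens(initialMention).size();
        return matched >= MIN_FOUND_TOKENS && labelScore >= MIN_LABEL_SCORE;
    }

    public static void main(String[] args) {
        System.out.println(isCoMention("Barack Obama", "Obama"));                      // true: 1 of 2 tokens
        System.out.println(isCoMention("Barack Obama", "Michelle"));                   // false: no token matches
        System.out.println(isCoMention("United Nations Security Council", "Council")); // true: 1 of 4 tokens
        System.out.println(isCoMention("Bank of the United States", "States"));        // false: 1 of 5 tokens
    }
}
```

With these values a single matching token is enough for short initial mentions such as "Barack Obama", while for an initial mention with more than four tokens a single matching token is no longer sufficient.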

Modified: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1533039&r1=1533038&r2=1533039&view=diff
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
 (original)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
 Thu Oct 17 10:50:09 2013
@@ -215,7 +215,7 @@ The parameters below are used to configu
 
     If used in combination with a disambiguation Engine, one might want to consider suggesting Entities where only a single token of a multi-token label matches. In such cases a configuration like _Min Matched Tokens_=1 and _Min Label Score_ <= 0.5 (e.g. 0.4) might be considered. In such scenarios users will also want to considerably increase the value for _Max Suggestions_ (typically values > 10).
 
-* __Min Text Score__ _(enhancer.engines.linking.minTextScore)_ [0..1]::double: 
The "Text Score" [0..1] represents how well the Label of an Entity matches to 
the selected Span in the Text. It compares the number of matched {@link Token} 
from the label with the number of Tokens enclosed by the Span in the Text an 
Entity is suggested for. Not exact matches for Tokens, or if the Tokens within 
the label do appear in an other order than in the text do also reduce this 
score. Entities are only considered if at least one of their labels cores 
higher than the minimum for all tree of _Min Labe Score_, _Min Text Match 
Score_ and _Min Match Score_.
+* __Min Text Score__ _(enhancer.engines.linking.minTextScore)_ [0..1]::double: The "Text Score" [0..1] represents how well the Label of an Entity matches the selected Span in the Text. It compares the number of matched Tokens from the label with the number of Tokens enclosed by the Span in the Text the Entity is suggested for. Non-exact matches for Tokens, or Tokens of the label appearing in a different order than in the text, also reduce this score. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Score_, _Min Text Score_ and _Min Match Score_.
 * __Min Match Score__ _(enhancer.engines.linking.minMatchScore)_ [0..1]::double: Defined as the product of the "Text Score" with the "Label Score", meaning that this value represents both how well the label matches the text and how much of the label is matched with the text. Entities are only considered if at least one of their labels scores higher than the minimum for all three of _Min Label Score_, _Min Text Score_ and _Min Match Score_.
 * __Use EntityRankings__ _(enhancer.engines.linking.useEntityRankings)_ ::boolean (default=true): Entity Rankings can be used to define the ranking (popularity, importance, connectivity, ...) of an entity relative to others within the knowledge base. While fise:confidence values calculated by the EntityLinkingEngine only represent how well a label of the entity matches the given section of the processed text, for many use cases it makes sense to sort Entities with the same score based on their entity rankings (e.g. users would expect to get "Paris (France)" suggested before "Paris (Texas)" for Paris appearing in a text). Enabling this feature will slightly (&lt; 0.1) change the score of suggestions to ensure such an ordering.
 

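The rule stated in the _Min Text Score_ and _Min Match Score_ entries above (the Match Score is the product of Text Score and Label Score, and an Entity is only kept if at least one of its labels clears all three minimums) can be sketched in Java. This is an illustration, not the EntityLinkingEngine code: the class and method names are hypothetical, the Label and Text Scores are taken as precomputed inputs, and treating a score equal to a minimum as passing is an assumption (the text says "higher than").

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the documented score combination; not Stanbol code.
final class LabelScoreFilter {

    // Scores computed elsewhere for one label of a candidate Entity, both in [0..1]
    static final class LabelMatch {
        final double labelScore;
        final double textScore;

        LabelMatch(double labelScore, double textScore) {
            this.labelScore = labelScore;
            this.textScore = textScore;
        }

        // Documented definition: Match Score = Text Score * Label Score
        double matchScore() {
            return textScore * labelScore;
        }
    }

    // Assumption: a score equal to a minimum counts as passing it
    static boolean passesAll(LabelMatch m, double minLabelScore, double minTextScore, double minMatchScore) {
        return m.labelScore >= minLabelScore
                && m.textScore >= minTextScore
                && m.matchScore() >= minMatchScore;
    }

    // An Entity is kept if at least one of its labels passes all three minimums
    static boolean keepEntity(List<LabelMatch> labels,
                              double minLabelScore, double minTextScore, double minMatchScore) {
        return labels.stream().anyMatch(m -> passesAll(m, minLabelScore, minTextScore, minMatchScore));
    }

    public static void main(String[] args) {
        // Each score passes 0.5 on its own, but 0.6 * 0.6 = 0.36 fails Min Match Score = 0.5
        List<LabelMatch> single = Arrays.asList(new LabelMatch(0.6, 0.6));
        System.out.println(keepEntity(single, 0.5, 0.5, 0.5)); // false

        // A second, better matching label is enough to keep the Entity
        List<LabelMatch> two = Arrays.asList(new LabelMatch(0.6, 0.6), new LabelMatch(1.0, 0.8));
        System.out.println(keepEntity(two, 0.5, 0.5, 0.5));    // true
    }
}
```

The first case shows why _Min Match Score_ matters on its own: a label whose Label Score and Text Score each clear their minimums can still be rejected because their product does not.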
Modified: 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL: 
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1533039&r1=1533038&r2=1533039&view=diff
==============================================================================
--- 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext 
(original)
+++ 
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext 
Thu Oct 17 10:50:09 2013
@@ -138,6 +138,9 @@ This category covers enhancement engines
        * Provides better linking performance as the [Entityhub Linking 
Engine](entityhublinking)
        * Requires a lot of CPU after changes of the vocabulary to re-create 
the FST models.
 
+* __[Entity Co-Mention Engine](comention):__
+       * Uses initial mentions of an Entity (e.g. 'Barack Obama' in 'Barack Obama attended the UN security council ...')
+       * to detect co-mentions at a later position in the same document (e.g. 'Obama' in '... Obama indicated consent …')
 
 * __DBpedia Spotlight Annotation Engine:__ Integration of the DBpedia 
Spotlight with the Stanbol Enhancer (see 
[STANBOL-706](https://issues.apache.org/jira/browse/STANBOL-706))
        * includes NLP, Entity Linking and Disambiguation of Entities using 
[DBpedia](http://dbpedia.org) as knowledge base

