Author: rwesten
Date: Thu Oct 17 10:50:09 2013
New Revision: 1533039
URL: http://svn.apache.org/r1533039
Log:
STANBOL-1070: added documentation for the Entity Co-Mention Engine
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
Added:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext?rev=1533039&view=auto
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
(added)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/comention.mdtext
Thu Oct 17 10:50:09 2013
@@ -0,0 +1,40 @@
+Title: Co-Mention Engine
+
+The Co-Mention engine aims to link initial mentions of Entities with later
references in the Text.
+
+A typical example is a person mentioned only by family name after an
initial mention with the full name, e.g.
+
+ ... Barack Obama gave a talk to members of the Labor Union ... Obama
specially mentioned ...
+
+__NOTE:__ This Engine does _NOT_ provide/use NLP co-reference support (e.g.
linking a Pronoun with the Entity it stands for). Its purpose is to (1) link
follow-up mentions of Entities with the original one and (2) add the
suggestions of the initial mention to follow-up mentions.
+
+## Configuration
+
+As this engine uses the entity linking functionality of the
[EntityLinkingEngine](entitylinking), its configuration uses properties defined
by the [Entity Linker Configuration](entitylinking#entity-linker-configuration).
+
+
+* __Name__ _(stanbol.enhancer.engine.name)_: The name of the Enhancement
Engine. This name is used to refer to an [EnhancementEngine](index.html) in
[EnhancementChain](../chains)s
+* __ServiceRanking__ _(service.ranking)_: In case multiple enhancement
engines use the same name, only the one with the highest ranking will be
used.
+* __Case Sensitivity__ _(enhancer.engines.linking.caseSensitive)_: Boolean
switch that activates/deactivates case sensitive matching. It is
important to understand that even with case sensitivity activated an Entity
with a label such as "Anaconda" will be suggested for the mention of
"anaconda" in the text. The main difference is the confidence value of
such a suggestion, as with case sensitivity activated the starting letters "A"
and "a" are NOT considered to be matching. See the second technical part for
details about the matching process. Case Sensitivity is deactivated by default.
Activating it is recommended if controlled vocabularies contain
abbreviations similar to commonly used words e.g. CAN for Canada.
+* __Proper Noun Linking__ _(enhancer.engines.linking.properNounsState)_:
Enables/Disables proper noun linking for searching co-mentions. By default this
is disabled so that Common Nouns are also considered when searching for
co-mentions. However, for Vocabularies that only contain Proper Nouns (Persons,
Organizations, ...) enabling this might be useful. For the full documentation
of this feature see the [Text Processing
Configuration](entitylinking#text-processing-configuration) section of the
EntityLinking engine.
+* __Processed Languages__ _(enhancer.engines.linking.processedLanguages)_:
Allows the detailed configuration of how NLP processing results should be
consumed by the Co-Mention engine. For the full documentation of this feature
see the [Text Processing
Configuration](entitylinking#text-processing-configuration) section of the
EntityLinking engine.
+
+The following supported properties are not included in the Felix Webconsole
configuration dialog. They can only be set via OSGi configuration
files. See the [Entity Linking Engine](entitylinking) configuration for the
full documentation of those properties.
+
+* __Min Search Token Length__ _(enhancer.engines.linking.minSearchTokenLength)_
+* __Minimum Token Match Score__ _(enhancer.engines.linking.minTokenScore)_
+* __Lemma based Matching__ _(enhancer.engines.linking.lemmaMatching)_
+* __Max Search Token Distance__
_(enhancer.engines.linking.maxSearchTokenDistance)_
+* __Max Search Tokens__ _(enhancer.engines.linking.maxSearchTokens)_
+
+The following properties of the EntityLinking engine are ignored or set to
fixed values:
+
+* __Type Mappings__ _(enhancer.engines.linking.typeMappings)_: The Co-Mention
engine uses the dc:type values of the initial mention. Therefore dc:type
mappings do not need to be specified.
+* __Default Matching Language__
_(enhancer.engines.linking.defaultMatchingLanguage)_: The engine uses the
language as detected for the parsed document for matching.
+* __Redirect Field__ _(enhancer.engines.linking.redirectField)_ and __Redirect
Mode__ _(enhancer.engines.linking.redirectMode)_: The engine uses suggestions
of the initial mention. Redirects were already processed for those
suggestions. Therefore the Co-Mention engine does not need to deal with
redirects.
+* __Label Field__ _(enhancer.engines.linking.labelField)_: The engine uses the
initial mention as label to search for co-mentions. Because of that no label
field needs to be configured.
+* __Type Field__ _(enhancer.engines.linking.typeField)_: The engine uses the
types of the suggestions for the initial mentions.
+* __Suggestions__ _(enhancer.engines.linking.suggestions)_: The Co-Mention
engine adds all suggestions of the initial mention to co-mentions.
+* __Min Matched Tokens__ _(enhancer.engines.linking.minFoundTokens)_ is set to
'1' meaning that at least a single token of the initial mention needs to match
co-mentions.
+* __Min Label Score__ _(enhancer.engines.linking.minLabelScore)_ is set to
'1/4' meaning that at least 1/4 of the tokens for the initial mention need to
be present in co-mentions.
+* __Min Match Score__ _(enhancer.engines.linking.minMatchScore)_ is set to a
value so that it does not filter any results.
\ No newline at end of file
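The fixed thresholds listed for the Co-Mention engine (_Min Matched Tokens_ = 1, _Min Label Score_ = 1/4) can be illustrated with a minimal Python sketch. It is not the engine's implementation: the function name `is_co_mention`, the whitespace tokenization and the case-insensitive exact token comparison are assumptions made for illustration only. The actual engine relies on the matching process of the EntityLinkingEngine.

```python
def is_co_mention(initial_mention, candidate,
                  min_found_tokens=1, min_label_score=0.25):
    """Illustrative check whether `candidate` could be a co-mention of
    `initial_mention` under the two fixed thresholds.

    Assumptions (not taken from the Stanbol source): tokens are split on
    whitespace and compared case-insensitively for exact equality.
    """
    label_tokens = [t.lower() for t in initial_mention.split()]
    candidate_tokens = {t.lower() for t in candidate.split()}
    if not label_tokens:
        return False
    # Number of tokens of the initial mention found in the candidate span.
    matched = sum(1 for t in label_tokens if t in candidate_tokens)
    # Share of the initial mention's tokens that are present in the candidate.
    label_score = matched / len(label_tokens)
    return matched >= min_found_tokens and label_score >= min_label_score


# 'Obama' covers 1 of the 2 tokens of 'Barack Obama': label score 0.5.
print(is_co_mention("Barack Obama", "Obama"))
# No token of the initial mention is present.
print(is_co_mention("Barack Obama", "Clinton"))
```

With these thresholds a single matching token is enough for initial mentions of up to four tokens. For longer initial mentions, one matching token falls below the 1/4 label score.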
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext?rev=1533039&r1=1533038&r2=1533039&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/entitylinking.mdtext
Thu Oct 17 10:50:09 2013
@@ -215,7 +215,7 @@ The parameters below are used to configu
If used in combination with a disambiguation Engine one might want to
consider suggesting Entities where only a single token of a multi-token label
matches. In such cases a configuration like _Min Matched Tokens_=1 and _Min Label
Score_ <= 0.5 (e.g. 0.4) might be considered. In such scenarios users will
also want to considerably increase the value for _Max Suggestions_ (typically
values > 10).
-* __Min Text Score__ _(enhancer.engines.linking.minTextScore)_ [0..1]::double:
The "Text Score" [0..1] represents how well the Label of an Entity matches to
the selected Span in the Text. It compares the number of matched {@link Token}
from the label with the number of Tokens enclosed by the Span in the Text an
Entity is suggested for. Not exact matches for Tokens, or if the Tokens within
the label do appear in an other order than in the text do also reduce this
score. Entities are only considered if at least one of their labels cores
higher than the minimum for all tree of _Min Labe Score_, _Min Text Match
Score_ and _Min Match Score_.
+* __Min Text Score__ _(enhancer.engines.linking.minTextScore)_ [0..1]::double:
The "Text Score" [0..1] represents how well the Label of an Entity matches
the selected Span in the Text. It compares the number of matched Tokens
from the label with the number of Tokens enclosed by the Span in the Text an
Entity is suggested for. Inexact matches for Tokens, or Tokens of the label
that appear in a different order than in the text, also reduce this
score. Entities are only considered if at least one of their labels scores
higher than the minimum for all three of _Min Label Score_, _Min Text Match
Score_ and _Min Match Score_.
* __Min Match Score__ _(enhancer.engines.linking.minMatchScore)_
[0..1]::double: Defined as the product of the "Text Score" with the "Label
Score" - meaning that this value represents both how well the label matches the
text and how much of the label is matched with the text. Entities are only
considered if at least one of their labels scores higher than the minimum for
all three of _Min Label Score_, _Min Text Match Score_ and _Min Match Score_.
* __Use EntityRankings__ _(enhancer.engines.linking.useEntityRankings)_
::boolean (default=true): Entity Rankings can be used to define the ranking
(popularity, importance, connectivity, ...) of an entity relative to others
within the knowledge base. While fise:confidence values calculated by the
EntityLinkingEngine only represent how well a label of the entity matches
the given section in the processed text, it makes sense for many use
cases to sort Entities with the same score based on their entity rankings (e.g.
users would expect to get "Paris (France)" suggested before "Paris (Texas)" for
Paris appearing in a text). Enabling this feature will slightly (< 0.1)
change the score of suggestions to ensure such an ordering.
Modified:
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
URL:
http://svn.apache.org/viewvc/stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext?rev=1533039&r1=1533038&r2=1533039&view=diff
==============================================================================
---
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
(original)
+++
stanbol/site/trunk/content/docs/trunk/components/enhancer/engines/list.mdtext
Thu Oct 17 10:50:09 2013
@@ -138,6 +138,9 @@ This category covers enhancement engines
* Provides better linking performance than the [Entityhub Linking
Engine](entityhublinking)
* Requires a lot of CPU after changes of the vocabulary to re-create
the FST models.
+* __[Entity Co-Mention Engine](comention):__
+ * Uses initial mentions of an Entity (e.g. 'Barack Obama' in 'Barack
Obama attended the UN security council ...')
+ * To detect co-mentions at a later position in the same document (e.g.
'Obama' in '... Obama indicated consent ...')
* __DBpedia Spotlight Annotation Engine:__ Integration of the DBpedia
Spotlight with the Stanbol Enhancer (see
[STANBOL-706](https://issues.apache.org/jira/browse/STANBOL-706))
* includes NLP, Entity Linking and Disambiguation of Entities using
[DBpedia](http://dbpedia.org) as knowledge base