Rupert Westenthaler created STANBOL-1070:
--------------------------------------------

             Summary: Entity Co-Mention Engine
                 Key: STANBOL-1070
                 URL: https://issues.apache.org/jira/browse/STANBOL-1070
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Entity Co-Mention Engine
======

The goal of this engine is to extract co-mentions of Entities already detected 
for a document. A typical example is a person who, after an initial mention 
with the full name, is mentioned only by family name, e.g.

    ... Barack Obama gave a talk to members of the Labor Union ... Obama
    specially mentioned ...

Alternate names used to refer to Entities might also be used for extracting 
co-mentions.
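
The family-name case above can be illustrated with a minimal sketch. It is 
not the planned implementation: the class and method names are hypothetical, 
and it assumes that the family name is the last whitespace-separated token of 
the full name and that a case-sensitive whole-word match is good enough. The 
actual engine is meant to reuse the EntityLinkingEngine instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of Stanbol. */
final class FamilyNameCoMentionSketch {

    /**
     * Returns the start offsets of occurrences of the last token of
     * fullName that appear after the first occurrence of fullName.
     */
    static List<Integer> findCoMentions(String text, String fullName) {
        List<Integer> offsets = new ArrayList<>();
        int initial = text.indexOf(fullName);
        if (initial < 0) {
            return offsets; // no initial mention, so nothing to co-mention
        }
        // Assumption: the family name is the last whitespace-separated token.
        String[] tokens = fullName.trim().split("\\s+");
        String familyName = tokens[tokens.length - 1];
        if (familyName.isEmpty()) {
            return offsets;
        }
        Matcher matcher = Pattern
                .compile("\\b" + Pattern.quote(familyName) + "\\b")
                .matcher(text);
        // Only look at the text following the initial full-name mention.
        matcher.region(initial + fullName.length(), text.length());
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "... Barack Obama gave a talk to members of the "
                + "Labor Union ... Obama specially mentioned ...";
        System.out.println(findCoMentions(text, "Barack Obama"));
    }
}
```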


NOTE: this Engine does not use NLP-level co-reference resolution (e.g. linking 
a pronoun with the Entity it stands for).

Implementation
-----

This Engine will be implemented based on existing Entity linking functionality 
as implemented by the EntityLinkingEngine. The main difference is that an 
in-memory EntityMentionIndex will be used as controlled vocabulary to link 
against. This EntityMentionIndex will implement the EntitySearcher interface as 
used by the EntityLinkingEngine to search for Entities. 
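
As a rough sketch of what such an in-memory index could look like: the class 
below keeps the mentions found so far in a token-keyed map and returns 
candidates for a later surface form. All names here (MentionIndexSketch, 
Mention, add, lookup) are hypothetical, and the class deliberately does not 
implement the real EntitySearcher interface, whose methods are not shown in 
this issue. Lower-casing and whitespace tokenization are assumptions as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory index; NOT the Stanbol EntitySearcher interface. */
final class MentionIndexSketch {

    /** A mention already detected in the document (illustrative fields). */
    static final class Mention {
        final String text;
        final int start;
        final int end;

        Mention(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // lower-cased token -> mentions whose text contains that token
    private final Map<String, Set<Mention>> byToken = new LinkedHashMap<>();

    /** Registers a mention under each of its tokens. */
    void add(Mention mention) {
        for (String token : tokenize(mention.text)) {
            byToken.computeIfAbsent(token, k -> new LinkedHashSet<>())
                   .add(mention);
        }
    }

    /** Returns mentions sharing at least one token with the query. */
    List<Mention> lookup(String query) {
        Set<Mention> hits = new LinkedHashSet<>();
        for (String token : tokenize(query)) {
            hits.addAll(byToken.getOrDefault(
                    token, Collections.<Mention>emptySet()));
        }
        return new ArrayList<>(hits);
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

With "Barack Obama" added to the index, a lookup for "Obama" would return 
that mention as a candidate, while a lookup for an unseen name returns an 
empty list.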

The EntityMentionIndex will contain both fise:TextAnnotations (such as 
NamedEntities) and fise:EntityAnnotations (entities suggested for 
fise:TextAnnotations).

Writing the results of the co-mention extraction will involve:

* creating new fise:TextAnnotations with suggested fise:EntityAnnotations (e.g. 
for additional mentions not previously detected by any other engine)
* modifying existing suggestions for fise:TextAnnotations. For example, if 
'Svenson' was linked with "Svenson" (http://rdf.freebase.com/ns/m.0n5rh_s), a 
fictional character from the 1930 film The Silver Horde, but "Peter Svenson" 
(http://rdf.freebase.com/ns/m.05wxvv9), an author, was mentioned earlier in 
the document, the latter would be added as an additional suggestion to the 
existing TextAnnotation and the confidence values would be adapted accordingly.
* creating relations between enhancements to express entity co-mentions 
(most likely dc:relation from the co-mention to the initial mention of an 
Entity).
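
To make the first and last items more concrete, here is a small sketch of the 
statements that might be written for one co-mention. It uses plain strings 
with the prefixed names from this description and no RDF library. The Triple 
class, the describeCoMention method and the urn:example: identifiers are all 
hypothetical, and the use of dc:relation reflects only the "most likely" 
choice stated above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; it uses neither the Stanbol API nor an RDF library. */
final class CoMentionWriterSketch {

    /** Minimal subject/predicate/object holder with plain-string terms. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    /** Describes a new co-mention and relates it to the initial mention. */
    static List<Triple> describeCoMention(String coMention,
                                          String initialMention) {
        List<Triple> triples = new ArrayList<>();
        // The co-mention is written as a new fise:TextAnnotation ...
        triples.add(new Triple(coMention, "rdf:type", "fise:TextAnnotation"));
        // ... and points back to the initial mention (assumed dc:relation).
        triples.add(new Triple(coMention, "dc:relation", initialMention));
        return triples;
    }

    public static void main(String[] args) {
        for (Triple t : describeCoMention("urn:example:co-mention-1",
                                          "urn:example:mention-1")) {
            System.out.println(t);
        }
    }
}
```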


