Rupert Westenthaler created STANBOL-1211:
--------------------------------------------

             Summary: Improve Chunk support for Entitylinking
                 Key: STANBOL-1211
                 URL: https://issues.apache.org/jira/browse/STANBOL-1211
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Neither the EntityLinkingEngine nor the LuceneFstLinkingEngine currently makes 
good use of Chunk information. For now, Chunks are only used to also look up 
multiple matchable tokens in the same Chunk against the Vocabulary, which 
increases recall when proper-noun linking is enabled.

However, Chunks can also be used to increase precision by taking the span of 
the Chunk as the basis for calculating the confidence of the linked Entity.

A typical example is suggestions for person names: if a text mentions the 
given and family name of a person who is not present in the vocabulary, 
Entitylinking may suggest Entities that match just one of the two names with 
100% confidence. If the span of the Chunk were used instead, such suggestions 
would be omitted, because a match on one of the two names covers only half of 
the Chunk while the minimum label match score is typically > 50%.
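
The effect can be illustrated with a minimal sketch. This is not code from 
either engine: the class ChunkSpanScore, its methods, the simple token-ratio 
score and the 0.5 threshold are all assumptions made for this example. It only 
shows how the same one-token match scores differently depending on whether the 
matched tokens or the whole Chunk is used as the span.

```java
// Hypothetical illustration; not part of the Stanbol code base.
final class ChunkSpanScore {

    /** Fraction of the tokens in a span that are covered by the matched label. */
    static double matchScore(int matchedTokens, int spanTokens) {
        if (spanTokens <= 0 || matchedTokens < 0 || matchedTokens > spanTokens) {
            throw new IllegalArgumentException("invalid token counts");
        }
        return (double) matchedTokens / spanTokens;
    }

    /** A suggestion is kept only if its score is strictly above the minimum. */
    static boolean isAccepted(double score, double minScore) {
        return score > minScore;
    }

    public static void main(String[] args) {
        double minScore = 0.5; // assumed minimum label match score

        // The text mentions a two-token person name, but the vocabulary only
        // contains an Entity whose label matches one of the two tokens.
        double againstMatchedTokens = matchScore(1, 1); // span = matched token only
        double againstChunk = matchScore(1, 2);         // span = whole Chunk

        System.out.println(isAccepted(againstMatchedTokens, minScore)); // true
        System.out.println(isAccepted(againstChunk, minScore));         // false
    }
}
```

With the matched token as the span, the partial match is suggested with full 
confidence; with the Chunk as the span, it falls to 0.5 and is filtered out.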

Other examples include matches for "US {OrgName}", where "US" is linked when 
the whole organization is not found, and "{OrgName} {Role}", where the {Role} 
(e.g. president) is linked. Dates such as "15. September, 2013" may likewise 
cause September to be suggested if it is present in the vocabulary.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
