[ 
https://issues.apache.org/jira/browse/STANBOL-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-1102.
------------------------------------------

    Resolution: Fixed

Fixed with http://svn.apache.org/r1496359 (as part of STANBOL-1114).

> EntityLinking MUST only accept Suggestions for the current active Token
> -----------------------------------------------------------------------
>
>                 Key: STANBOL-1102
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1102
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> With the "Max Search Tokens (enhancer.engines.linking.maxSearchTokens)" 
> configuration (default=2), the EntityLinking Engine supports OR queries 
> over multiple linkable/matchable tokens against the controlled vocabulary. 
> This feature ensures that Entities matching longer sections of the text 
> are ranked higher. This is especially important for bigger vocabularies 
> and/or tokens that are common within the vocabulary, as the EntityLinking 
> only considers the top 10 (or 3 * max suggestions) query results. 
> However, when multiple Tokens are used for a search, there might be 
> suggestions that match some tokens in the Text, but not the currently 
> active one. Currently those suggestions are taken into account, which can 
> cause unwanted states, like the one described in the following Example:
>     "Bei einer gemeinsamen Pressekonferenz mit FPÖ-Bundesparteivorsitzenden 
> Heinz-Christian Strache in Langenlois" 
> This generates the following queries:
> (1) process Token 5: FPÖ
>   >> searchStrings [FPÖ, Bundesparteivorsitzenden]
>   << 0: FPÖ[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for 
> http://rdf.freebase.com/ns/m.013vy8
> (2) process Token 5: Bundesparteivorsitzenden
>   >> searchStrings [Bundesparteivorsitzenden, Heinz]
>   << 0: Heinz[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for 
> http://rdf.freebase.com/ns/m.0c5y96
> (3) process Token 7: Christian
>   >> searchStrings [Christian, Strache]
>   << 0: Heinz-Christian Strache[m=FULL,s=2,c=2(1.0)/3] 
> score=0.6666666666666666[l=0.6666666666666666,t=1.0] for 
> http://rdf.freebase.com/ns/m.08lfdk
> This results in a situation where "Heinz" is linked to another Entity, 
> while "Heinz-Christian Strache" - although it completely matches the 
> text - is only linked with "Christian Strache" AND with a lower confidence!
> The issue is that search (2), issued for the Token 
> "Bundesparteivorsitzenden", MUST NOT suggest an Entity that does not match 
> the currently active Token. Because it does so in the given Example, 
> "Heinz" is already consumed and cannot be linked with the expected Entity 
> mention "Heinz-Christian Strache".
> This issue will add a rule to the Label <-> Text matching that a Label MUST 
> match the currently active token in the text.
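
The rule described above can be illustrated with a minimal Python sketch. This is not the Stanbol implementation (which is written in Java). The function names, the naive tokenization on whitespace and hyphens, and the simplification that every token is linkable are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and tokenization are hypothetical,
# not taken from the Stanbol code base.

# Assumption: a simple token list for the example sentence, with the
# hyphenated words already split into separate tokens.
TOKENS = ["Bei", "einer", "gemeinsamen", "Pressekonferenz", "mit", "FPÖ",
          "Bundesparteivorsitzenden", "Heinz", "Christian", "Strache",
          "in", "Langenlois"]


def search_strings(tokens, active_index, max_search_tokens=2):
    """Return the active token plus the following tokens, up to
    max_search_tokens in total (simplified: every token is searchable)."""
    return tokens[active_index:active_index + max_search_tokens]


def label_matches_active_token(label, active_token):
    """True if one of the label's tokens equals the active token
    (case-insensitive, splitting on whitespace and hyphens)."""
    label_tokens = label.replace("-", " ").lower().split()
    return active_token.lower() in label_tokens


def accept_suggestions(labels, active_token):
    """Keep only the suggested labels that match the active token."""
    return [label for label in labels
            if label_matches_active_token(label, active_token)]


if __name__ == "__main__":
    active = TOKENS.index("Bundesparteivorsitzenden")
    print(search_strings(TOKENS, active))
    # The OR query may return "Heinz", but it does not match the active token.
    print(accept_suggestions(["Heinz"], TOKENS[active]))
    # With "Heinz" as the active token, the full label is accepted.
    print(accept_suggestions(["Heinz", "Heinz-Christian Strache"], "Heinz"))
```

Under these assumptions, the suggestion "Heinz" returned by search (2) is rejected because it does not contain the active token "Bundesparteivorsitzenden", so "Heinz" remains available when it later becomes the active token itself.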



--
This message was sent by Atlassian JIRA
(v6.1#6144)
