[
https://issues.apache.org/jira/browse/STANBOL-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-1102.
------------------------------------------
Resolution: Fixed
fixed with http://svn.apache.org/r1496359 (as part of STANBOL-1114)
> EntityLinking MUST only accept Suggestions for the current active Token
> -----------------------------------------------------------------------
>
> Key: STANBOL-1102
> URL: https://issues.apache.org/jira/browse/STANBOL-1102
> Project: Stanbol
> Issue Type: Sub-task
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> With the "Max Search Tokens (enhancer.engines.linking.maxSearchTokens)"
> configuration, the EntityLinking Engine supports OR queries against the
> controlled vocabulary for multiple linkable/matchable tokens (default=2).
> This feature ensures that Entities matching longer sections of the text
> are ranked higher. This is especially important for bigger vocabularies
> and/or common tokens within the vocabulary, as the EntityLinking only
> considers the top 10 (or 3 * max suggestions) query results.
> However, when multiple Tokens are used for a search, there might be
> suggestions that match some tokens in the Text, but not the currently
> active one. Currently those suggestions are taken into account, which can
> cause unwanted states, like the one described in the following Example:
> "Bei einer gmeinsamen Pressekonferenz mit FPÖ-Bundesparteivorsitzenden
> Heinz-Christian Strache in Langenlois"
> This generates the following queries:
> (1) process Token 5: FPÖ
> >> searchStrings [FPÖ, Bundesparteivorsitzenden]
> << 0: FPÖ[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for
> http://rdf.freebase.com/ns/m.013vy8
> (2) process Token 5: Bundesparteivorsitzenden
> >> searchStrings [Bundesparteivorsitzenden, Heinz]
> << 0: Heinz[m=FULL,s=1,c=1(1.0)/1] score=1.0[l=1.0,t=1.0] for
> http://rdf.freebase.com/ns/m.0c5y96
> (3) process Token 7: Christian
> >> searchStrings [Christian, Strache]
> << 0: Heinz-Christian Strache[m=FULL,s=2,c=2(1.0)/3]
> score=0.6666666666666666[l=0.6666666666666666,t=1.0] for
> http://rdf.freebase.com/ns/m.08lfdk
> resulting in a situation where "Heinz" is linked to another Entity, while
> "Heinz-Christian Strache", although it completely matches the text, is only
> linked with "Christian Strache" AND with a lower confidence!
> The issue is that search (2), issued for the Token "Bundesparteivorsitzenden",
> MUST NOT suggest an Entity that does not match the currently active Token.
> Because it does so in the given Example, "Heinz" is already consumed and
> cannot be linked with the expected Entity mention "Heinz-Christian Strache".
> This issue will add a rule to the Label <-> Text matching that a Label MUST
> match the currently active token in the text.
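
The rule described above can be illustrated with a minimal sketch. This is not
the code committed in r1496359; the class name ActiveTokenRule, its methods and
the simplified tokenization (lower-casing, splitting on whitespace and hyphens)
are all assumptions made for illustration only. It shows that a suggestion
returned by an OR query is accepted only if its label contains the currently
active token.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration of the "label must match the active token" rule.
// None of these names are taken from the Stanbol code base.
class ActiveTokenRule {

    /** Splits a label into lower-cased tokens on whitespace and hyphens (a simplification). */
    static List<String> tokenize(String label) {
        return Arrays.stream(label.toLowerCase(Locale.ROOT).split("[\\s-]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** The rule: a label is acceptable only if one of its tokens equals the active token. */
    static boolean matchesActiveToken(String label, String activeToken) {
        return tokenize(label).contains(activeToken.toLowerCase(Locale.ROOT));
    }

    /** Keeps only those suggested labels that match the currently active token. */
    static List<String> accept(List<String> suggestedLabels, String activeToken) {
        return suggestedLabels.stream()
                .filter(label -> matchesActiveToken(label, activeToken))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Search (2): the active token is "Bundesparteivorsitzenden", but the OR query
        // [Bundesparteivorsitzenden, Heinz] returned a label that only matches "Heinz".
        System.out.println(accept(Arrays.asList("Heinz"), "Bundesparteivorsitzenden"));

        // Search (3): the active token "Christian" is part of the suggested label.
        System.out.println(accept(Arrays.asList("Heinz-Christian Strache"), "Christian"));
    }
}
```

With this filter the "Heinz" suggestion of search (2) is rejected, so the token
"Heinz" is not consumed by that search and stays available for a later match
against the full label "Heinz-Christian Strache".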
--
This message was sent by Atlassian JIRA
(v6.1#6144)