[jira] [Created] (STANBOL-1252) Add support for MIN_FOUND_TOKENS to the Lucene FST Linking Engine

Rupert Westenthaler (JIRA) Thu, 09 Jan 2014 05:45:55 -0800

Rupert Westenthaler created STANBOL-1252:
--------------------------------------------


             Summary: Add support for MIN_FOUND_TOKENS to the Lucene FST 
Linking Engine
                 Key: STANBOL-1252
                 URL: https://issues.apache.org/jira/browse/STANBOL-1252
             Project: Stanbol
          Issue Type: Improvement
    Affects Versions: 0.12.0
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The FST linking engine already allows configuring, as a percentage, how much of 
a processable chunk (typically a noun phrase) needs to match for a suggestion 
to be accepted. This is done with the 
"enhancer.engines.linking.minChunkMatchScore" property. The default is > 50%.

While this kind of configuration works well for chunks created from 
NamedEntityAnnotations, it is not always well suited for detected noun phrases, 
as those may span larger sections of a sentence. E.g. "goalie Mathias Lange 
(Iserlohn Roosters)" will not match any Entity in a vocabulary: the chunk 
contains 5 matchable tokens, but the player "Mathias Lange" and the team name 
"Iserlohn Roosters" each cover only two of them (40%).

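The arithmetic behind the example can be sketched in a few lines of Java. This 
is only an illustration: the class and method names are hypothetical and not 
part of Stanbol, and it assumes the score is the share of matchable tokens in 
the chunk that a suggestion covers, compared strictly against the "> 50%" 
default.

```java
public class ChunkMatchScoreExample {

    // Hypothetical helper (not Stanbol API): share of the chunk's matchable
    // tokens that are covered by a single suggestion.
    static double chunkMatchScore(int matchedTokens, int matchableTokensInChunk) {
        if (matchableTokensInChunk <= 0) {
            throw new IllegalArgumentException("chunk needs at least one matchable token");
        }
        return (double) matchedTokens / matchableTokensInChunk;
    }

    public static void main(String[] args) {
        // Assumption: the "> 50%" default expressed as a fraction.
        double minChunkMatchScore = 0.5;

        // "Mathias Lange" covers 2 of the 5 matchable tokens of the chunk
        // "goalie Mathias Lange (Iserlohn Roosters)".
        double score = chunkMatchScore(2, 5);

        System.out.println("score=" + score
                + " accepted=" + (score > minChunkMatchScore));
    }
}
```
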
In such cases the configuration of a fixed lower limit of the number of 
(matchable) Tokens that need to match within a Chunk can be preferable.

For this configuration the FST linking engine will use the "Min Matched Tokens 
(enhancer.engines.linking.minFoundTokens)" property of the EntityLinker 
configuration.

The FST linking engine will accept matches that conform to either 
"enhancer.engines.linking.minChunkMatchScore" or 
"enhancer.engines.linking.minFoundTokens".

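A minimal Java sketch of the combined acceptance rule, under stated 
assumptions: the class and method are hypothetical and do not reflect the 
engine's actual code, the score check is strict (following the "> 50%" 
default), and the token limit is treated as inclusive (a lower limit, so ">=").

```java
public class FstMatchFilterSketch {

    // Hypothetical check (not Stanbol API): a match is accepted if it satisfies
    // EITHER the relative threshold (minChunkMatchScore) OR the absolute
    // threshold (minFoundTokens).
    static boolean accept(int matchedTokens, int matchableTokensInChunk,
                          double minChunkMatchScore, int minFoundTokens) {
        if (matchableTokensInChunk <= 0) {
            return false; // nothing matchable in this chunk
        }
        double score = (double) matchedTokens / matchableTokensInChunk;
        // Assumption: strict comparison for the score, inclusive for the count.
        return score > minChunkMatchScore || matchedTokens >= minFoundTokens;
    }

    public static void main(String[] args) {
        // "Mathias Lange" in "goalie Mathias Lange (Iserlohn Roosters)":
        // 2 of 5 matchable tokens. The values 0.5 and 2 are illustrative only.
        System.out.println(accept(2, 5, 0.5, 2));
    }
}
```

With these illustrative values the two-token match passes through the 
minFoundTokens branch even though it covers only 40% of the chunk.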


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
