Rupert Westenthaler created STANBOL-1403:
--------------------------------------------

             Summary: Add PLAIN linking mode to the FST linking engine
                 Key: STANBOL-1403
                 URL: https://issues.apache.org/jira/browse/STANBOL-1403
             Project: Stanbol
          Issue Type: Improvement
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


The Lucene FST linking engine uses a linking process similar to that of the 
entity linking engine. This means that NLP processing results are used to 
determine "Linkable" and "Matchable" tokens in the text. "Linkable" tokens are 
then used to initiate vocabulary lookups, and both "Linkable" and "Matchable" 
tokens are used to check whether the labels of entities actually match the text.

This issue will introduce a new linking mode in which the FST linking engine 
will try to link every single word in the text. Instead of using NLP processing 
results, this mode will simply use the Solr Analyzer of the configured field.

The PLAIN mode is intended to be used in cases:

* where no NLP support is available
* for vocabularies that contain entities appearing in text as tokens other 
than nouns (e.g. a vocabulary that contains activities)
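
To illustrate the difference between the two modes, here is a minimal sketch 
of how the set of tokens that initiate vocabulary lookups might change. It is 
not the engine's implementation: the class LinkingModeSketch, the Mode enum, 
the Token type and the lookupTokens method are all hypothetical, and the rule 
"only nouns are Linkable" is a simplifying assumption. The real PLAIN mode 
would obtain its tokens from the Solr Analyzer of the configured field rather 
than from a pre-built list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkingModeSketch {

    /** Hypothetical linking modes; only the name PLAIN comes from this issue. */
    enum Mode { NLP, PLAIN }

    /** Hypothetical token: surface form plus a coarse POS tag from NLP processing. */
    static final class Token {
        final String text;
        final String pos; // e.g. "NOUN", "VERB"; may be null when no NLP support exists

        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Returns the surface forms of the tokens that would initiate vocabulary lookups. */
    static List<String> lookupTokens(List<Token> tokens, Mode mode) {
        List<String> result = new ArrayList<>();
        for (Token t : tokens) {
            // NLP mode (simplified assumption): only nouns count as "Linkable".
            // PLAIN mode: every token is a lookup candidate; POS tags are ignored.
            if (mode == Mode.PLAIN || "NOUN".equals(t.pos)) {
                result.add(t.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("Anna", "NOUN"),
                new Token("enjoys", "VERB"),
                new Token("swimming", "VERB"),
                new Token("in", "ADP"),
                new Token("lakes", "NOUN"));

        System.out.println(lookupTokens(tokens, Mode.NLP));   // [Anna, lakes]
        System.out.println(lookupTokens(tokens, Mode.PLAIN)); // [Anna, enjoys, swimming, in, lakes]
    }
}
```

In this sketch an activity such as "swimming" only becomes a lookup candidate 
in PLAIN mode, which corresponds to the second use case listed above.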

The PLAIN mode will not work in cases where users have used ProperNoun mode 
with big vocabularies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
