Rupert Westenthaler created STANBOL-1091:
--------------------------------------------

             Summary: EntityLinking Engine should not process the same tokens twice
                 Key: STANBOL-1091
                 URL: https://issues.apache.org/jira/browse/STANBOL-1091
             Project: Stanbol
          Issue Type: Improvement
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Minor


The EntityLinking Engine currently processes the text based on Sections 
(typically Sentences, if present). However, if multiple NLP frameworks 
process the parsed text, Sentence annotations may overlap. In such cases the 
EntityLinkingEngine first processes the Sentence with the earlier start and/or 
later end position, but it then also processes the other Sentence, which is 
(partially) covered by the first one. As a result, Tokens and Chunks contained 
in two (or more) overlapping Sentence annotations are processed more than once.

To avoid this, the EntityLinking Engine should keep track of Tokens that were 
already processed and simply ignore the already processed parts of overlapping 
sentences.
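A minimal sketch of the proposed tracking, in Java. The `Token` and `Sentence` 
records and the `process` method are hypothetical stand-ins, not the Stanbol NLP 
API. Sentences are visited by earlier start first and, on ties, later end first 
(the order described above), and a `HashSet` of processed tokens makes tokens 
inside the overlap of two sentences get processed only once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch only: hypothetical types, not the Stanbol NLP API. Shows how tokens
 * already processed as part of an overlapping sentence can be skipped.
 */
public class OverlapAwareLinking {

    /** Hypothetical token: a character span [start, end) plus its text. */
    record Token(int start, int end, String text) {}

    /** Hypothetical sentence annotation covering [start, end). */
    record Sentence(int start, int end) {}

    /**
     * Visits sentences ordered by earlier start, then later end, and
     * "processes" each contained token at most once. Returns the token texts
     * in the order they were processed.
     */
    static List<String> process(List<Sentence> sentences, List<Token> tokens) {
        List<Sentence> ordered = new ArrayList<>(sentences);
        ordered.sort(Comparator.comparingInt(Sentence::start)
                .thenComparing(Comparator.comparingInt(Sentence::end).reversed()));

        Set<Token> processed = new HashSet<>(); // tokens seen in earlier sentences
        List<String> visited = new ArrayList<>();
        for (Sentence sentence : ordered) {
            for (Token token : tokens) {
                boolean inSentence = token.start() >= sentence.start()
                        && token.end() <= sentence.end();
                // Set.add returns false if the token was already processed
                if (inSentence && processed.add(token)) {
                    visited.add(token.text()); // stands in for the linking step
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // "Paris is nice. Berlin too." with two partially overlapping sentences
        List<Token> tokens = List.of(
                new Token(0, 5, "Paris"), new Token(6, 8, "is"),
                new Token(9, 13, "nice"), new Token(13, 14, "."),
                new Token(15, 21, "Berlin"), new Token(22, 25, "too"),
                new Token(25, 26, "."));
        List<Sentence> sentences = List.of(new Sentence(0, 14), new Sentence(9, 26));
        // prints [Paris, is, nice, ., Berlin, too, .]
        System.out.println(process(sentences, tokens));
    }
}
```

Without the `processed` set, `nice` and the first `.` (both inside the overlap 
[9, 14)) would be processed twice. The set of processed tokens stays the same; 
only the duplicate steps are dropped.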

NOTE: This will not have any effect on the Entity Linking results. However, it 
will prevent unnecessary processing steps in cases like the one described above.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
