[ https://issues.apache.org/jira/browse/STANBOL-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-1117.
------------------------------------------

    Resolution: Won't Fix

Further investigation has shown that using POS tags to improve the
selection of Tokens used to look up Entities within the vocabulary is not
feasible. The reasons are:

* configuration of "chunkable" POS tags: whether a POS tag should be considered
"chunkable" or not often depends on the specific case
* negative impact on the linking support in cases where no POS tags are
present.

Users who want such functionality should implement an Engine that adds
Chunk annotations based on an algorithm over POS tags.
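A minimal sketch of such an algorithm over POS tags, in plain Java. It is a hypothetical illustration, not the Stanbol NLP or Engine API: the `Token`/`Chunk` records, the `chunk` method and the Penn Treebank tag sets are assumptions. A chunk is taken to be a maximal run of Tokens with "chunkable" POS tags that contains at least one noun. Note that the `chunkable` set is exactly the per-case configuration the first reason above refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PosChunkerSketch {

    /** Hypothetical token: surface text plus its POS tag (not the Stanbol NLP API). */
    record Token(String text, String pos) {}

    /** Hypothetical chunk as a half-open token index range [start, end). */
    record Chunk(int start, int end) {}

    /**
     * Groups maximal runs of tokens whose POS tag is "chunkable" into chunks,
     * keeping only runs that contain at least one noun (the phrase head).
     */
    static List<Chunk> chunk(List<Token> tokens, Set<String> chunkable, Set<String> nouns) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;          // start of the current run, -1 if none
        boolean hasNoun = false; // whether the current run contains a noun
        // iterate one past the end so that a run reaching the last token is closed
        for (int i = 0; i <= tokens.size(); i++) {
            boolean inRun = i < tokens.size() && chunkable.contains(tokens.get(i).pos());
            if (inRun) {
                if (start < 0) {
                    start = i;
                    hasNoun = false;
                }
                hasNoun |= nouns.contains(tokens.get(i).pos());
            } else if (start >= 0) {
                // run ended: emit it only if it has a noun head
                if (hasNoun) {
                    chunks.add(new Chunk(start, i));
                }
                start = -1;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token("The", "DT"), new Token("Apache", "NNP"), new Token("Software", "NNP"),
            new Token("Foundation", "NNP"), new Token("hosts", "VBZ"), new Token("many", "JJ"),
            new Token("projects", "NNS"), new Token(".", "."));
        // assumed configuration: which POS tags are "chunkable" and which are noun heads
        Set<String> chunkable = Set.of("DT", "JJ", "NN", "NNS", "NNP", "NNPS");
        Set<String> nouns = Set.of("NN", "NNS", "NNP", "NNPS");
        for (Chunk c : chunk(tokens, chunkable, nouns)) {
            StringBuilder sb = new StringBuilder();
            for (Token t : tokens.subList(c.start(), c.end())) {
                sb.append(t.text()).append(' ');
            }
            // prints "Chunk[start=0, end=4] -> The Apache Software Foundation"
            // and "Chunk[start=5, end=7] -> many projects"
            System.out.println(c + " -> " + sb.toString().trim());
        }
    }
}
```

In a real Engine, the resulting index ranges would be written back as Chunk annotations so that EntityLinking can limit its search Tokens to them.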
                
> Use POS tag information for better selection of search tokens for 
> EntityLookups
> -------------------------------------------------------------------------------
>
>                 Key: STANBOL-1117
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1117
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancement Engines
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> Currently EntityLinking determines the Tokens used for lookups in the controlled
> vocabularies as follows:
> * start from a "linkable" Token
> * search the surrounding Tokens for other "linkable" or "matchable" Tokens
> * until the "Max Search Token Distance" (default 3 Tokens) is reached, or
> * more than one non-"matchable" Token was found, or
> * "Max Search Tokens" (default 2 Tokens) are selected, but
> * never use Tokens earlier than the last consumed (already linked) Tokens
> * in the case of explicitly annotated Chunks, the selection of search Tokens
> is additionally limited by those Chunks
> This Issue will try to improve this algorithm by considering
> * "linkable" and "matchable" Tokens
> * Tokens with "chunkable" POS annotations
> when selecting search Tokens. This will allow a better selection of search
> Tokens in cases where no Chunker (NounPhrase detection and/or NER) is
> present.
> With this in place, it needs to be checked whether increasing the default "Max
> Search Tokens" could lead to better results and possibly better performance (if
> one query could be used to link multiple Entities for non-overlapping spans).
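The current search Token selection described in the quoted issue can be sketched as follows. This is a minimal, hypothetical illustration in plain Java, not the EntityLinking implementation: the `Tok` record, the `select` method and its parameters are assumptions. The issue only says "surrounding" Tokens, so the sketch assumes Tokens are searched forward first and then backward, that the non-"matchable" count is kept per direction, and that "Max Search Tokens" includes the starting Token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SearchTokenSelectionSketch {

    /** Hypothetical token state (not the Stanbol API). */
    record Tok(String text, boolean linkable, boolean matchable) {}

    /**
     * Selects the indexes of the Tokens used for one vocabulary lookup, starting at the
     * linkable Token {@code start}. {@code lastConsumed} is the index of the last already
     * linked Token (-1 if none); [{@code chunkStart}, {@code chunkEnd}) bounds the search to
     * an explicitly annotated Chunk (use 0 and tokens.size() if there is none).
     */
    static List<Integer> select(List<Tok> tokens, int start, int lastConsumed,
                                int chunkStart, int chunkEnd,
                                int maxDistance, int maxSearchTokens) {
        List<Integer> selected = new ArrayList<>();
        selected.add(start); // assumption: the start Token counts towards maxSearchTokens
        int lower = Math.max(lastConsumed + 1, chunkStart); // never before consumed Tokens
        int upper = chunkEnd;                                // exclusive Chunk limit
        // assumption: search forward first, then backward, with the same stop conditions
        for (int dir : new int[] {1, -1}) {
            int nonMatchable = 0; // assumption: counted per direction
            for (int d = 1; d <= maxDistance && selected.size() < maxSearchTokens; d++) {
                int i = start + dir * d;
                if (i < lower || i >= upper) {
                    break; // outside the allowed Chunk / consumed-Token bounds
                }
                Tok t = tokens.get(i);
                if (t.linkable() || t.matchable()) {
                    selected.add(i);
                } else if (++nonMatchable > 1) {
                    break; // more than one non-matchable Token: stop in this direction
                }
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(
            new Tok("the", false, false), new Tok("University", true, true),
            new Tok("of", false, false), new Tok("Salzburg", true, true),
            new Tok("is", false, false), new Tok("in", false, false),
            new Tok("Austria", true, true));
        // defaults from the issue: Max Search Token Distance = 3, Max Search Tokens = 2
        System.out.println(select(tokens, 1, -1, 0, tokens.size(), 3, 2)); // [1, 3]
        // Tokens up to index 3 are already consumed, so only "Austria" remains
        System.out.println(select(tokens, 6, 3, 0, tokens.size(), 3, 2));  // [6]
    }
}
```

The first call skips the single non-matchable "of" and selects "University" and "Salzburg" for one lookup; passing a Chunk range such as [1, 2) would restrict the selection to the starting Token alone.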

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira