[ 
https://issues.apache.org/jira/browse/STANBOL-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766445#comment-13766445
 ] 

Antonio David Pérez Morales commented on STANBOL-1157:
------------------------------------------------------

The Freebase Disambiguation Engine implements the algorithm described in this 
issue, except for the first step (local score), which is not implemented in 
this version.

The algorithm extracts a subgraph of the whole Freebase graph that contains 
only the entities returned by the NLP and entity linking process, together 
with the relations between them.
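
As a rough illustration of that restriction (not the engine's actual code), the 
sketch below keeps only the candidate entities and the edges that connect two 
of them. The adjacency-dict representation, the function name induced_subgraph 
and the entity IDs are assumptions made for this example; the engine itself 
works against a Freebase graph stored in Neo4j.

```python
def induced_subgraph(graph, entities):
    """Keep only `entities` and the edges whose two endpoints are both kept.

    `graph` is a hypothetical adjacency dict: node -> set of neighbour nodes.
    """
    keep = set(entities)
    return {
        node: {neighbour for neighbour in graph.get(node, ()) if neighbour in keep}
        for node in keep
    }


# Hypothetical toy graph; the IDs are placeholders, not real Freebase IDs.
FULL_GRAPH = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"c"},
}

SUBGRAPH = induced_subgraph(FULL_GRAPH, ["a", "b", "d"])
```

Here "d" stays in the subgraph but becomes isolated, because its only 
neighbour "c" is not a candidate entity.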

Using the Entity Annotations of each Text Annotation, it builds all the 
possible solutions for the text being enhanced, that is, all the tuples 
obtained by picking one entity from each text annotation's set of entity 
annotations.
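
A minimal sketch of this enumeration, assuming the entity annotations are 
available as a plain mapping from each text annotation to its candidate entity 
IDs (the function name candidate_solutions and all IDs below are hypothetical):

```python
from itertools import product


def candidate_solutions(annotations):
    """Return every solution that picks exactly one entity per text annotation.

    `annotations` maps a text annotation (mention) to its candidate entity IDs.
    Each solution is a dict: mention -> chosen entity.
    """
    mentions = sorted(annotations)
    candidate_sets = [annotations[mention] for mention in mentions]
    return [dict(zip(mentions, combo)) for combo in product(*candidate_sets)]


# Hypothetical input: two mentions, one of them ambiguous.
ANNOTATIONS = {
    "Paris": ["paris_city", "paris_texas"],
    "France": ["france"],
}

SOLUTIONS = candidate_solutions(ANNOTATIONS)
```

The number of solutions is the product of the sizes of the candidate sets, so 
it grows quickly with the number of ambiguous mentions.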

The solution sought is the tuple that minimizes the distance in the graph 
between every pair of entities in the tuple. A smaller distance means a higher 
disambiguation score.
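
The selection step could look roughly like the following sketch. It is not 
taken from the engine: it assumes a weighted adjacency dict, sums the Dijkstra 
distances over all pairs of entities in a tuple (the aggregation is not stated 
in this comment), and charges an arbitrary penalty for unreachable pairs. All 
names and IDs are hypothetical.

```python
import heapq
from itertools import combinations, product


def dijkstra(graph, source):
    """Shortest distances from `source`; graph is node -> {neighbour: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def total_distance(graph, solution, unreachable=1e6):
    """Sum of pairwise distances; `unreachable` is an assumed penalty value."""
    return sum(
        dijkstra(graph, a).get(b, unreachable)
        for a, b in combinations(solution, 2)
    )


def best_solution(graph, candidate_sets):
    """Pick the tuple (one entity per set) with the smallest total distance."""
    return min(product(*candidate_sets), key=lambda s: total_distance(graph, s))


# Hypothetical toy graph with unit weights; IDs are placeholders.
GRAPH = {
    "paris_city": {"france": 1.0},
    "france": {"paris_city": 1.0, "usa": 1.0},
    "usa": {"france": 1.0, "paris_texas": 1.0},
    "paris_texas": {"usa": 1.0},
}

BEST = best_solution(GRAPH, [["paris_city", "paris_texas"], ["france"]])
```

In this toy graph "paris_city" is one hop from "france" and "paris_texas" is 
two hops away, so the tuple containing "paris_city" is selected.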

The engine can be downloaded from 
https://github.com/adperezmorales/gsoc-freebase-disambiguation-engine/tree/master/gsoc-freebase-disambiguation-engine
                
> Freebase Disambiguation Algorithm
> ---------------------------------
>
>                 Key: STANBOL-1157
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1157
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancement Engines, Enhancer, Entityhub
>            Reporter: Rafa Haro
>             Fix For: 0.12.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The disambiguation algorithm should take into account a local disambiguation 
> score (comparing in some way the document context with the contexts provided 
> by the Wikilinks resource) and a global disambiguation score computed by a 
> graph-based algorithm using the Freebase graph imported into a Neo4j database. 
> Each disambiguation score would have a different weight in the final 
> disambiguation score for each entity. The algorithm's steps, for each 
> TextAnnotation, can be the following:
> 1. Local score: for each EntityAnnotation, retrieve from the Wikilinks database 
> all the contexts associated with the referenced entity. Compare (similarity, 
> distance, ...) the mention context (selected-context) with the Wikilinks 
> contexts.
> 2. Global score: build a subgraph with all the possible entities and their 
> relations in Freebase. Extract a set of possible solutions from this graph 
> (note: a solution should include only one entity annotation for each text 
> annotation). Compute the Dijkstra distance between each pair of entities 
> belonging to a possible solution.
> 3. Weight normalization and refinement of confidence values.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
