Hi Cristian,

I think this is a very useful feature for Stanbol, and ideally it should be implemented as a separate enhancement engine.
In my GSoC project [1] (a FOAF correlation-based entity disambiguation engine) I created a somewhat similar enhancement engine, which calculates the correlations between the named entities in the text and uses them for entity disambiguation. The engine's objective is to increase the confidence of the EntityAnnotations suggested by previous enhancement engines by processing the correlated URI references of the entities. In my project, the coreferences were analysed by processing all the URI references from the EntityAnnotations suggested by the other enhancement engines in the chain.

For this project, however, I think a POS-based approach should be adopted to identify entity references in phrases like "the company", "them", etc.

Thanks,
Dileepa

[1] https://github.com/dileepajayakody/foaf-disambiguation

On Thu, Jan 30, 2014 at 3:03 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Hi,
>
> One of the necessary steps for implementing the Event extraction Engine
> feature (https://issues.apache.org/jira/browse/STANBOL-1121) is to have
> coreference resolution in the given text. This is provided now via the
> stanford-nlp project, but as far as I saw this module performs mostly
> pronominal (He, She) or nominal (Barack Obama and Mr. Obama) coreference
> resolution.
>
> In order to get more coreferences from the text I thought of creating some
> logic that would detect this kind of coreference:
> "Apple reaches new profit heights. The software company just announced its
> 2013 earnings."
> Here "The software company" obviously refers to "Apple".
> So I'd like to detect coreferences of Named Entities which are of the
> rdf:type of the Named Entity, in this case "company", and which also have
> attributes that can be found in the DBpedia categories of the named
> entity, in this case "software".
>
> The detection of coreferences such as "The software company" in the text
> would also be done either by using the new Pos Tag Based Phrase extraction
> Engine (noun phrases) or by using a dependency tree of the sentence and
> picking up only subjects or objects.
>
> At this point I'd like to know if this kind of logic would be useful as a
> separate Enhancement Engine (in case the precision and recall are good
> enough) in Stanbol?
>
> Thanks,
> Cristian
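
For illustration, the matching rule described in the quoted message (the head noun of a noun phrase matches the entity's rdf:type, and its modifiers appear in the entity's DBpedia categories) could be sketched roughly as below. This is only a sketch under assumptions: the class NominalCoreferenceSketch, the Entity holder, and the methods mayCorefer and resolve are hypothetical names, not part of the Stanbol API, and the type labels and category keywords are assumed to have been looked up and lower-cased beforehand. The noun phrases themselves would come from a POS-based phrase extractor or a dependency parse, which is not shown.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Stanbol.
final class NominalCoreferenceSketch {

    private static final Set<String> DETERMINERS = Set.of("the", "this", "that", "a", "an");

    /** Assumed view of a named entity: its mention, rdf:type labels and category keywords (all lower-case). */
    static final class Entity {
        final String mention;
        final Set<String> typeLabels;
        final Set<String> categoryKeywords;

        Entity(String mention, Set<String> typeLabels, Set<String> categoryKeywords) {
            this.mention = mention;
            this.typeLabels = typeLabels;
            this.categoryKeywords = categoryKeywords;
        }
    }

    /**
     * True if the noun phrase may refer to the entity: the last token (taken as
     * the head noun) must be one of the entity's type labels, and every modifier
     * before it must be one of the entity's category keywords.
     */
    static boolean mayCorefer(String nounPhrase, Entity entity) {
        String[] tokens = nounPhrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        // Skip a leading determiner such as "the".
        int start = (tokens.length > 1 && DETERMINERS.contains(tokens[0])) ? 1 : 0;
        String head = tokens[tokens.length - 1];
        if (!entity.typeLabels.contains(head)) {
            return false;
        }
        for (int i = start; i < tokens.length - 1; i++) {
            if (!entity.categoryKeywords.contains(tokens[i])) {
                return false;
            }
        }
        return true;
    }

    /** Returns the most recently mentioned entity (last in the list) that the phrase may refer to. */
    static Optional<Entity> resolve(String nounPhrase, List<Entity> precedingEntities) {
        for (int i = precedingEntities.size() - 1; i >= 0; i--) {
            Entity candidate = precedingEntities.get(i);
            if (mayCorefer(nounPhrase, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

With an entity for "Apple" carrying the type label "company" and the category keyword "software" (made-up values for the example, not actual DBpedia data), resolve("The software company", ...) would pick "Apple", while "The oil company" would not match. A real engine would probably need lemmatisation and a confidence score rather than this strict all-or-nothing match.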