[jira] [Created] (STANBOL-1279) Named Entity co-reference resolution engine based on yago/dbpedia contextual information

Cristian Petroaca (JIRA) Sun, 09 Feb 2014 04:15:55 -0800

Cristian Petroaca created STANBOL-1279:
------------------------------------------


             Summary: Named Entity co-reference resolution engine based on 
yago/dbpedia contextual information
                 Key: STANBOL-1279
                 URL: https://issues.apache.org/jira/browse/STANBOL-1279
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancement Engines
            Reporter: Cristian Petroaca


Develop an enhancement engine that performs co-reference resolution of Named 
Entities in a given text. The co-references are noun phrases that refer to 
those Named Entities through a minimal set of attributes matching contextual 
information from dbpedia/yago for that Named Entity (the yago rdf:type, plus 
dbpedia properties describing location and function; more on this below).

Take the following text as an example: "Microsoft has posted its 2013 
earnings. The software company did better than expected. ... The Redmond-based 
company will hire 500 new developers this year."
The enhancement engine will link "Microsoft" with "The software company" and 
"The Redmond-based company".

The mechanism for performing the resolution is described below.

If the text contains any Named Entities:

1. Select all noun phrases after the first Named Entity that have:
        a. a determiner which implies a reference to an entity local to the 
text (such as "the", "this", "these"), but not one such as "another" or 
"every", which implies a reference to an entity outside of the text.
        b. at least one other noun besides the main noun, which further 
describes it. For example, "The company" on its own is not counted as a 
legitimate candidate, since it could create many false positives given the 
double meaning of some words, as in "in the company of good people".
        
        This step ensures good recall.
        
2. Match any noun phrase selected above with all Named Entities prior to it in 
the text.
   
   The core matching mechanism takes all nouns in the noun phrase and compares 
them with the yago rdf:type values of the Named Entity. For example, "software 
company" from the example above is compared with every yago rdf:type of 
"Microsoft", which in our case includes the category 
"Software_companies_of_the_United_States". Based on the result with the most 
matches we can create a confidence level and link the noun phrase with the best 
matched Named Entity.
   
   Before the matching is done, the yago rdf:type values need to be lemmatized 
and POS-tagged so that plural form mismatches are avoided (as can be seen in 
the example above) and non-noun words such as prepositions are ignored. At the 
moment it is unclear to me how best to make this happen.
   
   Besides the core matching mechanism we will also have the following types 
of matches:
        a. Spatial - if a noun phrase contains a Location entity then we can 
also match it against any spatial dbpedia attributes of the Named Entity, such 
as dbpedia-owl:locationCity for Organizations, dbpedia-owl:birthPlace and 
dbpedia-owl:region for Persons, and dbpedia-owl:country for Locations.
        b. Functional - check the given nouns against the properties in 
dbpedia that describe function, such as dbpedia-owl:profession and 
dbpedia-owl:occupation for Persons, or dbpedia-owl:industry and 
dbpprop:services for Organizations.

        For both of these types of matches, the main noun of the noun phrase 
must first be matched with the rdf:type from yago.

                
        This step is designed to filter out all bad co-references and ensure 
good precision.
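
The two steps above could be sketched roughly as follows in Python. This is 
only an illustration under several assumptions: noun phrases arrive already 
POS-tagged as (token, tag) pairs with Penn Treebank style tags, yago type 
labels are plain underscore-separated strings, and `singular` is a crude 
suffix-stripping stand-in for a real lemmatizer. All names here (`is_candidate`, 
`type_tokens`, `match_score`) are hypothetical and not part of Stanbol.

```python
# Determiners assumed to signal an entity local to the text (step 1a).
LOCAL_DETERMINERS = {"the", "this", "these"}
# Small illustrative stopword list; a POS tagger would replace this.
STOPWORDS = {"of", "the", "in", "and", "for", "by", "from"}


def singular(word):
    """Crude plural stripping; a real lemmatizer should replace this."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def is_candidate(noun_phrase):
    """Step 1: keep phrases with a local determiner and at least two nouns."""
    if not noun_phrase:
        return False
    first_token, first_tag = noun_phrase[0]
    if first_tag != "DT" or first_token.lower() not in LOCAL_DETERMINERS:
        return False
    # Modifiers tagged as adjectives (e.g. "Redmond-based") are not counted
    # here and would need separate handling.
    nouns = [tok for tok, tag in noun_phrase if tag.startswith("NN")]
    return len(nouns) >= 2


def type_tokens(type_label):
    """Split a yago type label into lowercased, singularized content words."""
    words = type_label.replace("_", " ").lower().split()
    return {singular(w) for w in words if w not in STOPWORDS}


def match_score(noun_phrase, type_labels):
    """Step 2: best fraction of the phrase's nouns found in one type label."""
    nouns = {singular(tok.lower())
             for tok, tag in noun_phrase if tag.startswith("NN")}
    if not nouns:
        return 0.0
    best = max((len(nouns & type_tokens(label)) for label in type_labels),
               default=0)
    return best / len(nouns)
```

With the example from this issue, the phrase 
[("The", "DT"), ("software", "NN"), ("company", "NN")] passes `is_candidate`, 
and both of its nouns are found in "Software_companies_of_the_United_States" 
once "companies" is reduced to "company". The score is a simple ratio chosen 
for illustration; the actual confidence formula is left open.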
        
As an additional note, if multiple Named Entities can match a given noun 
phrase, the noun phrase is linked with the closest matching Named Entity prior 
to it in the text.
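
This tie-breaking rule can be illustrated with a small sketch. It assumes each 
Named Entity has already been given a match score and a character offset in the 
text; the `Mention` tuple, the `pick_antecedent` function and the `min_score` 
threshold are hypothetical and used only for illustration.

```python
from collections import namedtuple

# Hypothetical record: entity surface form, start offset in text, match score.
Mention = namedtuple("Mention", ["name", "offset", "score"])


def pick_antecedent(phrase_offset, mentions, min_score=0.5):
    """Return the matching entity closest before the noun phrase, or None.

    min_score is an assumed threshold, not a value from the proposal.
    """
    prior = [m for m in mentions
             if m.offset < phrase_offset and m.score >= min_score]
    if not prior:
        return None
    # Among all matching entities, prefer the one nearest to the phrase.
    return max(prior, key=lambda m: m.offset)
```

For instance, with matching entities at offsets 0 and 40 and a noun phrase at 
offset 100, the entity at offset 40 is chosen; entities that appear after the 
noun phrase are never considered.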



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
