[jira] [Updated] (STANBOL-1279) Named Entity co-reference resolution engine based on yago/dbpedia contextual information

Cristian Petroaca (JIRA) Sat, 29 Mar 2014 09:12:06 -0700

     [ 
https://issues.apache.org/jira/browse/STANBOL-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Cristian Petroaca updated STANBOL-1279:
---------------------------------------

    Description: 
Develop an enhancement engine that will perform co-reference resolution of 
Named Entities in a given text. The co-references will be noun phrases which 
refer to those Named Entities by having a minimal set of attributes which match 
contextual information from dbpedia/yago for that Named Entity (the yago 
rdf:type plus the dbpedia properties which give spatial and functional 
information about the entity - more on this below).

We have the following text as an example : "Microsoft has posted its 2013 
earnings. The software company did better than expected. ... The Redmond-based 
company will hire 500 new developers this year."
The enhancement engine will link "Microsoft" with "The software company" and 
"The Redmond-based company".

The steps needed to extract the co-references are described below.

Named Entity extraction 
================== 
Extract all Named Entities from the given text. If there are no Named Entities 
then the process stops here.

Noun Phrases extraction 
===================
Select all noun phrases after the first Named Entity that have:
+ a determiner which implies reference to an entity local to the text (such as 
"the", "this", "these"), but not one such as "another" or "every", which 
implies a reference to an entity outside of the text.
+ at least one other noun besides the main required noun, which further 
describes it. For example, I will not count "The company" as a legitimate 
candidate, since this could create a lot of false positives because of the 
double meaning of some words, as in "in the company of good people".

All noun phrases need to be lemmatized in case there are any plurals.
This step should have different logic implemented for different languages.
This step ensures good recall.
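
The two selection rules above could be sketched roughly as follows. This is 
only an illustration, not the engine's implementation: it assumes English, 
Penn Treebank style POS tags, and noun phrases that arrive already chunked as 
(token, tag) pairs. The names LOCAL_DETERMINERS, NOUN_TAGS and is_candidate 
are hypothetical.

```python
# Determiners that point back to an entity already present in the text.
# The words come from the description; the set itself is an assumption
# for English and would differ per language.
LOCAL_DETERMINERS = {"the", "this", "these"}

# Penn Treebank noun tags (assumed tag set).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def is_candidate(noun_phrase):
    """Return True if the noun phrase passes both selection rules.

    noun_phrase is a list of (token, pos_tag) pairs.
    """
    if not noun_phrase:
        return False
    first_token, first_tag = noun_phrase[0]
    # Rule 1: the phrase starts with a determiner that refers to a local entity.
    if first_tag != "DT" or first_token.lower() not in LOCAL_DETERMINERS:
        return False
    # Rule 2: the head noun plus at least one further noun describing it.
    nouns = [token for token, tag in noun_phrase if tag in NOUN_TAGS]
    return len(nouns) >= 2


print(is_candidate([("The", "DT"), ("software", "NN"), ("company", "NN")]))
print(is_candidate([("The", "DT"), ("company", "NN")]))
```

Lemmatization and the per-language logic are left out of the sketch.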
        
Noun Phrases matching
===================
This step tries to match the previously selected noun phrases to the Named 
Entities from step 1 and establish the co-references.
For every noun phrase the following rules will be applied:

Yago:class matching
--------------------------
For each NER (Named Entity) prior to the current noun phrase in the text, match 
the yago:class label to the contents of the noun phrase. If there are no 
matches, drop the current noun phrase.
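
A minimal sketch of this rule, under the assumption that each Named Entity is 
available as a dictionary holding its name, its character offset in the text 
and its yago:class labels as plain strings. The dictionary layout and the 
function name yago_class_matches are hypothetical, and the sample labels are 
illustrative rather than taken from yago.

```python
def yago_class_matches(noun_phrase_tokens, np_offset, entities):
    """Return (entity name, class label) pairs for every entity that occurs
    before the noun phrase and has a class label found in the noun phrase."""
    np_words = {token.lower() for token in noun_phrase_tokens}
    matches = []
    for entity in entities:
        if entity["offset"] >= np_offset:
            continue  # only entities mentioned before the noun phrase count
        for label in entity["yago_classes"]:
            # A label matches when all of its words occur in the noun phrase.
            if all(word in np_words for word in label.lower().split()):
                matches.append((entity["name"], label))
    return matches


entities = [
    {"name": "Microsoft", "offset": 0, "yago_classes": ["company"]},
    {"name": "Redmond", "offset": 200, "yago_classes": ["city"]},
]
matches = yago_class_matches(["the", "software", "company"], 40, entities)
print(matches)  # an empty list would mean the noun phrase is dropped
```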

Group membership rules matching
-------------------------------------------
For each NER prior to the current noun phrase:

+ Spatial membership : the noun phrase is part of a LOCATION. 
If the noun phrase contains a LOCATION or a demonym then check any location 
properties of the matching NER. These properties will be part of a generic 
ontology. For clarity I will describe the dbpedia extracted properties which 
will be aligned to this generic ontology.

If matching NER is a :
    - person, match against :birthPlace, :region, :nationality
    - organisation, match against :foundationPlace, :locationCity, :location, 
:hometown
    - place, match against :country, :subdivisionName, :location.

Example: The Italian President, The Richmond-based company
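
The spatial rule could look roughly like the sketch below. The property names 
per entity type are the ones listed above; everything else is assumed for 
illustration: the DEMONYMS lookup table, the entity dictionary layout, the 
function name spatial_match and the sample property values.

```python
# dbpedia properties to check per entity type, as listed in the description.
SPATIAL_PROPERTIES = {
    "person": ["birthPlace", "region", "nationality"],
    "organisation": ["foundationPlace", "locationCity", "location", "hometown"],
    "place": ["country", "subdivisionName", "location"],
}

# Hypothetical demonym lookup; a real engine would need a fuller resource.
DEMONYMS = {"italian": "Italy"}


def spatial_match(np_location, entity):
    """Return the spatial properties of the entity whose values contain the
    location (or the place behind the demonym) found in the noun phrase."""
    place = DEMONYMS.get(np_location.lower(), np_location)
    properties = SPATIAL_PROPERTIES.get(entity["type"], [])
    return [p for p in properties if place in entity["properties"].get(p, [])]


# Illustrative sample data, not actual dbpedia lookups.
company = {"type": "organisation", "properties": {"locationCity": ["Redmond"]}}
person = {"type": "person", "properties": {"nationality": ["Italy"]}}

print(spatial_match("Redmond", company))
print(spatial_match("Italian", person))
```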

+ Organisational membership : the NER is part of an ORGANISATION. 
If the noun phrase contains an ORGANISATION then check the following properties 
of the matching NER. These properties will be part of a generic ontology. For 
clarity I will describe the dbpedia extracted properties which will be aligned 
to this generic ontology.

If matching NER is a :
    - person, match against :occupation, :associatedActs
    - organisation : no dbpedia properties to match
    - location : no dbpedia properties to match

Example: The Microsoft executive, The Pink Floyd singer

Functional description rules matching
-----------------------------------------------
The noun phrase describes what the NER does conceptually.
If there are no NERs in the noun phrase then match the following properties of 
the matching NER to the contents of the noun phrase (aside from the nouns which 
are part of the yago:class) :

   If NER is a:
   - person : no dbpedia properties to match
   - organisation : match against :service, :industry, :genre
   - location : no dbpedia properties to match

Example:  The software company.
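
A possible sketch of the functional description rule: the nouns already 
consumed by the yago:class match are removed, and the remaining words are 
compared with the values of the listed properties. The entity dictionary 
layout, the function name functional_match and the sample property value are 
assumptions made for this example.

```python
# dbpedia properties to check per entity type, as listed in the description.
FUNCTIONAL_PROPERTIES = {
    "person": [],
    "organisation": ["service", "industry", "genre"],
    "location": [],
}


def functional_match(np_words, class_words, entity):
    """Return the functional properties whose values share a word with the
    noun phrase, ignoring the words that belong to the yago:class."""
    remaining = {w.lower() for w in np_words} - {w.lower() for w in class_words}
    matched = []
    for prop in FUNCTIONAL_PROPERTIES.get(entity["type"], []):
        for value in entity["properties"].get(prop, []):
            if remaining & set(value.lower().split()):
                matched.append(prop)
                break  # one matching value per property is enough
    return matched


# Illustrative sample data, not an actual dbpedia lookup.
company = {"type": "organisation", "properties": {"industry": ["software"]}}

# Content words of "The software company", with "company" as the class match.
print(functional_match(["software", "company"], ["company"], company))
```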

If no matches were found for the current NER with the "Group membership" and 
"Functional description" rules, but the yago:class which matched has more than 
2 nouns, then we also consider this a good co-reference, though possibly with a 
lower confidence.

Ex: The former tennis player, the theoretical physicist.

Co-references creation
==================
Based on the number of nouns which matched from the previous step we create a 
confidence level. The number of matched nouns cannot be lower than 2 and we 
must have a yago:class match.
For all NERs which got to this point, select the closest ones in the text to 
the noun phrase which matched against the same properties (yago:class and 
dbpedia) and mark them as co-references.
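
One way this step could be sketched is shown below. The description does not 
give a confidence formula, so the linear scale and the max_nouns cap are pure 
assumptions, as are the function names and the tuple layout of the candidates. 
The sketch also returns only the single closest entity.

```python
def confidence(matched_nouns, has_class_match, max_nouns=4):
    """Assumed linear confidence in [0, 1]; 0.0 means "not a co-reference"."""
    # Both conditions from the description: a yago:class match is required
    # and at least 2 nouns must have matched.
    if not has_class_match or matched_nouns < 2:
        return 0.0
    return min(matched_nouns, max_nouns) / max_nouns


def closest_entity(np_offset, candidates):
    """Pick the valid candidate nearest to (and before) the noun phrase.

    candidates is a list of (name, offset, matched_nouns, has_class_match).
    Returns (name, confidence) or None when no candidate qualifies.
    """
    scored = []
    for name, offset, matched_nouns, has_class_match in candidates:
        score = confidence(matched_nouns, has_class_match)
        if offset < np_offset and score > 0:
            scored.append((np_offset - offset, name, score))
    if not scored:
        return None
    _, name, score = min(scored)  # smallest distance wins
    return name, score


# Hypothetical candidates for a noun phrase at offset 100.
candidates = [
    ("Microsoft", 0, 2, True),
    ("Acme", 60, 2, True),
    ("Redmond", 80, 1, True),  # only one matched noun, so it is filtered out
]
print(closest_entity(100, candidates))
```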

The "Noun Phrases matching" and "Co-references creation" steps are designed to 
filter out all bad co-references and ensure good precision.
        

  was:
Develop an enhancement engine that will perform co-reference resolution of 
Named Entities in a given text. The co-references will be noun phrases which 
refer to those Named Entities by having a minimal set of attributes which match 
contextual information from dbpedia/yago for that Named Entity (the yago 
rdf:type plus the dbpedia properties which give spatial and functional 
information about the entity - more on this below).

We have the following text as an example : "Microsoft has posted its 2013 
earnings. The software company did better than expected. ... The Redmond-based 
company will hire 500 new developers this year."
The enhancement engine will link "Microsoft" with "The software company" and 
"The Redmond-based company".

The steps needed to extract the co-references are described below.

Named Entity extraction 
================== 
Extract all Named Entities from the given text. If there are no Named Entities 
then the process stops here.

Noun Phrases extraction 
===================
Select all noun phrases after the first Named Entity that have:
+ a determiner which implies reference to an entity local to the text (such as 
"the", "this", "these"), but not one such as "another" or "every", which 
implies a reference to an entity outside of the text.
+ at least one other noun besides the main required noun, which further 
describes it. For example, I will not count "The company" as a legitimate 
candidate, since this could create a lot of false positives because of the 
double meaning of some words, as in "in the company of good people".

All noun phrases need to be lemmatized in case there are any plurals.
This step should have different logic implemented for different languages.
This step ensures good recall.
        
Noun Phrases matching
===================
This step tries to match the previously selected noun phrases to the Named 
Entities from step 1 and establish the co-references.
For every noun phrase the following rules will be applied:

Yago:class matching
--------------------------
For each NER (Named Entity) prior to the current noun phrase in the text, match 
the yago:class label to the contents of the noun phrase. If there are no 
matches, drop the current noun phrase.

Group membership rules matching
-------------------------------------------

+ Spatial membership : the noun phrase is part of a LOCATION. 
If the noun phrase contains a LOCATION or a demonym then check any location 
properties of the matching NER. These properties will be part of a generic 
ontology. For clarity I will describe the dbpedia extracted properties which 
will be aligned to this generic ontology.

If matching NER is a :
    - person, match against :birthPlace, :region, :nationality
    - organisation, match against :foundationPlace, :locationCity, :location, 
:hometown
    - place, match against :country, :subdivisionName, :location.

Example: The Italian President, The Richmond-based company

+ Organisational membership : the NER is part of an ORGANISATION. 
If the noun phrase contains an ORGANISATION then check the following properties 
of the matching NER. These properties will be part of a generic ontology. For 
clarity I will describe the dbpedia extracted properties which will be aligned 
to this generic ontology.

If matching NER is a :
    - person, match against :occupation, :associatedActs
    - organisation : no dbpedia properties to match
    - location : no dbpedia properties to match

Example: The Microsoft executive, The Pink Floyd singer

Functional description rules matching
-----------------------------------------------
The noun phrase describes what the NER does conceptually.
If there are no NERs in the noun phrase then match the following properties of 
the matching NER to the contents of the noun phrase (aside from the nouns which 
are part of the yago:class) :

   If NER is a:
   - person : no dbpedia properties to match
   - organisation : match against :service, :industry, :genre
   - location : no dbpedia properties to match

Example:  The software company.

If no matches were found for the current NER with the "Group membership" and 
"Functional description" rules, but the yago:class which matched has more than 
2 nouns, then we also consider this a good co-reference, though possibly with a 
lower confidence.

Ex: The former tennis player, the theoretical physicist.

Co-references creation
==================
Based on the number of nouns which matched from the previous step we create a 
confidence level. The number of matched nouns cannot be lower than 2 and we 
must have a yago:class match.
For all NERs which got to this point, select the closest ones in the text to 
the noun phrase which matched against the same properties (yago:class and 
dbpedia) and mark them as co-references.

The "Noun Phrases matching" and "Co-references creation" steps are designed to 
filter out all bad co-references and ensure good precision.
        


> Named Entity co-reference resolution engine based on yago/dbpedia contextual 
> information
> ----------------------------------------------------------------------------------------
>
>                 Key: STANBOL-1279
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1279
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancement Engines
>            Reporter: Cristian Petroaca
>              Labels: co-reference, dbpedia, entity, named, yago



--
This message was sent by Atlassian JIRA
(v6.2#6252)
