[jira] [Commented] (STANBOL-1279) Named Entity co-reference resolution engine based on yago/dbpedia contextual information

Cristian Petroaca (JIRA) Sun, 08 Feb 2015 10:05:41 -0800

    [ 
https://issues.apache.org/jira/browse/STANBOL-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311455#comment-14311455
 ]


Cristian Petroaca commented on STANBOL-1279:
--------------------------------------------

Hey Rupert,

I added the latest entity coref engine code with the following changes:

- enhanced the ./build_yago_dbpedia_labels.sh script to check for already 
downloaded archives and to print clearer status output. There is no way around 
the 7za command, though: I need it to extract the 7z archives.
- moved the spatial and org membership attributes from config files inside the 
jar to OSGi attributes. There are quite a few of them, but it does not look 
horribly crowded in the GUI.
- added entity-coref-dbpedia data bundle.
- created a new dbpedia index that contains the organisational membership 
attributes such as :occupation, :associatedBands and :employer. I also put it 
up on wetransfer. You should receive a mail with the download link.

Basically the engine works with 3 types of co-referencing:
1. Spatial: ex Angela Merkel -> The German Chancellor
2. Organisation membership: ex Mick Jagger -> The Rolling Stones singer.
3. Class based - when the class has more than 2 words in it: ex Boris Becker 
-> The former tennis player.
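
To illustrate how the three types relate, here is a minimal Python sketch. It 
is not the engine's code: the function name, its boolean inputs and the order 
in which the types are tried are all assumptions made for the example, and the 
class-based case is treated as the fallback.

```python
from typing import Optional


def coref_type(np_has_location: bool,
               np_has_organisation: bool,
               class_label: str) -> Optional[str]:
    """Pick one of the three co-reference types for a noun phrase.

    Hypothetical helper: the inputs stand for what an NLP pipeline would
    have detected inside the noun phrase (a location or demonym, an
    organisation) and for the class label that matched.
    """
    if np_has_location:
        # e.g. Angela Merkel -> The German Chancellor
        return "spatial"
    if np_has_organisation:
        # e.g. Mick Jagger -> The Rolling Stones singer
        return "organisation-membership"
    if len(class_label.split()) > 2:
        # e.g. Boris Becker -> The former tennis player
        return "class-based"
    return None


print(coref_type(True, False, "chancellor"))
print(coref_type(False, True, "singer"))
print(coref_type(False, False, "former tennis player"))
```

A class label of one or two words with no location or organisation in the noun 
phrase yields None here, i.e. no co-reference of any of the three types.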

 

> Named Entity co-reference resolution engine based on yago/dbpedia contextual 
> information
> ----------------------------------------------------------------------------------------
>
>                 Key: STANBOL-1279
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1279
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancement Engines
>            Reporter: Cristian Petroaca
>            Assignee: Rupert Westenthaler
>              Labels: co-reference, dbpedia, entity, named, yago
>         Attachments: named_entity_coref_ver_1.patch, 
> named_entity_coref_ver_2.patch, named_entity_coref_ver_3.patch
>
>
> Develop an enhancement engine that will perform co-reference resolution of 
> Named Entities in a given text. The co-references will be noun phrases which 
> refer to those Named Entities by having a minimal set of attributes which 
> match contextual information (yago rdf:type plus dbpedia spatial and 
> functional-description properties - more on this below) from dbpedia/yago for 
> that Named Entity.
> We have the following text as an example : "Microsoft has posted its 2013 
> earnings. The software company did better than expected. ... The 
> Redmond-based company will hire 500 new developers this year."
> The enhancement engine will link "Microsoft" with "The software company" and 
> "The Redmond-based company".
> Below there are the steps necessary in order to extract the co-references.
> Named Entity extraction 
> ================== 
> Extract all Named Entities from the given text. If there are no Named 
> Entities then the process stops here.
> Noun Phrases extraction 
> ===================
> Select all noun phrases after the first Named Entity that have:
> + a determiner POS which implies reference to an entity local to the text, 
> such as "the", "this", "these", but not "another", "every", etc., which imply 
> a reference to an entity outside of the text.
> + at least another noun aside from the main required noun which further 
> describes it. For example I will not count "The company" as being a 
> legitimate candidate since this could create a lot of false positives by 
> considering the double meaning of some words such as "in the company of good 
> people".
> All noun phrases need to be lemmatized in case there are any plurals.
> This step should have different logic implemented for different languages.
> This step ensures good recall.
>       
> Noun Phrases matching
> ===================
> This step tries to match the previously selected noun phrases to the Named 
> Entities from step 1 and establish the co-references.
> For every noun phrase the following rules will be applied:
> Yago:class matching
> --------------------------
> For each NER prior to the current noun phrase in the text, match the 
> yago:class label to the contents of the noun phrase. If there are no matches 
> then drop the current noun phrase.
> Group membership rules matching
> -------------------------------------------
> For each NER prior to the current noun phrase:
> + Spatial membership : the noun phrase is part of a LOCATION. 
> If the noun phrase contains a LOCATION or a demonym then check any location 
> properties of the matching NER. These properties will be part of a generic 
> ontology. For clarity I will describe the dbpedia extracted properties which 
> will be aligned to this generic ontology.
> If matching NER is a :
>     - person, match against :birthPlace, :region, :nationality
>     - organisation, match against :foundationPlace, :locationCity, :location, 
> :hometown
>     - place, match against :country, :subdivisionName, :location.
> Example: The Italian President, The Richmond-based company
> + Organisational membership : the NER is part of an ORGANISATION. 
> If the noun phrase contains an ORGANISATION then check the following 
> properties of the matching NER. These properties will be part of a generic 
> ontology. For clarity I will describe the dbpedia extracted properties which 
> will be aligned to this generic ontology.
> If matching NER is :
>     - person, match against :occupation, :associatedActs
>     - organisation : no dbpedia properties to match
>     - location : no dbpedia properties to match
> Example: The Microsoft executive, The Pink Floyd singer
> Functional description rules matching
> -----------------------------------------------
> The noun phrase describes what the NER does conceptually.
> If there are no NERs in the noun phrase then match the following properties 
> of the matching NER to the contents of the noun phrase (aside from the nouns 
> which are part of the yago:class) :
>    If NER is a:
>    - person : no dbpedia properties to match
>    - organisation : match against :service, :industry, :genre
>    - location : no dbpedia properties to match
> Example:  The software company.
> If no matches were found for the current NER with the "Group membership" 
> and "Functional description" rules, then if the yago:class which matched has 
> more than 2 nouns we also consider this a good co-reference, but possibly 
> with a lower confidence.
> Ex: The former tennis player, the theoretical physicist.
> Co-references creation
> ==================
> Based on the number of nouns which matched from the previous step we create a 
> confidence level. The number of matched nouns cannot be lower than 2 and we 
> must have a yago:class match.
> For all NERs which got to this point, select the closest ones in the text to 
> the noun phrase which matched against the same properties (yago:class and 
> dbpedia) and mark them as co-references.
> The "Noun Phrases matching" and "Co-references creation" steps are designed 
> to filter out all bad co-references and ensure good precision.
>       
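
As a rough illustration of the noun phrase selection and of the per-rule 
property lists described above, here is a minimal Python sketch. It is not the 
engine's implementation: the (word, POS) input format, the "DET"/"NOUN"/"PROPN" 
tag names and both function names are assumptions made for the example. Only 
the determiners and the dbpedia property names are taken from the description.

```python
from typing import Dict, List, Tuple

# Determiners that imply a reference to an entity local to the text.
LOCAL_DETERMINERS = {"the", "this", "these"}

# dbpedia properties to check, per rule and per type of the matching NER,
# copied from the lists in the description. Missing entries mean
# "no dbpedia properties to match".
RULE_PROPERTIES: Dict[str, Dict[str, List[str]]] = {
    "spatial": {
        "person": ["birthPlace", "region", "nationality"],
        "organisation": ["foundationPlace", "locationCity",
                         "location", "hometown"],
        "place": ["country", "subdivisionName", "location"],
    },
    "organisational": {
        "person": ["occupation", "associatedActs"],
    },
    "functional": {
        "organisation": ["service", "industry", "genre"],
    },
}


def is_candidate(tagged_phrase: List[Tuple[str, str]]) -> bool:
    """Return True if a POS-tagged noun phrase is a co-reference candidate.

    The phrase must start with a local determiner and contain at least
    two nouns (the main noun plus one that further describes it).
    """
    if not tagged_phrase:
        return False
    first_word, first_pos = tagged_phrase[0]
    if first_pos != "DET" or first_word.lower() not in LOCAL_DETERMINERS:
        return False
    noun_count = sum(1 for _, pos in tagged_phrase
                     if pos in ("NOUN", "PROPN"))
    return noun_count >= 2


def properties_to_check(rule: str, entity_type: str) -> List[str]:
    """Look up the dbpedia properties for a rule and NER type."""
    return RULE_PROPERTIES.get(rule, {}).get(entity_type, [])


print(is_candidate([("The", "DET"), ("software", "NOUN"),
                    ("company", "NOUN")]))
print(is_candidate([("The", "DET"), ("company", "NOUN")]))
print(properties_to_check("functional", "organisation"))
```

With this filter "The software company" is kept, while "The company" and 
"Another software company" are dropped before any matching is attempted.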



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
