Re: Relation extraction feature

Cristian Petroaca Thu, 13 Jun 2013 11:23:34 -0700

Hi Rupert,

First of all thanks for the detailed suggestions.


2013/6/12 Rupert Westenthaler <[email protected]>

> Hi Cristian, all
>
> really interesting use case!
>
> In this mail I will try to give some suggestions on how this could
> work out. These suggestions are mainly based on experiences and lessons
> learned in the LIVE [2] project where we built an information system
> for the Olympic Games in Peking. While this project excluded the
> extraction of Events from unstructured text (because the Olympic
> Information System was already providing event data as XML messages)
> the semantic search capabilities of this system were very similar to
> the ones described by your use case.
>
> IMHO you are not only trying to extract relations, but a formal
> representation of the situation described by the text. So let's assume
> that the goal is to annotate a Setting (or Situation) described in the
> text - a fise:SettingAnnotation.
>
> The DOLCE foundational ontology [1] gives some advice on how to model
> those. The important relation for modeling this is Participation:
>
>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>
> where ..
>
>  * ED are Endurants (continuants): Endurants do have an identity so we
> would typically refer to them as Entities referenced by a setting.
> Note that this includes physical, non-physical as well as
> social-objects.
>  * PD are Perdurants (occurrents):  Perdurants are entities that
> happen in time. This refers to Events, Activities ...
>  * PC is Participation: it is a time-indexed relation where
> Endurants participate in Perdurants
>
> Modeling this in RDF requires defining some intermediate resources
> because RDF does not allow for n-ary relations.
>
>  * fise:SettingAnnotation: It is really handy to define one resource
> being the context for all described data. I would call this
> "fise:SettingAnnotation" and define it as a sub-concept of
> fise:Enhancement. All further enhancements about the extracted Setting
> would define a "fise:in-setting" relation to it.
>
>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant is
> participating in a setting (fise:in-setting fise:SettingAnnotation).
> The Endurant itself is described by existing fise:TextAnnotation (the
> mentions) and fise:EntityAnnotation (suggested Entities). Basically
> the fise:ParticipantAnnotation will allow an EnhancementEngine to
> state that several mentions (possibly in different sentences)
> represent the same Endurant as participating in the Setting. In
> addition it would be possible to use the dc:type property (similar to
> fise:TextAnnotation) to refer to the role(s) of a participant
> (e.g. the set: Agent (intentionally performs an action), Cause
> (unintentionally, e.g. a mud slide), Patient (a passive role in an
> activity) and Instrument (aids a process)), but I am wondering if one
> could extract that information.
>
>  * fise:OccurrentAnnotation: Is used to annotate a Perdurant in the
> context of the Setting. fise:OccurrentAnnotation can also link to
> fise:TextAnnotation (typically verbs in the text defining the
> Perdurant) as well as fise:EntityAnnotation suggesting well known
> Events in a knowledge base (e.g. an election in a country, or an
> uprising ...). In addition fise:OccurrentAnnotation can define
> dc:has-participant links to fise:ParticipantAnnotation. In this case
> it is explicitly stated that an Endurant (the
> fise:ParticipantAnnotation) is involved in this Perdurant (the
> fise:OccurrentAnnotation). As Occurrences are temporally indexed this
> annotation should also support properties for defining the
> xsd:dateTime for the start/end.
>
>
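A minimal sketch of the annotation model described above, written as plain Python dataclasses that are flattened into triples. The fise:/dc: property names are the ones proposed in the mail; the `urn:` identifiers, the `ex:start`/`ex:end` property names and the `to_triples` helper are assumptions made for illustration only and are not part of Stanbol. The example treats the verb of "USA invades Iraq" as the Perdurant and both countries as participants, which is one possible reading of the model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class SettingAnnotation:
    """Context resource that every annotation of one setting points to."""
    uri: str


@dataclass
class ParticipantAnnotation:
    """An Endurant participating in a setting."""
    uri: str
    in_setting: SettingAnnotation
    roles: List[str] = field(default_factory=list)  # dc:type values


@dataclass
class OccurrentAnnotation:
    """A Perdurant (event, activity) in the context of a setting."""
    uri: str
    in_setting: SettingAnnotation
    participants: List[ParticipantAnnotation] = field(default_factory=list)
    start: Optional[datetime] = None  # the date may not be extractable
    end: Optional[datetime] = None


def to_triples(occ: OccurrentAnnotation) -> List[Tuple[str, str, str]]:
    """Flatten one occurrent and its participants into (s, p, o) triples."""
    triples = [
        (occ.uri, "rdf:type", "fise:OccurrentAnnotation"),
        (occ.uri, "fise:in-setting", occ.in_setting.uri),
    ]
    for p in occ.participants:
        triples.append((occ.uri, "dc:has-participant", p.uri))
        triples.append((p.uri, "rdf:type", "fise:ParticipantAnnotation"))
        triples.append((p.uri, "fise:in-setting", p.in_setting.uri))
        for role in p.roles:
            triples.append((p.uri, "dc:type", role))
    # ex:start / ex:end are hypothetical property names (not from the mail).
    if occ.start is not None:
        triples.append((occ.uri, "ex:start", occ.start.isoformat()))
    if occ.end is not None:
        triples.append((occ.uri, "ex:end", occ.end.isoformat()))
    return triples


# "USA invades Iraq": the verb becomes the Perdurant, both nouns participate.
setting = SettingAnnotation("urn:setting-1")
usa = ParticipantAnnotation("urn:participant-usa", setting, roles=["Agent"])
iraq = ParticipantAnnotation("urn:participant-iraq", setting, roles=["Patient"])
invasion = OccurrentAnnotation("urn:occurrent-invade", setting,
                               participants=[usa, iraq])
triples = to_triples(invasion)
```

Because the start and end are optional, a setting without an extractable date simply produces no time triples.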
Indeed, an event based data structure makes a lot of sense, with the remark
that you probably won't always be able to extract the date for a given
setting (situation).
There are 2 things which are unclear though.

1. Perdurant: You could have situations in which the object upon which the
Subject (or Endurant) is acting is not a transitory object (such as an
event or activity) but rather another Endurant. For example we can have the
phrase "USA invades Iraq" where "USA" is the Endurant (Subject) which
performs the action of "invading" on another Endurant, namely "Iraq".

2. Where does the verb, which links the Subject and the Object, come into
this? I imagined that the Endurant would have a dc:"property" where the
property = verb which links to the Object in noun form. For example take
again the sentence "USA invades Iraq". You would have the "USA" Entity with
dc:invader which points to the Object "Iraq". The Endurant would have as
many dc:"property" elements as there are verbs which link it to an Object.

> ### Consuming the data:
>
> I think this model should be sufficient for use-cases as described by you.
>
> Users would be able to consume data on the setting level. This can be
> done by simply retrieving all fise:ParticipantAnnotation as well as
> fise:OccurrentAnnotation linked with a setting. BTW this was the
> approach used in LIVE [2] for semantic search. It allows queries for
> Settings that involve specific Entities e.g. you could filter for
> Settings that involve a {Person}, activities:Arrested and a specific
> {Uprising}. However note that with this approach you will also get
> results for Settings where the {Person} participated and another person
> was arrested.
>
> Another possibility would be to process enhancement results on the
> fise:OccurrentAnnotation. This would allow a much higher level of
> granularity (e.g. it would allow correctly answering the query
> used as an example above). But I am wondering if the quality of the
> Setting extraction will be sufficient for this. I also have doubts whether
> this can still be realized by using semantic indexing to Apache Solr
> or if it would be better/necessary to store results in a TripleStore
> and use SPARQL for retrieval.
>
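The difference between the two query levels can be illustrated with a small in-memory sketch. The records, the `person:` identifiers and the `activities:Protested` type are made up for this example (only activities:Arrested appears in the mail), and the two functions are hypothetical helpers, not a Solr or SPARQL query.

```python
# Hypothetical, flattened occurrent annotations; each belongs to one setting.
occurrents = [
    {"setting": "s1", "type": "activities:Arrested",
     "participants": {"person:Bob"}},
    {"setting": "s1", "type": "activities:Protested",
     "participants": {"person:Alice"}},
    {"setting": "s2", "type": "activities:Arrested",
     "participants": {"person:Alice"}},
]


def settings_level(occurrents, person, activity):
    """Settings where the person and the activity both occur anywhere."""
    by_setting = {}
    for occ in occurrents:
        entry = by_setting.setdefault(
            occ["setting"], {"types": set(), "people": set()})
        entry["types"].add(occ["type"])
        entry["people"] |= occ["participants"]
    return sorted(s for s, e in by_setting.items()
                  if activity in e["types"] and person in e["people"])


def occurrent_level(occurrents, person, activity):
    """Settings where the person participates in that very activity."""
    return sorted({occ["setting"] for occ in occurrents
                   if occ["type"] == activity
                   and person in occ["participants"]})


coarse = settings_level(occurrents, "person:Alice", "activities:Arrested")
fine = occurrent_level(occurrents, "person:Alice", "activities:Arrested")
```

Here `coarse` also contains setting s1, where Alice took part but Bob was the one arrested, while `fine` only contains s2.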
> The methodology and query language used by YAGO [3] is also very
> relevant for this (especially note chapter 7 SPOTL(X) Representation).
>
> Another related topic is the enrichment of Entities (especially
> Events) in knowledge bases based on Settings extracted from Documents.
> By definition - in DOLCE - Perdurants are temporally indexed. That
> means that at the time when added to a knowledge base they might still
> be in progress. So the creation, enrichment and refinement of such
> Entities in the knowledge base seems to be critical for a system
> like the one described in your use-case.
>
> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
> <[email protected]> wrote:
> >
> > First of all I have to mention that I am new in the field of semantic
> > technologies, I've started to read about them in the last 4-5 months.
> > Having said that I have a high level overview of what is a good approach
> > to solve this problem. There are a number of papers on the internet which
> > describe what steps need to be taken such as: named entity recognition,
> > co-reference resolution, POS tagging and others.
>
> The Stanbol NLP processing module currently only supports sentence
> detection, tokenization, POS tagging, chunking, NER and lemmas. Support
> for co-reference resolution and dependency trees is currently missing.
>
> Stanford NLP is already integrated with Stanbol [4]. At the moment it
> only supports English, but I am already working on including the other
> supported languages. Other NLP frameworks that are already integrated
> with Stanbol are Freeling [5] and Talismane [6]. But note that for all
> of those the integration excludes support for co-reference and dependency
> trees.
>
> Anyways I am confident that one can implement a first prototype by
> only using Sentences and POS tags and - if available - Chunks (e.g.
> Noun phrases).
>
>
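As a rough idea of what a first prototype based only on sentences and POS tags could look like, here is a naive subject-verb-object sketch. It assumes Penn Treebank style tags (NN*, VB*) on already tokenized sentences and does not use any Stanbol API; `extract_svo` and `_nearest_noun_run` are hypothetical helper names. It takes the first verb of a sentence and the noun runs closest to it on either side, so it will miss passives, coordination and anything that needs co-reference.

```python
from typing import List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs of one sentence


def _nearest_noun_run(tagged: Tagged) -> List[str]:
    """Skip non-nouns, then collect the first contiguous run of nouns."""
    run: List[str] = []
    for token, tag in tagged:
        if tag.startswith("NN"):
            run.append(token)
        elif run:
            break
    return run


def extract_svo(tagged: Tagged) -> Optional[Tuple[str, str, str]]:
    """Return (subject, verb, object) around the first verb, or None."""
    verb_idx = next(
        (i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")),
        None)
    if verb_idx is None:
        return None
    # Scan backwards from the verb for the subject, forwards for the object.
    subject = _nearest_noun_run(tagged[:verb_idx][::-1])[::-1]
    obj = _nearest_noun_run(tagged[verb_idx + 1:])
    if not subject or not obj:
        return None
    return " ".join(subject), tagged[verb_idx][0], " ".join(obj)


sentence = [("The", "DT"), ("United", "NNP"), ("States", "NNPS"),
            ("invaded", "VBD"), ("Iraq", "NNP"), ("in", "IN"),
            ("2003", "CD")]
relation = extract_svo(sentence)
```

If chunks (noun phrases) are available, the noun-run heuristic could be replaced by the chunk boundaries.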
I assume that in the Stanbol context, a feature like Relation extraction
would be implemented as an EnhancementEngine?
What kind of effort would be required to integrate a co-reference
resolution tool into Stanbol?

At this moment I'll be focusing on 2 aspects:

1. Determine the best data structure to encapsulate the extracted
information. I'll take a closer look at DOLCE.
2. Determine how all of this should be integrated into Stanbol.

Thanks

> Hope this helps to bootstrap this discussion
> best
> Rupert
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
