Re: Relation extraction feature

Cristian Petroaca Thu, 27 Jun 2013 06:13:32 -0700

Sorry, I meant the Stanbol NLP API, not Stanford, in my previous e-mail. By
the way, does OpenNLP have the ability to build dependency trees?



2013/6/23 Cristian Petroaca <[email protected]>

> Hi Rupert,
>
> I created jira https://issues.apache.org/jira/browse/STANBOL-1121.
> As you suggested I would start with extending the Stanford NLP with
> co-reference resolution but I think also with dependency trees because I
> also need to know the Subject of the sentence and the object that it
> affects, right?
>
> Given that I need to extend the Stanford NLP API in Stanbol for
> co-reference and dependency trees, how do I proceed with this? Do I create
> 2 new sub-tasks to the already opened Jira? After that can I start
> implementing on my local copy of Stanbol and when I'm done I'll send you
> guys the patch for review?
>
> Regards,
> Cristian
>
>
> 2013/6/18 Rupert Westenthaler <[email protected]>
>
>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>> <[email protected]> wrote:
>> > Hi Rupert,
>> >
>> > Agreed on the SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
>> > data structure.
>> >
>> > Should I open up a Jira for all of this in order to encapsulate this
>> > information and establish the goals and these initial steps towards
>> > these goals?
>>
>> Yes please. A JIRA issue for this work would be great.
>>
>> > How should I proceed further? Should I create some design documents that
>> > need to be reviewed?
>>
>> Usually it is best to write design-related text directly in JIRA
>> by using Markdown [1] syntax. This will allow us later to use this
>> text directly for the documentation on the Stanbol Webpage.
>>
>> best
>> Rupert
>>
>>
>> [1] http://daringfireball.net/projects/markdown/
>> >
>> > Regards,
>> > Cristian
>> >
>> >
>> > 2013/6/17 Rupert Westenthaler <[email protected]>
>> >
>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>> >> <[email protected]> wrote:
>> >> > Hi Rupert,
>> >> >
>> >> > First of all thanks for the detailed suggestions.
>> >> >
>> >> > 2013/6/12 Rupert Westenthaler <[email protected]>
>> >> >
>> >> >> Hi Cristian, all
>> >> >>
>> >> >> really interesting use case!
>> >> >>
>> >> >> In this mail I will try to give some suggestions on how this could
>> >> >> work out. These suggestions are mainly based on experiences and
>> >> >> lessons learned in the LIVE [2] project, where we built an
>> >> >> information system for the Olympic Games in Peking. While this
>> >> >> project excluded the extraction of Events from unstructured text
>> >> >> (because the Olympic Information System was already providing event
>> >> >> data as XML messages), the semantic search capabilities of this
>> >> >> system were very similar to the ones described by your use case.
>> >> >>
>> >> >> IMHO you are not only trying to extract relations, but a formal
>> >> >> representation of the situation described by the text. So let's
>> >> >> assume that the goal is to annotate a Setting (or Situation)
>> >> >> described in the text - a fise:SettingAnnotation.
>> >> >>
>> >> >> The DOLCE foundational ontology [1] gives some advice on how to
>> >> >> model those. The important relation for modeling this is
>> >> >> Participation:
>> >> >>
>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >> >>
>> >> >> where ..
>> >> >>
>> >> >>  * ED are Endurants (continuants): Endurants do have an identity, so
>> >> >> we would typically refer to them as Entities referenced by a setting.
>> >> >> Note that this includes physical, non-physical as well as
>> >> >> social objects.
>> >> >>  * PD are Perdurants (occurrents): Perdurants are entities that
>> >> >> happen in time. This refers to Events, Activities ...
>> >> >>  * PC is Participation: It is a time-indexed relation where
>> >> >> Endurants participate in Perdurants
>> >> >>
>> >> >> Modeling this in RDF requires defining some intermediate resources
>> >> >> because RDF does not allow for n-ary relations.
>> >> >>
>> >> >>  * fise:SettingAnnotation: It is really handy to define one resource
>> >> >> being the context for all described data. I would call this
>> >> >> "fise:SettingAnnotation" and define it as a sub-concept of
>> >> >> fise:Enhancement. All further enhancements about the extracted
>> >> >> Setting would define a "fise:in-setting" relation to it.
>> >> >>
>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant is
>> >> >> participating in a setting (fise:in-setting fise:SettingAnnotation).
>> >> >> The Endurant itself is described by existing fise:TextAnnotation (the
>> >> >> mentions) and fise:EntityAnnotation (suggested Entities). Basically
>> >> >> the fise:ParticipantAnnotation will allow an EnhancementEngine to
>> >> >> state that several mentions (possibly in different sentences)
>> >> >> represent the same Endurant as participating in the Setting. In
>> >> >> addition it would be possible to use the dc:type property (similar
>> >> >> to fise:TextAnnotation) to refer to the role(s) of a participant
>> >> >> (e.g. the set: Agent (intentionally performs an action), Cause
>> >> >> (unintentionally, e.g. a mud slide), Patient (a passive role in an
>> >> >> activity) and Instrument (aids a process)), but I am wondering if
>> >> >> one could extract this information.
>> >> >>
>> >> >>  * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
>> >> >> context of the Setting. Also fise:OccurrentAnnotation can link to
>> >> >> fise:TextAnnotation (typically verbs in the text defining the
>> >> >> perdurant) as well as fise:EntityAnnotation suggesting well known
>> >> >> Events in a knowledge base (e.g. an Election in a country, or an
>> >> >> uprising ...). In addition fise:OccurrentAnnotation can define
>> >> >> dc:has-participant links to fise:ParticipantAnnotation. In this case
>> >> >> it is explicitly stated that an Endurant (the
>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
>> >> >> fise:OccurrentAnnotation). As Occurrents are temporally indexed, this
>> >> >> annotation should also support properties for defining the
>> >> >> xsd:dateTime for the start/end.
>> >> >>
>> >> >>
>> >> > Indeed, an event-based data structure makes a lot of sense, with the
>> >> > remark that you probably won't always be able to extract the date for
>> >> > a given setting (situation).
>> >> > There are 2 things which are unclear though.
>> >> >
>> >> > 1. Perdurant: You could have situations in which the object upon
>> >> > which the Subject (or Endurant) is acting is not a transitory object
>> >> > (such as an event or activity) but rather another Endurant. For
>> >> > example we can have the phrase "USA invades Irak" where "USA" is the
>> >> > Endurant (Subject) which performs the action of "invading" on another
>> >> > Endurant, namely "Irak".
>> >> >
>> >>
>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both are
>> >> Endurants. The activity "invading" would be the Perdurant. So ideally
>> >> you would have a  "fise:SettingAnnotation" with:
>> >>
>> >>   * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
>> >> linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation
>> >> linking to dbpedia:United_States
>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient,
>> >> linking to a fise:TextAnnotation for "Irak" and a
>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>> >>   * fise:OccurrentAnnotation for "invades" with the dc:type
>> >> caos:Activity, linking to a fise:TextAnnotation for "invades"
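
A minimal sketch of the three annotations listed above for "USA invades Irak", assuming plain Python tuples stand in for RDF triples. All `ex:` identifiers and the helper function names are hypothetical. The properties fise:inSetting, fise:hasMention and fise:suggestion are the ones proposed further down in this thread; fise:selected-text and fise:entity-reference are assumed here to be the existing Stanbol properties for the mention text and the linked entity, so check them against the enhancement structure before relying on them.

```python
# Sketch only: plain tuples stand in for RDF triples.
# The ex: identifiers are hypothetical placeholders.

def text_annotation(node, text):
    """Triples for a mention in the text (property name is an assumption)."""
    return [
        (node, "rdf:type", "fise:TextAnnotation"),
        (node, "fise:selected-text", text),
    ]

def entity_annotation(node, entity):
    """Triples for a suggested entity (property name is an assumption)."""
    return [
        (node, "rdf:type", "fise:EntityAnnotation"),
        (node, "fise:entity-reference", entity),
    ]

def participant(node, setting, mention, suggestion, role):
    """Triples for one fise:ParticipantAnnotation (an Endurant in the setting)."""
    return [
        (node, "rdf:type", "fise:ParticipantAnnotation"),
        (node, "fise:inSetting", setting),
        (node, "fise:hasMention", mention),
        (node, "fise:suggestion", suggestion),
        (node, "dc:type", role),
    ]

def occurrent(node, setting, mention, kind):
    """Triples for one fise:OccurrentAnnotation (a Perdurant in the setting)."""
    return [
        (node, "rdf:type", "fise:OccurrentAnnotation"),
        (node, "fise:inSetting", setting),
        (node, "fise:hasMention", mention),
        (node, "dc:type", kind),
    ]

def build_setting():
    """Build the whole setting for the sentence "USA invades Irak"."""
    setting = "ex:setting-1"
    triples = [(setting, "rdf:type", "fise:SettingAnnotation")]
    triples += text_annotation("ex:text-usa", "USA")
    triples += entity_annotation("ex:entity-usa", "dbpedia:United_States")
    triples += participant("ex:participant-usa", setting,
                           "ex:text-usa", "ex:entity-usa", "caos:Agent")
    triples += text_annotation("ex:text-irak", "Irak")
    triples += entity_annotation("ex:entity-iraq", "dbpedia:Iraq")
    triples += participant("ex:participant-iraq", setting,
                           "ex:text-irak", "ex:entity-iraq", "caos:Patient")
    triples += text_annotation("ex:text-invades", "invades")
    triples += occurrent("ex:occurrent-invades", setting,
                         "ex:text-invades", "caos:Activity")
    return triples

if __name__ == "__main__":
    for s, p, o in build_setting():
        print(s, p, o)
```

Both participants and the occurrent point at the same setting node, which is what lets an n-ary "who did what to whom" statement be expressed with binary RDF properties.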
>> >>
>> >> > 2. Where does the verb, which links the Subject and the Object, come
>> >> > into this? I imagined that the Endurant would have a dc:"property"
>> >> > where the property = verb which links to the Object in noun form. For
>> >> > example take again the sentence "USA invades Irak". You would have the
>> >> > "USA" Entity with dc:invader which points to the Object "Irak". The
>> >> > Endurant would have as many dc:"property" elements as there are verbs
>> >> > which link it to an Object.
>> >>
>> >> As explained above you would have a fise:OccurrentAnnotation that
>> >> represents the Perdurant. The information that the activity mentioned
>> >> in the text is "invades" would be expressed by linking to a
>> >> fise:TextAnnotation. If you can also provide an Ontology for Tasks
>> >> that defines "myTasks:invade", the fise:OccurrentAnnotation could also
>> >> link to a fise:EntityAnnotation for this concept.
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> >
>> >> >> ### Consuming the data:
>> >> >>
>> >> >> I think this model should be sufficient for use-cases as described
>> >> >> by you.
>> >> >>
>> >> >> Users would be able to consume data on the setting level. This can
>> >> >> be done by simply retrieving all fise:ParticipantAnnotation as well
>> >> >> as fise:OccurrentAnnotation linked with a setting. BTW this was the
>> >> >> approach used in LIVE [2] for semantic search. It allows queries for
>> >> >> Settings that involve specific Entities, e.g. you could filter for
>> >> >> Settings that involve a {Person}, activities:Arrested and a specific
>> >> >> {Uprising}. However, note that with this approach you will also get
>> >> >> results for Settings where the {Person} participated and another
>> >> >> person was arrested.
>> >> >>
>> >> >> Another possibility would be to process enhancement results on the
>> >> >> fise:OccurrentAnnotation. This would allow a much higher
>> >> >> granularity level (e.g. it would allow to correctly answer the query
>> >> >> used as an example above). But I am wondering if the quality of the
>> >> >> Setting extraction will be sufficient for this. I also have doubts
>> >> >> whether this can still be realized by using semantic indexing to
>> >> >> Apache Solr or if it would be better/necessary to store results in a
>> >> >> TripleStore and use SPARQL for retrieval.
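
The difference between the two ways of consuming the data can be sketched as follows. This is an illustration under stated assumptions, not the LIVE implementation: the `Setting` and `Occurrent` classes, the two match functions and the `ex:` identifiers are all hypothetical, and in-memory sets stand in for a Solr index or a TripleStore.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrent:
    activity: str                                    # e.g. "activities:Arrested"
    participants: set = field(default_factory=set)   # Endurants taking part

@dataclass
class Setting:
    entities: set        # every Endurant or known Event linked to the setting
    occurrents: list     # the Perdurants annotated in the setting

def match_setting_level(setting, person, activity, event):
    """Setting-level filter: all three things merely occur in the same setting."""
    activities = {o.activity for o in setting.occurrents}
    return (person in setting.entities
            and event in setting.entities
            and activity in activities)

def match_occurrent_level(setting, person, activity, event):
    """Occurrent-level filter: the person must take part in that very activity."""
    if event not in setting.entities:
        return False
    return any(o.activity == activity and person in o.participants
               for o in setting.occurrents)

# A setting in which ex:alice took part in a protest, but ex:bob was arrested.
setting = Setting(
    entities={"ex:alice", "ex:bob", "ex:uprising"},
    occurrents=[
        Occurrent("activities:Protested", {"ex:alice"}),
        Occurrent("activities:Arrested", {"ex:bob"}),
    ],
)

query = ("ex:alice", "activities:Arrested", "ex:uprising")
print(match_setting_level(setting, *query))    # True  (false positive)
print(match_occurrent_level(setting, *query))  # False (alice was not arrested)
```

The occurrent-level filter only works if the extraction reliably attaches participants to the right activity, which is exactly the quality concern raised above.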
>> >> >>
>> >> >> The methodology and query language used by YAGO [3] is also very
>> >> >> relevant for this (especially note chapter 7 SPOTL(X)
>> >> >> Representation).
>> >> >>
>> >> >> Another related topic is the enrichment of Entities (especially
>> >> >> Events) in knowledge bases based on Settings extracted from
>> >> >> Documents. As per definition - in DOLCE - Perdurants are temporally
>> >> >> indexed. That means that at the time when added to a knowledge base
>> >> >> they might still be in process. So the creation, enrichment and
>> >> >> refinement of such Entities in the knowledge base seems to be
>> >> >> critical for a System like the one described in your use-case.
>> >> >>
>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>> >> >> <[email protected]> wrote:
>> >> >> >
>> >> >> > First of all I have to mention that I am new in the field of
>> >> >> > semantic technologies; I started to read about them in the last
>> >> >> > 4-5 months. Having said that, I have a high-level overview of what
>> >> >> > is a good approach to solve this problem. There are a number of
>> >> >> > papers on the internet which describe what steps need to be taken,
>> >> >> > such as: named entity recognition, co-reference resolution, POS
>> >> >> > tagging and others.
>> >> >>
>> >> >> The Stanbol NLP processing module currently only supports sentence
>> >> >> detection, tokenization, POS tagging, Chunking, NER and lemma.
>> >> >> Support for co-reference resolution and dependency trees is
>> >> >> currently missing.
>> >> >>
>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment
>> >> >> it only supports English, but I am already working to include the
>> >> >> other supported languages. Other NLP frameworks that are already
>> >> >> integrated with Stanbol are Freeling [5] and Talismane [6]. But note
>> >> >> that for all those the integration excludes support for co-reference
>> >> >> and dependency trees.
>> >> >>
>> >> >> Anyways I am confident that one can implement a first prototype by
>> >> >> only using Sentences and POS tags and - if available - Chunks (e.g.
>> >> >> Noun phrases).
>> >> >>
>> >> >>
>> >> > I assume that in the Stanbol context, a feature like Relation
>> >> > extraction would be implemented as an EnhancementEngine?
>> >> > What kind of effort would be required for a co-reference resolution
>> >> > tool integration into Stanbol?
>> >> >
>> >>
>> >> Yes in the end it would be an EnhancementEngine. But before we can
>> >> build such an engine we would need to
>> >>
>> >> * extend the Stanbol NLP processing API with Annotations for
>> >> co-reference
>> >> * add support for JSON Serialisation/Parsing for those annotations so
>> >> that the RESTful NLP Analysis Service can provide co-reference
>> >> information
>> >>
>> >> > At this moment I'll be focusing on 2 aspects:
>> >> >
>> >> > 1. Determine the best data structure to encapsulate the extracted
>> >> > information. I'll take a closer look at Dolce.
>> >>
>> >> Don't make it too complex. Defining a proper structure to represent
>> >> Events will only pay off if we can also successfully extract such
>> >> information from processed texts.
>> >>
>> >> I would start with
>> >>
>> >>  * fise:SettingAnnotation
>> >>     * {fise:Enhancement} metadata
>> >>
>> >>  * fise:ParticipantAnnotation
>> >>     * {fise:Enhancement} metadata
>> >>     * fise:inSetting {settingAnnotation}
>> >>     * fise:hasMention {textAnnotation}
>> >>     * fise:suggestion {entityAnnotation} (multiple if there are more
>> >> suggestions)
>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
>> >> fise:Cause
>> >>
>> >>  * fise:OccurrentAnnotation
>> >>     * {fise:Enhancement} metadata
>> >>     * fise:inSetting {settingAnnotation}
>> >>     * fise:hasMention {textAnnotation}
>> >>     * dc:type set to fise:Activity
>> >>
>> >> If it turns out that we can extract more, we can add more structure to
>> >> those annotations. We might also think about using an own namespace
>> >> for those extensions to the annotation structure.
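
As an in-memory counterpart to the starting structure above, here is a rough sketch using Python dataclasses. The class and field names simply mirror the proposed fise: properties and are not an existing Stanbol API; the `urn:example:` identifiers are hypothetical. The only logic it adds is that the participant role is restricted to the four proposed dc:type values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Role(Enum):
    """The four participant roles proposed for dc:type."""
    AGENT = "fise:Agent"
    PATIENT = "fise:Patient"
    INSTRUMENT = "fise:Instrument"
    CAUSE = "fise:Cause"

@dataclass
class SettingAnnotation:
    uri: str

@dataclass
class ParticipantAnnotation:
    uri: str
    in_setting: SettingAnnotation   # fise:inSetting
    has_mention: str                # fise:hasMention -> a TextAnnotation
    role: Role                      # dc:type
    # fise:suggestion -> EntityAnnotations; may hold several suggestions
    suggestions: List[str] = field(default_factory=list)

@dataclass
class OccurrentAnnotation:
    uri: str
    in_setting: SettingAnnotation   # fise:inSetting
    has_mention: str                # fise:hasMention -> a TextAnnotation
    dc_type: str = "fise:Activity"  # dc:type

setting = SettingAnnotation("urn:example:setting-1")
agent = ParticipantAnnotation(
    uri="urn:example:participant-1",
    in_setting=setting,
    has_mention="urn:example:text-annotation-1",
    role=Role("fise:Agent"),        # look up the role by its dc:type value
    suggestions=["urn:example:entity-annotation-1"],
)
activity = OccurrentAnnotation(
    uri="urn:example:occurrent-1",
    in_setting=setting,
    has_mention="urn:example:text-annotation-2",
)
```

Looking up a value outside the four roles, e.g. `Role("fise:Owner")`, raises a `ValueError`, so an unsupported dc:type is caught when the annotation is built.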
>> >>
>> >> > 2. Determine how all of this should be integrated into Stanbol.
>> >>
>> >> Just create an EventExtractionEngine and configure an enhancement chain
>> >> that does NLP processing and EntityLinking.
>> >>
>> >> You should have a look at
>> >>
>> >> * SentimentSummarizationEngine [1], as it does a lot of things with NLP
>> >> processing results (e.g. connecting adjectives (via verbs) to
>> >> nouns/pronouns). So as long as we cannot use explicit dependency trees,
>> >> your code will need to do similar things with Nouns, Pronouns and
>> >> Verbs.
>> >>
>> >> * Disambiguation-MLT engine, as it creates a Java representation of
>> >> present fise:TextAnnotation and fise:EntityAnnotation [2]. Something
>> >> similar will also be required by the EventExtractionEngine for fast
>> >> access to such annotations while iterating over the Sentences of the
>> >> text.
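
To make the "similar things with Nouns, Pronouns and Verbs" concrete, here is a deliberately naive sketch of a POS-only heuristic: for each verb in a sentence, take the nearest noun or pronoun on its left as the Agent candidate and the nearest one on its right as the Patient candidate. It assumes Penn Treebank style tags (NN*, PRP, VB*) and a sentence given as (word, tag) pairs; the function names are hypothetical, and this is not how the SentimentSummarizationEngine is implemented.

```python
# Penn Treebank style tags are assumed: NN* = noun, PRP = pronoun, VB* = verb.

def is_nominal(tag):
    """True for nouns (NN, NNS, NNP, NNPS) and personal pronouns (PRP)."""
    return tag.startswith("NN") or tag == "PRP"

def extract_candidates(tagged_sentence):
    """Return (agent, verb, patient) candidates for one tagged sentence.

    For every verb, the nearest nominal to the left is taken as the agent
    candidate and the nearest nominal to the right as the patient candidate.
    Verbs lacking a nominal on either side are skipped.
    """
    results = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if not tag.startswith("VB"):
            continue
        left = next((w for w, t in reversed(tagged_sentence[:i])
                     if is_nominal(t)), None)
        right = next((w for w, t in tagged_sentence[i + 1:]
                      if is_nominal(t)), None)
        if left is not None and right is not None:
            results.append((left, word, right))
    return results

sentence = [("USA", "NNP"), ("invades", "VBZ"), ("Irak", "NNP")]
print(extract_candidates(sentence))  # [('USA', 'invades', 'Irak')]
```

A heuristic like this gets passive sentences backwards ("Irak was invaded by USA") and cannot resolve pronouns across sentences, which is where dependency trees and co-reference resolution would come in.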
>> >>
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>> >>
>> >> >
>> >> > Thanks
>> >> >
>> >> >> Hope this helps to bootstrap this discussion
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> --
>> >> >> | Rupert Westenthaler             [email protected]
>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> | A-5500 Bischofshofen
>> >> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>
>
>
