Re: Relation extraction feature

Cristian Petroaca Sun, 15 Sep 2013 05:33:37 -0700

I've already started implementing the coreference part first, in the nlp and
nlp-json projects. There's one thing I don't know how to implement.
The CorefTag class contains a Set<Span> mentions member (it represents the
"mentions" array defined in an earlier mail), and in the
CorefTagSupport.parse() method I need to reconstruct the CorefTag object
from JSON. I can't figure out how to construct that member, which should
contain references to the mentions, i.e. the Span objects found in the
AnalyzedTextImpl. One problem is that I don't have access to the
AnalyzedTextImpl object. Even if I did, there could be situations in which
I am constructing a CorefTag for a Span whose mentions refer to other Spans
that have not been parsed yet and so don't exist in the AnalyzedTextImpl.


One solution would be not to link with the actual Span references from the
AnalyzedTextImpl but to create new Span Objects (ChunkImpl, TokenImpl).
That would need the ChunkImpl and TokenImpl constructors to be changed from
protected to public.
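
Another possibility, sketched here purely as an illustration: keep the
mentions as lightweight (type, start, end) references while parsing, and
resolve them to real spans only after every span has been parsed. All names
below (DeferredMentions, SpanRef, resolve) are hypothetical and are not part
of the Stanbol API; the String values stand in for real Span objects.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch; none of these types are Stanbol classes. */
final class DeferredMentions {

    /** The (type, start, end) triple that identifies a span in the JSON. */
    static final class SpanRef {
        final String type;
        final int start;
        final int end;

        SpanRef(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SpanRef)) return false;
            SpanRef other = (SpanRef) o;
            return start == other.start && end == other.end
                    && type.equals(other.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, start, end);
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + "]";
        }
    }

    /**
     * Resolves references against an index that was filled while the spans
     * were parsed. S is whatever type represents a span.
     */
    static <S> Set<S> resolve(Set<SpanRef> refs, Map<SpanRef, S> index) {
        Set<S> resolved = new LinkedHashSet<>();
        for (SpanRef ref : refs) {
            S span = index.get(ref);
            if (span == null) {
                throw new IllegalStateException("Unresolved mention: " + ref);
            }
            resolved.add(span);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Pass 1: parse the tag; mentions may point at spans not seen yet.
        Set<SpanRef> mentions = new LinkedHashSet<>();
        mentions.add(new SpanRef("Token", 123, 130));
        mentions.add(new SpanRef("Token", 157, 165));

        // Stand-in index built while spans are parsed (String replaces Span).
        Map<SpanRef, String> index = new HashMap<>();
        index.put(new SpanRef("Token", 123, 130), "span@123-130");
        index.put(new SpanRef("Token", 157, 165), "span@157-165");

        // Pass 2: every span exists now, so the references can be resolved.
        System.out.println(resolve(mentions, index));
    }
}
```

With something like this the constructors could stay protected, at the cost
of a second pass after parsing. Whether the parser can offer such a hook is
exactly the open question.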


2013/9/12 Rupert Westenthaler <[email protected]>

> Hi Cristian,
>
> In fact I missed it. Sorry for that.
>
> I think the revised proposal looks like a good start. Usually one
> needs to make some adaptations when writing the actual code.
>
> If you have a first version attach it to an issue and I will commit it
> to the branch.
>
> best
> Rupert
>
>
> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
> <[email protected]> wrote:
> > Hi Rupert,
> >
> > This is a reminder in case you missed this e-mail.
> >
> > Cristian
> >
> >
> > 2013/9/3 Cristian Petroaca <[email protected]>
> >
> >> Ok, then to sum it up we would have:
> >>
> >> 1. Coref
> >>
> >> "stanbol.enhancer.nlp.coref" : {
> >>     // whether this token or chunk is the representative mention in the chain
> >>     "isRepresentative" : true/false,
> >>     "mentions" : [ {
> >>         "type" : "Token", // type of element which refers to this token/chunk
> >>         "start" : 123,    // start index of the mentioning element
> >>         "end" : 130       // end index of the mentioning element
> >>     }, ...
> >>     ],
> >>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >> }
> >>
> >>
> >> 2. Dependency tree
> >>
> >> "stanbol.enhancer.nlp.dependency" : {
> >>     "relations" : [ {
> >>         "tag" : "nsubj",    // type of relation - Stanford NLP notation
> >>         "dep" : 12,         // type of relation - Stanbol NLP mapped value
> >>                             // (ordinal number in enum Dependency)
> >>         "role" : "gov/dep", // whether this token is the depender or the dependee
> >>         "type" : "Token",   // type of element with which this token is in relation
> >>         "start" : 123,      // start index of the relating token
> >>         "end" : 130         // end index of the relating token
> >>     },
> >>     ...
> >>     ],
> >>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >> }
> >>
> >>
> >> 2013/9/2 Rupert Westenthaler <[email protected]>
> >>
> >>> Hi Cristian,
> >>>
> >>> let me provide some feedback to your proposals:
> >>>
> >>> ### Referring other Spans
> >>>
> >>> Both suggested annotations require linking to other spans (Sentence,
> >>> Chunk or Token). For that we should introduce a JSON element for
> >>> referring to those elements and use it everywhere.
> >>>
> >>> In the java model this would allow you to have a reference to the
> >>> other Span (Sentence, Chunk, Token). In the serialized form you would
> >>> have JSON elements with the "type", "start" and "end" attributes as
> >>> those three uniquely identify any span.
> >>>
> >>> Here is an example based on the "mentions" attribute as defined by the
> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag":
> >>>
> >>>     ...
> >>>     "mentions" : [ {
> >>>         "type" : "Token",
> >>>         "start": 123 ,
> >>>         "end": 130 } ,{
> >>>         "type" : "Token",
> >>>         "start": 157 ,
> >>>         "end": 165 }],
> >>>     ...
> >>>
> >>> Similar token links in
> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
> >>> use this model.
> >>>
> >>> ### Usage of Controlled Vocabularies
> >>>
> >>> In addition the DependencyTag also seems to use a controlled
> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
> >>> NLP module tries to define those in some kind of Ontology. For POS
> >>> tags we use the OLIA ontology [1]. This is important as most NLP
> >>> frameworks will use different strings and we need to unify those to
> >>> common IDs so that components that consume this data do not depend on
> >>> a specific NLP tool.
> >>>
> >>> Because the usage of Ontologies within Java is not well supported, the
> >>> Stanbol NLP module defines Java Enumerations for those Ontologies such
> >>> as the POS type enumeration [2].
> >>>
> >>> Both the Java Model as well as the JSON serialization do support both
> >>> (1) the lexical tag as used by the NLP tool and (2) the mapped
> >>> concept. In the Java API via two different methods and in the JSON
> >>> serialization via two separate keys.
> >>>
> >>> To make this clearer, here is an example of a POS annotation of a
> >>> proper noun.
> >>>
> >>>     "stanbol.enhancer.nlp.pos" : {
> >>>         "tag" : "PN",
> >>>         "pos" : 53,
> >>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
> >>>         "prob" : 0.95
> >>>     }
> >>>
> >>> where
> >>>
> >>>     "tag" : "PN"
> >>>
> >>> is the lexical form as used by the NLP tool and
> >>>
> >>>     "pos" : 53
> >>>
> >>> refers to the ordinal number of the entry "ProperNoun" in the POS
> >>> enumeration
> >>>
> >>> IMO the "type" property of DependencyTag should use a similar design.
> >>>
> >>> best
> >>> Rupert
> >>>
> >>> [1] http://olia.nlp2rdf.org/
> >>> [2]
> >>>
> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
> >>>
> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
> >>> <[email protected]> wrote:
> >>> > Sorry, pressed send too soon :).
> >>> >
> >>> > Continued :
> >>> >
> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
> >>> >
> >>> > Given this, we can have for each "Token" an additional dependency
> >>> > annotation:
> >>> >
> >>> > "stanbol.enhancer.nlp.dependency" : {
> >>> >     "tag" : //is it necessary?
> >>> >     "relations" : [ {
> >>> >         "type" : "nsubj", // type of relation
> >>> >         "role" : "gov/dep", // whether it is the depender or the dependee
> >>> >         "dependencyValue" : "met", // the word with which the token has a relation
> >>> >         "dependencyIndexInSentence" : "2" // the index of the dependency in the current sentence
> >>> >     }
> >>> >     ...
> >>> >     ],
> >>> >     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> >>> > }
> >>> >
> >>> > 2013/9/1 Cristian Petroaca <[email protected]>
> >>> >
> >>> >> Related to the Stanford Dependency Tree feature, this is what the
> >>> >> output from the tool looks like for the sentence "Mary and Tom met
> >>> >> Danny today":
> >>> >>
> >>> >>
> >>> >> 2013/8/30 Cristian Petroaca <[email protected]>
> >>> >>
> >>> >>> Hi Rupert,
> >>> >>>
> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP Server
> >>> >>> and the coref module I'm thinking I can represent the coreference
> >>> >>> information this way:
> >>> >>> Each "Token" or "Chunk" will contain an additional coref annotation
> >>> >>> with the following structure:
> >>> >>>
> >>> >>> "stanbol.enhancer.nlp.coref" : {
> >>> >>>     "tag" : //does this need to exist?
> >>> >>>     // whether this token or chunk is the representative mention in the chain
> >>> >>>     "isRepresentative" : true/false,
> >>> >>>     "mentions" : [ {
> >>> >>>         "sentenceNo" : 1, // the sentence in which the mention is found
> >>> >>>         "startWord" : 2,  // the first word making up the mention
> >>> >>>         "endWord" : 3     // the last word making up the mention
> >>> >>>     }, ...
> >>> >>>     ],
> >>> >>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> >>> >>> }
> >>> >>>
> >>> >>> The CorefTag should resemble this model.
> >>> >>>
> >>> >>> What do you think?
> >>> >>>
> >>> >>> Cristian
> >>> >>>
> >>> >>>
> >>> >>> 2013/8/24 Rupert Westenthaler <[email protected]>
> >>> >>>
> >>> >>>> Hi Cristian,
> >>> >>>>
> >>> >>>> you can not directly call StanfordNLP components from Stanbol, but
> >>> you
> >>> >>>> have to extend the RESTful service to include the information you
> >>> >>>> need. The main reason for that is that the license of StanfordNLP
> is
> >>> >>>> not compatible with the Apache Software License. So Stanbol can
> not
> >>> >>>> directly link to the StanfordNLP API.
> >>> >>>>
> >>> >>>> You will need to
> >>> >>>>
> >>> >>>> 1. define an additional {yourTag} extends Tag<{yourType}> class
> >>> >>>> in the o.a.s.enhancer.nlp module
> >>> >>>> 2. add JSON parsing and serialization support for this tag to the
> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
> >>> >>>>
> >>> >>>> As (1) would be necessary anyway the only additional thing you need
> >>> >>>> to develop is (2). After that you can add {yourTag} instances to the
> >>> >>>> AnalyzedText in the StanfordNLP integration. The
> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have access
> >>> >>>> to your annotations.
> >>> >>>>
> >>> >>>> If you have a design for {yourTag} - the model you would like to
> use
> >>> >>>> to represent your data - I can help with (1) and (2).
> >>> >>>>
> >>> >>>> best
> >>> >>>> Rupert
> >>> >>>>
> >>> >>>>
> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
> >>> >>>> <[email protected]> wrote:
> >>> >>>> > Hi Rupert,
> >>> >>>> >
> >>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I
> >>> >>>> > see that Stanford NLP is not implemented as an EnhancementEngine
> >>> >>>> > but rather is used directly in a Jetty Server instance. How does
> >>> >>>> > that fit into the Stanbol stack? For example, how can I call the
> >>> >>>> > StanfordNlpAnalyzer's routine from my
> >>> >>>> > TripleExtractionEnhancementEngine, which lives in the Stanbol stack?
> >>> >>>> >
> >>> >>>> > Thanks,
> >>> >>>> > Cristian
> >>> >>>> >
> >>> >>>> >
> >>> >>>> > 2013/8/12 Rupert Westenthaler <[email protected]>
> >>> >>>> >
> >>> >>>> >> Hi Cristian,
> >>> >>>> >>
> >>> >>>> >> Sorry for the late response, but I was offline for the last two
> >>> weeks
> >>> >>>> >>
> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
> >>> >>>> >> <[email protected]> wrote:
> >>> >>>> >> > Hi Rupert,
> >>> >>>> >> >
> >>> >>>> >> > After doing some tests it seems that the Stanford NLP coreference
> >>> >>>> >> > module is much more accurate than the Open NLP one. So I decided
> >>> >>>> >> > to extend Stanford NLP to add coreference there.
> >>> >>>> >>
> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol
> codebase
> >>> >>>> >> because the licenses are not compatible.
> >>> >>>> >>
> >>> >>>> >> You can find the Stanford NLP integration on
> >>> >>>> >>
> >>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
> >>> >>>> >>
> >>> >>>> >> just create a fork and send pull requests.
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> > Could you add the necessary projects on the branch? And also
> >>> remove
> >>> >>>> the
> >>> >>>> >> > Open NLP ones?
> >>> >>>> >> >
> >>> >>>> >>
> >>> >>>> >> Currently the branch
> >>> >>>> >>
> >>> >>>> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>> >>>> >>
> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
> >>> >>>> >> be enough for adding coreference support.
> >>> >>>> >>
> >>> >>>> >> IMO you will need to
> >>> >>>> >>
> >>> >>>> >> * add a model for representing coreference to the nlp module
> >>> >>>> >> * add parsing and serializing support to the nlp-json module
> >>> >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
> >>> >>>> >>   project
> >>> >>>> >>
> >>> >>>> >> best
> >>> >>>> >> Rupert
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >>
> >>> >>>> >> > Thanks,
> >>> >>>> >> > Cristian
> >>> >>>> >> >
> >>> >>>> >> >
> >>> >>>> >> > 2013/7/5 Rupert Westenthaler <[email protected]>
> >>> >>>> >> >
> >>> >>>> >> >> Hi Cristian,
> >>> >>>> >> >>
> >>> >>>> >> >> I created the branch at
> >>> >>>> >> >>
> >>> >>>> >> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
> >>> >>>> >> >>
> >>> >>>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know
> >>> >>>> >> >> if you would like to have more.
> >>> >>>> >> >>
> >>> >>>> >> >> best
> >>> >>>> >> >> Rupert
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >>
> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
> >>> >>>> >> >> <[email protected]> wrote:
> >>> >>>> >> >> > Hi Rupert,
> >>> >>>> >> >> >
> >>> >>>> >> >> > I created jiras:
> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1132 and
> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The
> >>> >>>> >> >> > original one is dependent upon these.
> >>> >>>> >> >> > Please let me know when I can start using the branch.
> >>> >>>> >> >> >
> >>> >>>> >> >> > Thanks,
> >>> >>>> >> >> > Cristian
> >>> >>>> >> >> >
> >>> >>>> >> >> >
> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca <[email protected]>
> >>> >>>> >> >> >
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
> >>> [email protected]>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> >>> >>>> >> >> >>> <[email protected]> wrote:
> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
> >>> >>>> previous
> >>> >>>> >> >> e-mail.
> >>> >>>> >> >> >>> By
> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
> >>> dependency
> >>> >>>> trees?
> >>> >>>> >> >> >>> >
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
> >>> >>>> Stanbol,
> >>> >>>> >> I'll
> >>> >>>> >> >> >> take a look at how I can extend its integration to
> include
> >>> the
> >>> >>>> >> >> dependency
> >>> >>>> >> >> >> tree feature.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>  >
> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <
> [email protected]
> >>> >
> >>> >>>> >> >> >>> >
> >>> >>>> >> >> >>> >> Hi Rupert,
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >> I created jira
> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >>> >>>> >> >> >>> >> As you suggested I would start with extending the
> >>> Stanford
> >>> >>>> NLP
> >>> >>>> >> with
> >>> >>>> >> >> >>> >> co-reference resolution but I think also with
> dependency
> >>> >>>> trees
> >>> >>>> >> >> because
> >>> >>>> >> >> >>> I
> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
> >>> object
> >>> >>>> >> that it
> >>> >>>> >> >> >>> >> affects, right?
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for
> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with this?
> >>> >>>> >> >> >>> >> Do I create 2 new sub-tasks for the already opened Jira? After
> >>> >>>> >> >> >>> >> that, can I start implementing on my local copy of Stanbol and,
> >>> >>>> >> >> >>> >> when I'm done, send you guys the patch for review?
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for
> adding
> >>> >>>> support
> >>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
> >>> >>>> support. You
> >>> >>>> >> >> >>> should also define "depends on" relations between
> >>> STANBOL-1121
> >>> >>>> and
> >>> >>>> >> >> >>> those two new issues.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> Sub-task could also work, but as adding those features
> >>> would
> >>> >>>> be also
> >>> >>>> >> >> >>> interesting for other things I would rather define them
> as
> >>> >>>> separate
> >>> >>>> >> >> >>> issues.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >> 2 New Features connected with the original jira it is
> then.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>
> >>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
> >>> me.
> >>> >>>> This
> >>> >>>> >> >> >>> could have the advantage that patches would not be
> >>> affected by
> >>> >>>> >> changes
> >>> >>>> >> >> >>> in the trunk.
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> Yes, a separate branch sounds good.
> >>> >>>> >> >> >>
> >>> >>>> >> >> >> best
> >>> >>>> >> >> >>> Rupert
> >>> >>>> >> >> >>>
> >>> >>>> >> >> >>> >> Regards,
> >>> >>>> >> >> >>> >> Cristian
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
> >>> >>>> [email protected]>
> >>> >>>> >> >> >>> >>
> >>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
> >>> >>>> >> >> >>> >>> <[email protected]> wrote:
> >>> >>>> >> >> >>> >>> > Hi Rupert,
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> > Agreed on the
> >>> >>>> >> >> >>>
> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
> >>> >>>> >> >> >>> >>> > data structure.
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order
> to
> >>> >>>> >> encapsulate
> >>> >>>> >> >> this
> >>> >>>> >> >> >>> >>> > information and establish the goals and these
> initial
> >>> >>>> steps
> >>> >>>> >> >> towards
> >>> >>>> >> >> >>> >>> these
> >>> >>>> >> >> >>> >>> > goals?
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be
> great.
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
> >>> design
> >>> >>>> >> >> documents
> >>> >>>> >> >> >>> that
> >>> >>>> >> >> >>> >>> > need to be reviewed?
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> Usually it is the best to write design related text
> >>> >>>> directly in
> >>> >>>> >> >> JIRA
> >>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us
> later
> >>> to
> >>> >>>> use
> >>> >>>> >> this
> >>> >>>> >> >> >>> >>> text directly for the documentation on the Stanbol
> >>> Webpage.
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> best
> >>> >>>> >> >> >>> >>> Rupert
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>>
> >>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> > Regards,
> >>> >>>> >> >> >>> >>> > Cristian
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
> >>> >>>> [email protected]>
> >>> >>>> >> >> >>> >>> >
> >>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian
> Petroaca
> >>> >>>> >> >> >>> >>> >> <[email protected]> wrote:
> >>> >>>> >> >> >>> >>> >> > HI Rupert,
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed
> suggestions.
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
> >>> >>>> >> [email protected]>
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> really interesting use case!
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could
> >>> >>>> >> >> >>> >>> >> >> work out. These suggestions are mainly based on experiences and
> >>> >>>> >> >> >>> >>> >> >> lessons learned in the LIVE [2] project where we built an information
> >>> >>>> >> >> >>> >>> >> >> system for the Olympic Games in Peking. While this project excluded
> >>> >>>> >> >> >>> >>> >> >> the extraction of Events from unstructured text (because the Olympic
> >>> >>>> >> >> >>> >>> >> >> Information System was already providing event data as XML messages)
> >>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system were very similar to
> >>> >>>> >> >> >>> >>> >> >> the ones described by your use case.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
> >>> relations,
> >>> >>>> but a
> >>> >>>> >> >> formal
> >>> >>>> >> >> >>> >>> >> >> representation of the situation described by
> the
> >>> >>>> text. So
> >>> >>>> >> >> lets
> >>> >>>> >> >> >>> >>> assume
> >>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
> >>> Situation)
> >>> >>>> >> >> described
> >>> >>>> >> >> >>> in
> >>> >>>> >> >> >>> >>> the
> >>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some
> >>> >>>> advices on
> >>> >>>> >> >> how to
> >>> >>>> >> >> >>> >>> model
> >>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling
> this
> >>> >>>> >> >> Participation:
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> where ..
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants
> do
> >>> have
> >>> >>>> an
> >>> >>>> >> >> >>> identity so
> >>> >>>> >> >> >>> >>> we
> >>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
> >>> referenced
> >>> >>>> by a
> >>> >>>> >> >> >>> setting.
> >>> >>>> >> >> >>> >>> >> >> Note that this includes physical,
> non-physical as
> >>> >>>> well as
> >>> >>>> >> >> >>> >>> >> >> social-objects.
> >>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants
> >>> are
> >>> >>>> >> entities
> >>> >>>> >> >> that
> >>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
> >>> Activities ...
> >>> >>>> >> >> >>> >>> >> >>  * PC are Participation: It is an time indexed
> >>> >>>> relation
> >>> >>>> >> where
> >>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
> >>> >>>> intermediate
> >>> >>>> >> >> >>> resources
> >>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary
> relations.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy
> to
> >>> >>>> define
> >>> >>>> >> one
> >>> >>>> >> >> >>> resource
> >>> >>>> >> >> >>> >>> >> >> being the context for all described data. I
> would
> >>> >>>> call
> >>> >>>> >> this
> >>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
> >>> >>>> sub-concept to
> >>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement
> about
> >>> the
> >>> >>>> >> extracted
> >>> >>>> >> >> >>> >>> Setting
> >>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to
> it.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
> >>> annotate
> >>> >>>> that
> >>> >>>> >> >> >>> Endurant is
> >>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
> >>> >>>> >> >> >>> fise:SettingAnnotation).
> >>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
> >>> >>>> >> >> fise:TextAnnotaion
> >>> >>>> >> >> >>> (the
> >>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
> >>> >>>> Entities).
> >>> >>>> >> >> >>> Basically
> >>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
> >>> >>>> >> >> EnhancementEngine
> >>> >>>> >> >> >>> to
> >>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
> >>> different
> >>> >>>> >> >> sentences) do
> >>> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating
> in
> >>> the
> >>> >>>> >> Setting.
> >>> >>>> >> >> In
> >>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the
> dc:type
> >>> >>>> property
> >>> >>>> >> >> >>> (similar
> >>> >>>> >> >> >>> >>> as
> >>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the
> role(s)
> >>> of
> >>> >>>> an
> >>> >>>> >> >> >>> participant
> >>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs
> an
> >>> >>>> action)
> >>> >>>> >> Cause
> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
> >>> >>>> passive
> >>> >>>> >> role
> >>> >>>> >> >> in
> >>> >>>> >> >> >>> an
> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)),
> but
> >>> I am
> >>> >>>> >> >> wondering
> >>> >>>> >> >> >>> if
> >>> >>>> >> >> >>> >>> one
> >>> >>>> >> >> >>> >>> >> >> could extract those information.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
> >>> >>>> >> >> >>> >>> >> >> context of the Setting. Also fise:OccurrentAnnotation can link to
> >>> >>>> >> >> >>> >>> >> >> fise:TextAnnotation (typically verbs in the text defining the
> >>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation suggesting well known
> >>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. an election in a country, or an
> >>> >>>> >> >> >>> >>> >> >> uprising ...). In addition fise:OccurrentAnnotation can define
> >>> >>>> >> >> >>> >>> >> >> dc:has-participant links to fise:ParticipantAnnotation. In this case
> >>> >>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
> >>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are temporally indexed, this
> >>> >>>> >> >> >>> >>> >> >> annotation should also support properties for defining the
> >>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
> >>> lot of
> >>> >>>> sense
> >>> >>>> >> >> with
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> remark
> >>> >>>> >> >> >>> >>> >> > that you probably won't be able to always
> extract
> >>> the
> >>> >>>> date
> >>> >>>> >> >> for a
> >>> >>>> >> >> >>> >>> given
> >>> >>>> >> >> >>> >>> >> > setting(situation).
> >>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in
> which
> >>> the
> >>> >>>> >> object
> >>> >>>> >> >> upon
> >>> >>>> >> >> >>> >>> which
> >>> >>>> >> >> >>> >>> >> the
> >>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
> >>> transitory
> >>> >>>> >> object (
> >>> >>>> >> >> >>> such
> >>> >>>> >> >> >>> >>> as an
> >>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant.
> For
> >>> >>>> example
> >>> >>>> >> we
> >>> >>>> >> >> can
> >>> >>>> >> >> >>> >>> have
> >>> >>>> >> >> >>> >>> >> the
> >>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
> >>> Endurant
> >>> >>>> (
> >>> >>>> >> >> Subject )
> >>> >>>> >> >> >>> >>> which
> >>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
> >>> >>>> Eundurant,
> >>> >>>> >> namely
> >>> >>>> >> >> >>> >>> "Irak".
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq
> the
> >>> >>>> Patient.
> >>> >>>> >> Both
> >>> >>>> >> >> >>> are
> >>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
> >>> >>>> Perdurant. So
> >>> >>>> >> >> >>> ideally
> >>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
> >>> dc:type
> >>> >>>> >> >> caos:Agent,
> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
> >>> >>>> >> >> >>> fise:EntityAnnotation
> >>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
> >>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
> >>> dc:type
> >>> >>>> >> >> >>> caos:Patient,
> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
> >>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with
> the
> >>> >>>> dc:type
> >>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation
> for
> >>> >>>> "invades"
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
> >>> and
> >>> >>>> the
> >>> >>>> >> Object
> >>> >>>> >> >> >>> come
> >>> >>>> >> >> >>> >>> into
> >>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
> >>> >>>> >> dc:"property"
> >>> >>>> >> >> >>> where
> >>> >>>> >> >> >>> >>> the
> >>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in
> noun
> >>> >>>> form. For
> >>> >>>> >> >> >>> example
> >>> >>>> >> >> >>> >>> take
> >>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You
> would
> >>> have
> >>> >>>> the
> >>> >>>> >> >> "USA"
> >>> >>>> >> >> >>> >>> Entity
> >>> >>>> >> >> >>> >>> >> with
> >>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak".
> The
> >>> >>>> Endurant
> >>> >>>> >> >> would
> >>> >>>> >> >> >>> >>> have as
> >>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
> >>> which
> >>> >>>> link
> >>> >>>> >> it
> >>> >>>> >> >> to
> >>> >>>> >> >> >>> an
> >>> >>>> >> >> >>> >>> >> Object.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> As explained above you would have a
> >>> >>>> fise:OccurrentAnnotation
> >>> >>>> >> >> that
> >>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that
> the
> >>> >>>> activity
> >>> >>>> >> >> >>> mention in
> >>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a
> >>> >>>> >> >> >>> fise:TextAnnotation. If
> >>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
> >>> defines
> >>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation
> could
> >>> >>>> also link
> >>> >>>> >> >> to an
> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> best
> >>> >>>> >> >> >>> >>> >> Rupert
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > ### Consuming the data:
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
> >>> >>>> use-cases as
> >>> >>>> >> >> >>> described
> >>> >>>> >> >> >>> >>> by
> >>> >>>> >> >> >>> >>> >> you.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
> >>> setting
> >>> >>>> level.
> >>> >>>> >> >> This
> >>> >>>> >> >> >>> can
> >>> >>>> >> >> >>> >>> be
> >>> >>>> >> >> >>> >>> >> >> done by simply retrieving all
> >>> >>>> fise:ParticipantAnnotation
> >>> >>>> >> as
> >>> >>>> >> >> >>> well as
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a
> setting.
> >>> BTW
> >>> >>>> this
> >>> >>>> >> was
> >>> >>>> >> >> the
> >>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic
> search. It
> >>> >>>> allows
> >>> >>>> >> >> >>> queries for
> >>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g.
> you
> >>> >>>> could
> >>> >>>> >> filter
> >>> >>>> >> >> >>> for
> >>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
> >>> >>>> activities:Arrested and
> >>> >>>> >> a
> >>> >>>> >> >> >>> specific
> >>> >>>> >> >> >>> >>> >> >> {Uprising}. However note that with this
> approach
> >>> >>>> you will
> >>> >>>> >> >> get
> >>> >>>> >> >> >>> >>> results
> >>> >>>> >> >> >>> >>> >> >> for Settings where the {Person} participated
> and
> >>> >>>> another
> >>> >>>> >> >> person
> >>> >>>> >> >> >>> was
> >>> >>>> >> >> >>> >>> >> >> arrested.
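
The caveat above can be made concrete with a minimal Java sketch. It is not Stanbol code: the `Participation` and `Setting` classes, the entity names and the `activities:Witnessed` label are hypothetical, chosen only to show why a setting-level filter returns a setting in which the {Person} took part while somebody else was arrested.

```java
import java.util.Arrays;
import java.util.List;

public class SettingQuerySketch {

    /** Hypothetical pairing of an entity with the activity it took part in. */
    static final class Participation {
        final String entity;
        final String activity;
        Participation(String entity, String activity) {
            this.entity = entity;
            this.activity = activity;
        }
    }

    /** Hypothetical setting: everything extracted from one passage. */
    static final class Setting {
        final String id;
        final List<Participation> participations;
        Setting(String id, Participation... participations) {
            this.id = id;
            this.participations = Arrays.asList(participations);
        }
    }

    /** Setting-level match: entity and activity both occur somewhere in the setting. */
    static boolean matchesSettingLevel(Setting s, String entity, String activity) {
        boolean hasEntity = false;
        boolean hasActivity = false;
        for (Participation p : s.participations) {
            hasEntity |= p.entity.equals(entity);
            hasActivity |= p.activity.equals(activity);
        }
        return hasEntity && hasActivity;
    }

    /** Occurrent-level match: the entity itself takes part in the activity. */
    static boolean matchesOccurrentLevel(Setting s, String entity, String activity) {
        for (Participation p : s.participations) {
            if (p.entity.equals(entity) && p.activity.equals(activity)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Alice" only witnessed the event; "Bob" is the one who was arrested.
        Setting s = new Setting("setting-1",
                new Participation("Alice", "activities:Witnessed"),
                new Participation("Bob", "activities:Arrested"));
        System.out.println(matchesSettingLevel(s, "Alice", "activities:Arrested"));   // true
        System.out.println(matchesOccurrentLevel(s, "Alice", "activities:Arrested")); // false
    }
}
```

The setting-level check is cheaper and maps well to a flat index, but only the occurrent-level check distinguishes who did what.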
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Another possibility would be to process
> >>> enhancement
> >>> >>>> >> results
> >>> >>>> >> >> on
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow
> a
> >>> much
> >>> >>>> >> higher
> >>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would make it possible to
> >>> correctly
> >>> >>>> answer
> >>> >>>> >> >> the
> >>> >>>> >> >> >>> query
> >>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering
> if
> >>> the
> >>> >>>> >> quality
> >>> >>>> >> >> of
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for
> this. I
> >>> >>>> have
> >>> >>>> >> also
> >>> >>>> >> >> >>> doubts
> >>> >>>> >> >> >>> >>> if
> >>> >>>> >> >> >>> >>> >> >> this can still be realized by using semantic
> >>> >>>> indexing to
> >>> >>>> >> >> Apache
> >>> >>>> >> >> >>> Solr
> >>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
> >>> results
> >>> >>>> in a
> >>> >>>> >> >> >>> TripleStore
> >>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> The methodology and query language used by
> YAGO
> >>> [3]
> >>> >>>> is
> >>> >>>> >> also
> >>> >>>> >> >> very
> >>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
> >>> SPOTL(X)
> >>> >>>> >> >> >>> >>> Representation).
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Another related Topic is the enrichment of
> >>> Entities
> >>> >>>> >> >> (especially
> >>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings
> >>> >>>> extracted
> >>> >>>> >> from
> >>> >>>> >> >> >>> >>> Documents.
> >>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are
> >>> >>>> temporally
> >>> >>>> >> >> indexed.
> >>> >>>> >> >> >>> That
> >>> >>>> >> >> >>> >>> >> >> means that at the time when added to a
> knowledge
> >>> >>>> base they
> >>> >>>> >> >> might
> >>> >>>> >> >> >>> >>> still
> >>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and
> >>> >>>> refinement
> >>> >>>> >> of
> >>> >>>> >> >> such
> >>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be
> >>> >>>> critical for
> >>> >>>> >> a
> >>> >>>> >> >> >>> System
> >>> >>>> >> >> >>> >>> >> >> like the one described in your use-case.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
> >>> Petroaca
> >>> >>>> >> >> >>> >>> >> >> <[email protected]> wrote:
> >>> >>>> >> >> >>> >>> >> >> >
> >>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new
> >>> in the
> >>> >>>> >> field
> >>> >>>> >> >> of
> >>> >>>> >> >> >>> >>> semantic
> >>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about
> them
> >>> in
> >>> >>>> the
> >>> >>>> >> last
> >>> >>>> >> >> 4-5
> >>> >>>> >> >> >>> >>> >> >> months. Having
> >>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of
> what
> >>> is
> >>> >>>> a good
> >>> >>>> >> >> >>> approach
> >>> >>>> >> >> >>> >>> to
> >>> >>>> >> >> >>> >>> >> >> solve
> >>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers
> on
> >>> the
> >>> >>>> >> internet
> >>> >>>> >> >> >>> which
> >>> >>>> >> >> >>> >>> >> describe
> >>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
> >>> entity
> >>> >>>> >> >> >>> recognition,
> >>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
> >>> others.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently
> only
> >>> >>>> supports
> >>> >>>> >> >> >>> sentence
> >>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging,
> Chunking,
> >>> NER
> >>> >>>> and
> >>> >>>> >> >> lemma.
> >>> >>>> >> >> >>> >>> Support
> >>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency
> trees
> >>> is
> >>> >>>> >> currently
> >>> >>>> >> >> >>> >>> missing.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with
> Stanbol
> >>> [4].
> >>> >>>> At
> >>> >>>> >> the
> >>> >>>> >> >> >>> moment
> >>> >>>> >> >> >>> >>> it
> >>> >>>> >> >> >>> >>> >> >> only supports English, but I am already working
> to
> >>> >>>> include
> >>> >>>> >> the
> >>> >>>> >> >> >>> other
> >>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that
> are
> >>> >>>> already
> >>> >>>> >> >> >>> integrated
> >>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane
> [6].
> >>> But
> >>> >>>> note
> >>> >>>> >> >> that
> >>> >>>> >> >> >>> for
> >>> >>>> >> >> >>> >>> all
> >>> >>>> >> >> >>> >>> >> >> those the integration excludes support for
> >>> >>>> co-reference
> >>> >>>> >> and
> >>> >>>> >> >> >>> >>> dependency
> >>> >>>> >> >> >>> >>> >> >> trees.
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement
> a
> >>> first
> >>> >>>> >> >> prototype
> >>> >>>> >> >> >>> by
> >>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
> >>> available
> >>> >>>> -
> >>> >>>> >> Chunks
> >>> >>>> >> >> >>> (e.g.
> >>> >>>> >> >> >>> >>> >> >> Noun phrases).
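
As a rough illustration of such a first prototype, here is a minimal Java sketch that pulls a subject/verb/object triple out of one POS-tagged sentence. It does not use the Stanbol NLP API: the `Token` class is hypothetical and the tags are assumed to be Penn Treebank style strings (`NN*` for nouns, `VB*` for verbs). The heuristic is deliberately naive: first verb, nearest noun on each side.

```java
import java.util.Arrays;
import java.util.List;

public class PosTripleSketch {

    /** Hypothetical token: surface form plus a Penn Treebank style POS tag. */
    static final class Token {
        final String text;
        final String pos;
        Token(String text, String pos) {
            this.text = text;
            this.pos = pos;
        }
    }

    /** Returns {subject, verb, object}, or null when the pattern is not found. */
    static String[] extract(List<Token> sentence) {
        // 1. locate the first verb of the sentence
        int verb = -1;
        for (int i = 0; i < sentence.size(); i++) {
            if (sentence.get(i).pos.startsWith("VB")) {
                verb = i;
                break;
            }
        }
        if (verb < 0) {
            return null;
        }
        // 2. nearest noun to the left of the verb is taken as the subject
        String subject = null;
        for (int i = verb - 1; i >= 0 && subject == null; i--) {
            if (sentence.get(i).pos.startsWith("NN")) {
                subject = sentence.get(i).text;
            }
        }
        // 3. nearest noun to the right of the verb is taken as the object
        String object = null;
        for (int i = verb + 1; i < sentence.size() && object == null; i++) {
            if (sentence.get(i).pos.startsWith("NN")) {
                object = sentence.get(i).text;
            }
        }
        if (subject == null || object == null) {
            return null;
        }
        return new String[] { subject, sentence.get(verb).text, object };
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
                new Token("USA", "NNP"),
                new Token("invades", "VBZ"),
                new Token("Irak", "NNP"));
        System.out.println(Arrays.toString(extract(sentence))); // [USA, invades, Irak]
    }
}
```

Passive voice, pronouns and multi-verb sentences all break this heuristic, which is where chunks and later co-reference and dependency information would come in.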
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
> >>> like
> >>> >>>> >> Relation
> >>> >>>> >> >> >>> >>> extraction
> >>> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
> >>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
> >>> >>>> co-reference
> >>> >>>> >> >> >>> resolution
> >>> >>>> >> >> >>> >>> tool
> >>> >>>> >> >> >>> >>> >> > integration into Stanbol?
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine.
> But
> >>> >>>> before
> >>> >>>> >> we
> >>> >>>> >> >> can
> >>> >>>> >> >> >>> >>> >> build such an engine we would need to
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
> >>> >>>> Annotations for
> >>> >>>> >> >> >>> >>> co-reference
> >>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
> >>> those
> >>> >>>> >> >> annotations
> >>> >>>> >> >> >>> so
> >>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
> >>> >>>> >> co-reference
> >>> >>>> >> >> >>> >>> >> information
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to
> >>> encapsulate
> >>> >>>> the
> >>> >>>> >> >> extracted
> >>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper
> >>> structure to
> >>> >>>> >> >> represent
> >>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also
> successfully
> >>> >>>> extract
> >>> >>>> >> >> such
> >>> >>>> >> >> >>> >>> >> information from processed texts.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> I would start with
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation}
> (multiple
> >>> if
> >>> >>>> there
> >>> >>>> >> are
> >>> >>>> >> >> >>> more
> >>> >>>> >> >> >>> >>> >> suggestions)
> >>> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
> >>> >>>> >> fise:Instrument,
> >>> >>>> >> >> >>> >>> fise:Cause
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
> >>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
> >>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can
> add
> >>> >>>> more
> >>> >>>> >> >> >>> structure to
> >>> >>>> >> >> >>> >>> >> those annotations. We might also think about
> using
> >>> our
> >>> >>>> own
> >>> >>>> >> >> namespace
> >>> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
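
To make the proposed structure easier to picture, here is a minimal in-memory Java sketch of the three annotation types for the sentence "USA invades Irak". It is not the Stanbol enhancement structure itself (which is RDF): the class names, the `Role` enum and the `urn:example:` suggestion URIs are hypothetical, and the {fise:Enhancement} metadata is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SettingModelSketch {

    /** Mirrors the dc:type values proposed for fise:ParticipantAnnotation. */
    enum Role { AGENT, PATIENT, INSTRUMENT, CAUSE }

    /** Stands in for fise:ParticipantAnnotation. */
    static final class Participant {
        final String mention;                               // fise:hasMention
        final Role role;                                    // dc:type
        final List<String> suggestions = new ArrayList<>(); // fise:suggestion (0..n)
        Participant(String mention, Role role) {
            this.mention = mention;
            this.role = role;
        }
    }

    /** Stands in for fise:OccurrentAnnotation; dc:type is always fise:Activity. */
    static final class Occurrent {
        final String mention;                               // fise:hasMention
        Occurrent(String mention) {
            this.mention = mention;
        }
    }

    /** Stands in for fise:SettingAnnotation; membership models fise:inSetting. */
    static final class Setting {
        final List<Participant> participants = new ArrayList<>();
        final List<Occurrent> occurrents = new ArrayList<>();

        Participant addParticipant(String mention, Role role) {
            Participant p = new Participant(mention, role);
            participants.add(p);
            return p;
        }

        Occurrent addOccurrent(String mention) {
            Occurrent o = new Occurrent(mention);
            occurrents.add(o);
            return o;
        }
    }

    /** Builds the setting for "USA invades Irak". */
    static Setting example() {
        Setting setting = new Setting();
        setting.addParticipant("USA", Role.AGENT).suggestions.add("urn:example:USA");
        setting.addParticipant("Irak", Role.PATIENT).suggestions.add("urn:example:Irak");
        setting.addOccurrent("invades");
        return setting;
    }

    public static void main(String[] args) {
        Setting s = example();
        System.out.println(s.participants.size() + " participants, "
                + s.occurrents.size() + " occurrent");
    }
}
```

Keeping the participants and occurrents as flat lists under the setting matches the "don't make it too complex" advice; links between a participant and a specific occurrent could be added later if extraction quality allows it.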
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> > 2. Determine how all of this should be
> integrated
> >>> into
> >>> >>>> >> >> Stanbol.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and
> configure an
> >>> >>>> >> enhancement
> >>> >>>> >> >> >>> chain
> >>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> You should have a look at
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a
> lot
> >>> of
> >>> >>>> things
> >>> >>>> >> >> with
> >>> >>>> >> >> >>> NLP
> >>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives
> (via
> >>> >>>> verbs) to
> >>> >>>> >> >> >>> >>> >> nouns/pronouns). So as long as we cannot use
> explicit
> >>> >>>> dependency
> >>> >>>> >> >> trees
> >>> >>>> >> >> >>> >>> >> your code will need to do similar things with
> Nouns,
> >>> >>>> Pronouns
> >>> >>>> >> and
> >>> >>>> >> >> >>> >>> >> Verbs.
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a
> Java
> >>> >>>> >> >> representation
> >>> >>>> >> >> >>> of
> >>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and
> >>> fise:EntityAnnotation
> >>> >>>> [2].
> >>> >>>> >> >> >>> Something
> >>> >>>> >> >> >>> >>> >> similar will also be required by the
> >>> >>>> EventExtractionEngine
> >>> >>>> >> for
> >>> >>>> >> >> fast
> >>> >>>> >> >> >>> >>> >> access to such annotations while iterating over
> the
> >>> >>>> >> Sentences of
> >>> >>>> >> >> >>> the
> >>> >>>> >> >> >>> >>> >> text.
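
The kind of fast access meant here can be sketched with a sorted map keyed by character offset. This is not the actual DisambiguationData class: the `Mention` type and the method names are hypothetical, and the sketch only assumes that each annotation carries start and end offsets into the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class AnnotationIndexSketch {

    /** Hypothetical stand-in for a text annotation with character offsets. */
    static final class Mention {
        final int start;
        final int end; // exclusive
        final String text;
        Mention(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Mentions grouped by start offset, kept sorted for range lookups. */
    private final TreeMap<Integer, List<Mention>> byStart = new TreeMap<>();

    void add(Mention m) {
        byStart.computeIfAbsent(m.start, k -> new ArrayList<>()).add(m);
    }

    /** Mentions fully contained in the span [sentenceStart, sentenceEnd). */
    List<Mention> within(int sentenceStart, int sentenceEnd) {
        List<Mention> result = new ArrayList<>();
        // subMap restricts the scan to mentions starting inside the sentence
        for (List<Mention> bucket
                : byStart.subMap(sentenceStart, true, sentenceEnd, false).values()) {
            for (Mention m : bucket) {
                if (m.end <= sentenceEnd) {
                    result.add(m);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets refer to the text "USA invades Irak. Peace followed."
        AnnotationIndexSketch index = new AnnotationIndexSketch();
        index.add(new Mention(0, 3, "USA"));
        index.add(new Mention(12, 16, "Irak"));
        index.add(new Mention(18, 23, "Peace"));
        for (Mention m : index.within(0, 17)) { // first sentence only
            System.out.println(m.text);
        }
    }
}
```

Building the index once per content item and then calling `within` for each sentence avoids re-scanning all annotations for every sentence.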
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> best
> >>> >>>> >> >> >>> >>> >> Rupert
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >>> >>>> >> >> >>> >>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >>> >>>> >> >> >>> >>> >>
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > Thanks
> >>> >>>> >> >> >>> >>> >> >
> >>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
> >>> >>>> >> >> >>> >>> >> >> best
> >>> >>>> >> >> >>> >>> >> >> Rupert
> >>> >>>> >> >> >>> >>> >> >>
> >>> >>>> >> >> >>> >>> >> >> --
> >>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
> >>> >>>> >> >> [email protected]
> >>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
> >>> >>>> >> >> >>> ++43-699-11108907
> >>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
> >>> >>>> >> >> >>> >>> >> >>
