Ok. This means that I'll need to do a little refactoring of the ValueTypeParser.parse() method so that it also gets a reference to the AnalysedText object coming from the AnalyzedTextParser.
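
Something along these lines is what I have in mind for the coreference case. It is only a rough sketch assuming parse() is extended with an AnalysedText parameter; the CorefTag constructor, the package names and the Jackson imports are written from memory and are placeholders until the model is final:

    // Rough sketch only - not the final API. Package names are written from
    // memory and the CorefTag constructor is a placeholder.
    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.apache.stanbol.enhancer.nlp.model.AnalysedText;
    import org.apache.stanbol.enhancer.nlp.model.Span;

    // Jackson types as used by the nlp-json module (adjust the package if it differs)
    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.node.ObjectNode;

    public class CorefTagSupport /* would implement ValueTypeParser<CorefTag> */ {

        // parse() with the additional AnalysedText parameter, so that mention
        // references can be resolved against the parsed AnalysedText instead of
        // creating new, unconnected Span instances
        public CorefTag parse(ObjectNode jCoref, AnalysedText at) {
            boolean representative = jCoref.path("isRepresentative").asBoolean(false);
            Set<Span> mentions = new LinkedHashSet<Span>();
            JsonNode jMentions = jCoref.path("mentions");
            if (jMentions.isArray()) {
                for (int i = 0; i < jMentions.size(); i++) {
                    JsonNode jMention = jMentions.get(i);
                    String type = jMention.path("type").asText();
                    int start = jMention.path("start").asInt(-1);
                    int end = jMention.path("end").asInt(-1);
                    if (start < 0 || end < 0) {
                        continue; // incomplete mention reference, skip it
                    }
                    Span mention;
                    if ("Token".equals(type)) {
                        // returns the existing Token or creates it (forward reference)
                        mention = at.addToken(start, end);
                    } else if ("Chunk".equals(type)) {
                        mention = at.addChunk(start, end);
                    } else {
                        continue; // other span types are not expected as mentions for now
                    }
                    mentions.add(mention);
                }
            }
            // placeholder constructor - the final CorefTag model is still open
            return new CorefTag(representative, mentions);
        }
    }

With the AnalysedText at hand the forward references are no longer a problem, because addToken()/addChunk() either return the already existing Span or create a new one, exactly as you describe below, so the ChunkImpl/TokenImpl constructors can stay non-public.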
2013/9/16 Rupert Westenthaler <rupert.westentha...@gmail.com> > Hi Cristian > > If you have start/end and type of the referenced Span you can use the > according > > AnalysedText#add** > > e.g. > > AnalysedText#addToken(start, end) > AnalysedText#addChunk(start, end) > > method and just use the returned instance. Those methods do all the > magic. Meaning if the referenced Span does not yet exist (forward > reference) it will create a new instance. If the Span already exists > (backward reference) you will get the existing instance including all > the other annotations already parsed from the JSON. In case of a > forward reference the Span created by you (for forward references) > other annotations will be added by the same way. > > This behavior is also the reason why the constructors of the TokenImpl > and ChunkImpl (and all other **Impl) are not public. > > A similar code can be found in the > > AnalyzedTextParser#parseSpan(AnalysedText at, JsonNode node) > > method (o.a.s.enhancer.nlp.json module) > > > So if you have a reference to a Span in your Java API: > > (1) parse the start/end/type of the reference > (2) call add**(start, end) on the AnalysedText > (3) add the returned Span to your set with references > > If you want your references to be sorted you should use NavigableSet > instead of Set. > > best > Rupert > > On Sun, Sep 15, 2013 at 2:32 PM, Cristian Petroaca > <cristian.petro...@gmail.com> wrote: > > I've already started to implement the Coreference bit first in the nlp > and > > nlp-json projects. There's one thing that I don't know how to implement. > > The CorefTag class contains a Set<Span> mentions member (represents the > > "mentions" array defined in an earlier mail) and in the > > CorefTagSupport.parse() method I need to reconstuct the CorefTag object > > from json. I can't figure out how can I construct the aforementioned > member > > which should contain the references to mentions whch are Span objects > found > > in the AnalyzedTextImpl. One problem is I don't have access to the > > AnalyzedTextImpl object and even if I did there could be situations in > > which I am constructing a CorefTag for a Span which contains mentions to > > other Spans which have not been parsed yet and they don't exist in the > > AnalyzedTextImpl. > > > > One solution would be not to link with the actual Span references from > the > > AnalyzedTextImpl but to create new Span Objects (ChunkImpl, TokenImpl). > > That would need the ChunkImpl and TokenImpl constructors to be changed > from > > protected to public. > > > > > > 2013/9/12 Rupert Westenthaler <rupert.westentha...@gmail.com> > > > >> Hi Cristian, > >> > >> In fact I missed it. Sorry for that. > >> > >> I think the revised proposal looks like a good start. Usually one > >> needs make some adaptions when writing the actual code. > >> > >> If you have a first version attach it to an issue and I will commit it > >> to the branch. > >> > >> best > >> Rupert > >> > >> > >> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca > >> <cristian.petro...@gmail.com> wrote: > >> > Hi Rupert, > >> > > >> > This is a reminder in case you missed this e-mail. > >> > > >> > Cristian > >> > > >> > > >> > 2013/9/3 Cristian Petroaca <cristian.petro...@gmail.com> > >> > > >> >> Ok, then to sum it up we would have : > >> >> > >> >> 1. 
Coref > >> >> > >> >> "stanbol.enhancer.nlp.coref" { > >> >> "isRepresentative" : true/false, // whether this token or chunk > is > >> the > >> >> representative mention in the chain > >> >> "mentions" : [ { "type" : "Token", // type of element which > refers > >> to > >> >> this token/chunk > >> >> "start": 123 , // start index of the mentioning element > >> >> "end": 130 // end index of the mentioning element > >> >> }, ... > >> >> ], > >> >> "class" : ""class" : > >> "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >> } > >> >> > >> >> > >> >> 2. Dependency tree > >> >> > >> >> "stanbol.enhancer.nlp.dependency" : { > >> >> "relations" : [ { "tag" : "nsubj", //type of relation - Stanford NLP > >> >> notation > >> >> "dep" : 12, // type of relation - Stanbol NLP > >> >> mapped value - ordinal number in enum Dependency > >> >> "role" : "gov/dep", // whether this token is the depender or the > >> dependee > >> >> "type" : "Token", // type of element with which this token is in > >> relation > >> >> "start" : 123, // start index of the relating token > >> >> "end" : 130 // end index of the relating token > >> >> }, > >> >> ... > >> >> ] > >> >> "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" > >> >> } > >> >> > >> >> > >> >> 2013/9/2 Rupert Westenthaler <rupert.westentha...@gmail.com> > >> >> > >> >>> Hi Cristian, > >> >>> > >> >>> let me provide some feedback to your proposals: > >> >>> > >> >>> ### Referring other Spans > >> >>> > >> >>> Both suggested annotations require to link other spans (Sentence, > >> >>> Chunk or Token). For that we should introduce a JSON element used > for > >> >>> referring those elements and use it for all usages. > >> >>> > >> >>> In the java model this would allow you to have a reference to the > >> >>> other Span (Sentence, Chunk, Token). In the serialized form you > would > >> >>> have JSON elements with the "type", "start" and "end" attributes as > >> >>> those three uniquely identify any span. > >> >>> > >> >>> Here an example based on the "mention" attribute as defined by the > >> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >>> > >> >>> ... > >> >>> "mentions" : [ { > >> >>> "type" : "Token", > >> >>> "start": 123 , > >> >>> "end": 130 } ,{ > >> >>> "type" : "Token", > >> >>> "start": 157 , > >> >>> "end": 165 }], > >> >>> ... > >> >>> > >> >>> Similar token links in > >> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should > also > >> >>> use this model. > >> >>> > >> >>> ### Usage of Controlled Vocabularies > >> >>> > >> >>> In addition the DependencyTag also seams to use a controlled > >> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol > >> >>> NLP module tries to define those in some kind of Ontology. For POS > >> >>> tags we use OLIA ontology [1]. This is important as most NLP > >> >>> frameworks will use different strings and we need to unify those to > >> >>> commons IDs so that component that consume those data do not depend > on > >> >>> a specific NLP tool. > >> >>> > >> >>> Because the usage of Ontologies within Java is not well supported. > The > >> >>> Stanbol NLP module defines Java Enumerations for those Ontologies > such > >> >>> as the POS type enumeration [2]. > >> >>> > >> >>> Both the Java Model as well as the JSON serialization do support > both > >> >>> (1) the lexical tag as used by the NLP tool and (2) the mapped > >> >>> concept. In the Java API via two different methods and in the JSON > >> >>> serialization via two separate keys. 
> >> >>> > >> >>> To make this more clear here an example for a POS annotation of a > >> proper > >> >>> noun. > >> >>> > >> >>> "stanbol.enhancer.nlp.pos" : { > >> >>> "tag" : "PN", > >> >>> "pos" : 53, > >> >>> "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag", > >> >>> "prob" : 0.95 > >> >>> } > >> >>> > >> >>> where > >> >>> > >> >>> "tag" : "PN" > >> >>> > >> >>> is the lexical form as used by the NLP tool and > >> >>> > >> >>> "pos" : 53 > >> >>> > >> >>> refers to the ordinal number of the entry "ProperNoun" in the POS > >> >>> enumeration > >> >>> > >> >>> IMO the "type" property of DependencyTag should use a similar > design. > >> >>> > >> >>> best > >> >>> Rupert > >> >>> > >> >>> [1] http://olia.nlp2rdf.org/ > >> >>> [2] > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java > >> >>> > >> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca > >> >>> <cristian.petro...@gmail.com> wrote: > >> >>> > Sorry, pressed sent too soon :). > >> >>> > > >> >>> > Continued : > >> >>> > > >> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, > Tom-3), > >> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)] > >> >>> > > >> >>> > Given this, we can have for each "Token" an additional dependency > >> >>> > annotation : > >> >>> > > >> >>> > "stanbol.enhancer.nlp.dependency" : { > >> >>> > "tag" : //is it necessary? > >> >>> > "relations" : [ { "type" : "nsubj", //type of relation > >> >>> > "role" : "gov/dep", //whether it is depender or the dependee > >> >>> > "dependencyValue" : "met", // the word with which the token has > a > >> >>> relation > >> >>> > "dependencyIndexInSentence" : "2" //the index of the dependency > in > >> the > >> >>> > current sentence > >> >>> > } > >> >>> > ... > >> >>> > ] > >> >>> > "class" : > >> >>> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" > >> >>> > } > >> >>> > > >> >>> > 2013/9/1 Cristian Petroaca <cristian.petro...@gmail.com> > >> >>> > > >> >>> >> Related to the Stanford Dependency Tree Feature, this is the way > the > >> >>> >> output from the tool looks like for this sentence : "Mary and Tom > >> met > >> >>> Danny > >> >>> >> today" : > >> >>> >> > >> >>> >> > >> >>> >> 2013/8/30 Cristian Petroaca <cristian.petro...@gmail.com> > >> >>> >> > >> >>> >>> Hi Rupert, > >> >>> >>> > >> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP > >> Server > >> >>> and > >> >>> >>> the coref module I'm thinking I can represent the coreference > >> >>> information > >> >>> >>> this way: > >> >>> >>> Each "Token" or "Chunk" will contain an additional coref > annotation > >> >>> with > >> >>> >>> the following structure : > >> >>> >>> > >> >>> >>> "stanbol.enhancer.nlp.coref" { > >> >>> >>> "tag" : //does this need to exist? > >> >>> >>> "isRepresentative" : true/false, // whether this token or > >> chunk is > >> >>> >>> the representative mention in the chain > >> >>> >>> "mentions" : [ { "sentenceNo" : 1 //the sentence in which > the > >> >>> mention > >> >>> >>> is found > >> >>> >>> "startWord" : 2 //the first word > making > >> up > >> >>> the > >> >>> >>> mention > >> >>> >>> "endWord" : 3 //the last word making > up > >> the > >> >>> >>> mention > >> >>> >>> }, ... > >> >>> >>> ], > >> >>> >>> "class" : ""class" : > >> >>> "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >>> >>> } > >> >>> >>> > >> >>> >>> The CorefTag should resemble this model. > >> >>> >>> > >> >>> >>> What do you think? 
> >> >>> >>> > >> >>> >>> Cristian > >> >>> >>> > >> >>> >>> > >> >>> >>> 2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com> > >> >>> >>> > >> >>> >>>> Hi Cristian, > >> >>> >>>> > >> >>> >>>> you can not directly call StanfordNLP components from Stanbol, > but > >> >>> you > >> >>> >>>> have to extend the RESTful service to include the information > you > >> >>> >>>> need. The main reason for that is that the license of > StanfordNLP > >> is > >> >>> >>>> not compatible with the Apache Software License. So Stanbol can > >> not > >> >>> >>>> directly link to the StanfordNLP API. > >> >>> >>>> > >> >>> >>>> You will need to > >> >>> >>>> > >> >>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> > >> class > >> >>> >>>> in the o.a.s.enhancer.nlp module > >> >>> >>>> 2. add JSON parsing and serialization support for this tag to > the > >> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an > >> example) > >> >>> >>>> > >> >>> >>>> As (1) would be necessary anyway the only additional thing you > >> need > >> >>> to > >> >>> >>>> develop is (2). After that you can add {yourTag} instance to > the > >> >>> >>>> AnalyzedText in the StanfornNLP integration. The > >> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All > >> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have > >> access > >> >>> >>>> to your annotations. > >> >>> >>>> > >> >>> >>>> If you have a design for {yourTag} - the model you would like > to > >> use > >> >>> >>>> to represent your data - I can help with (1) and (2). > >> >>> >>>> > >> >>> >>>> best > >> >>> >>>> Rupert > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca > >> >>> >>>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> > Hi Rupert, > >> >>> >>>> > > >> >>> >>>> > Thanks for the info. Looking at the standbol-stanfordnlp > >> project I > >> >>> see > >> >>> >>>> that > >> >>> >>>> > the stanford nlp is not implemented as an EnhancementEngine > but > >> >>> rather > >> >>> >>>> it > >> >>> >>>> > is used directly in a Jetty Server instance. How does that > fit > >> >>> into the > >> >>> >>>> > Stanbol stack? For example how can I call the > >> StanfordNlpAnalyzer's > >> >>> >>>> routine > >> >>> >>>> > from my TripleExtractionEnhancementEngine which lives in the > >> >>> Stanbol > >> >>> >>>> stack? > >> >>> >>>> > > >> >>> >>>> > Thanks, > >> >>> >>>> > Cristian > >> >>> >>>> > > >> >>> >>>> > > >> >>> >>>> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com > > > >> >>> >>>> > > >> >>> >>>> >> Hi Cristian, > >> >>> >>>> >> > >> >>> >>>> >> Sorry for the late response, but I was offline for the last > two > >> >>> weeks > >> >>> >>>> >> > >> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca > >> >>> >>>> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> > Hi Rupert, > >> >>> >>>> >> > > >> >>> >>>> >> > After doing some tests it seems that the Stanford NLP > >> >>> coreference > >> >>> >>>> module > >> >>> >>>> >> is > >> >>> >>>> >> > much more accurate than the Open NLP one.So I decided to > >> extend > >> >>> >>>> Stanford > >> >>> >>>> >> > NLP to add coreference there. > >> >>> >>>> >> > >> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol > >> codebase > >> >>> >>>> >> because the licenses are not compatible. 
> >> >>> >>>> >> > >> >>> >>>> >> You can find the Stanford NLP integration on > >> >>> >>>> >> > >> >>> >>>> >> https://github.com/westei/stanbol-stanfordnlp > >> >>> >>>> >> > >> >>> >>>> >> just create a fork and send pull requests. > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > Could you add the necessary projects on the branch? And > also > >> >>> remove > >> >>> >>>> the > >> >>> >>>> >> > Open NLP ones? > >> >>> >>>> >> > > >> >>> >>>> >> > >> >>> >>>> >> Currently the branch > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >> >>> >>>> >> > >> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO > those > >> >>> should > >> >>> >>>> >> be enough for adding coreference support. > >> >>> >>>> >> > >> >>> >>>> >> IMO you will need to > >> >>> >>>> >> > >> >>> >>>> >> * add an model for representing coreference to the nlp > module > >> >>> >>>> >> * add parsing and serializing support to the nlp-json module > >> >>> >>>> >> * add the implementation to your fork of the > >> stanbol-stanfordnlp > >> >>> >>>> project > >> >>> >>>> >> > >> >>> >>>> >> best > >> >>> >>>> >> Rupert > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > Thanks, > >> >>> >>>> >> > Cristian > >> >>> >>>> >> > > >> >>> >>>> >> > > >> >>> >>>> >> > 2013/7/5 Rupert Westenthaler < > rupert.westentha...@gmail.com> > >> >>> >>>> >> > > >> >>> >>>> >> >> Hi Cristian, > >> >>> >>>> >> >> > >> >>> >>>> >> >> I created the branch at > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >> >>> >>>> >> >> > >> >>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. > Let me > >> >>> know > >> >>> >>>> if > >> >>> >>>> >> >> you would like to have more > >> >>> >>>> >> >> > >> >>> >>>> >> >> best > >> >>> >>>> >> >> Rupert > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca > >> >>> >>>> >> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> > Hi Rupert, > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > I created jiras : > >> >>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132and > >> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. > The > >> >>> >>>> original one > >> >>> >>>> >> in > >> >>> >>>> >> >> > dependent upon these. > >> >>> >>>> >> >> > Please let me know when I can start using the branch. > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > Thanks, > >> >>> >>>> >> >> > Cristian > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca < > cristian.petro...@gmail.com> > >> >>> >>>> >> >> > > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler < > >> >>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca > >> >>> >>>> >> >> >>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford > in my > >> >>> >>>> previous > >> >>> >>>> >> >> e-mail. > >> >>> >>>> >> >> >>> By > >> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build > >> >>> dependency > >> >>> >>>> trees? > >> >>> >>>> >> >> >>> > > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature. 
> >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated > into > >> >>> >>>> Stanbol, > >> >>> >>>> >> I'll > >> >>> >>>> >> >> >> take a look at how I can extend its integration to > >> include > >> >>> the > >> >>> >>>> >> >> dependency > >> >>> >>>> >> >> >> tree feature. > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > > >> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca < > >> cristian.petro...@gmail.com > >> >>> > > >> >>> >>>> >> >> >>> > > >> >>> >>>> >> >> >>> >> Hi Rupert, > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> I created jira > >> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121. > >> >>> >>>> >> >> >>> >> As you suggested I would start with extending the > >> >>> Stanford > >> >>> >>>> NLP > >> >>> >>>> >> with > >> >>> >>>> >> >> >>> >> co-reference resolution but I think also with > >> dependency > >> >>> >>>> trees > >> >>> >>>> >> >> because > >> >>> >>>> >> >> >>> I > >> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and > the > >> >>> object > >> >>> >>>> >> that it > >> >>> >>>> >> >> >>> >> affects, right? > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API > in > >> >>> Stanbol > >> >>> >>>> for > >> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I > proceed > >> with > >> >>> >>>> this? > >> >>> >>>> >> Do I > >> >>> >>>> >> >> >>> create > >> >>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After > >> that > >> >>> can I > >> >>> >>>> >> start > >> >>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when > I'm > >> >>> done > >> >>> >>>> I'll > >> >>> >>>> >> send > >> >>> >>>> >> >> >>> you > >> >>> >>>> >> >> >>> >> guys the patch fo review? > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for > >> adding > >> >>> >>>> support > >> >>> >>>> >> >> >>> for "dependency trees" and the other for > "co-reference" > >> >>> >>>> support. You > >> >>> >>>> >> >> >>> should also define "depends on" relations between > >> >>> STANBOL-1121 > >> >>> >>>> and > >> >>> >>>> >> >> >>> those two new issues. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> Sub-task could also work, but as adding those > features > >> >>> would > >> >>> >>>> be also > >> >>> >>>> >> >> >>> interesting for other things I would rather define > them > >> as > >> >>> >>>> separate > >> >>> >>>> >> >> >>> issues. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> 2 New Features connected with the original jira it is > >> then. > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> If you would prefer to work in an own branch please > tell > >> >>> me. > >> >>> >>>> This > >> >>> >>>> >> >> >>> could have the advantage that patches would not be > >> >>> affected by > >> >>> >>>> >> changes > >> >>> >>>> >> >> >>> in the trunk. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> Yes, a separate branch sounds good. 
> >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> best > >> >>> >>>> >> >> >>> Rupert > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> >> Regards, > >> >>> >>>> >> >> >>> >> Cristian > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler < > >> >>> >>>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian > Petroaca > >> >>> >>>> >> >> >>> >>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> > Hi Rupert, > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Agreed on the > >> >>> >>>> >> >> >>> > >> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation > >> >>> >>>> >> >> >>> >>> > data structure. > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in > order > >> to > >> >>> >>>> >> encapsulate > >> >>> >>>> >> >> this > >> >>> >>>> >> >> >>> >>> > information and establish the goals and these > >> initial > >> >>> >>>> steps > >> >>> >>>> >> >> towards > >> >>> >>>> >> >> >>> >>> these > >> >>> >>>> >> >> >>> >>> > goals? > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be > >> great. > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > How should I proceed further? Should I create > some > >> >>> design > >> >>> >>>> >> >> documents > >> >>> >>>> >> >> >>> that > >> >>> >>>> >> >> >>> >>> > need to be reviewed? > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> Usually it is the best to write design related > text > >> >>> >>>> directly in > >> >>> >>>> >> >> JIRA > >> >>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us > >> later > >> >>> to > >> >>> >>>> use > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> >>> text directly for the documentation on the > Stanbol > >> >>> Webpage. > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> best > >> >>> >>>> >> >> >>> >>> Rupert > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/ > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Regards, > >> >>> >>>> >> >> >>> >>> > Cristian > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler < > >> >>> >>>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian > >> Petroaca > >> >>> >>>> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> >> > HI Rupert, > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed > >> suggestions. > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler < > >> >>> >>>> >> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> >> Hi Cristian, all > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> really interesting use case! > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some > >> suggestions > >> >>> on > >> >>> >>>> how > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> could > >> >>> >>>> >> >> >>> >>> >> >> work out. 
This suggestions are mainly > based on > >> >>> >>>> experiences > >> >>> >>>> >> >> and > >> >>> >>>> >> >> >>> >>> lessons > >> >>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we > >> built an > >> >>> >>>> >> information > >> >>> >>>> >> >> >>> system > >> >>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this > >> >>> Project > >> >>> >>>> >> excluded > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text > >> >>> (because > >> >>> >>>> the > >> >>> >>>> >> >> Olympic > >> >>> >>>> >> >> >>> >>> >> >> Information System was already providing > event > >> >>> data > >> >>> >>>> as XML > >> >>> >>>> >> >> >>> messages) > >> >>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this > >> system > >> >>> >>>> where very > >> >>> >>>> >> >> >>> similar > >> >>> >>>> >> >> >>> >>> as > >> >>> >>>> >> >> >>> >>> >> >> the one described by your use case. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract > >> >>> relations, > >> >>> >>>> but a > >> >>> >>>> >> >> formal > >> >>> >>>> >> >> >>> >>> >> >> representation of the situation described > by > >> the > >> >>> >>>> text. So > >> >>> >>>> >> >> lets > >> >>> >>>> >> >> >>> >>> assume > >> >>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or > >> >>> Situation) > >> >>> >>>> >> >> described > >> >>> >>>> >> >> >>> in > >> >>> >>>> >> >> >>> >>> the > >> >>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives > some > >> >>> >>>> advices on > >> >>> >>>> >> >> how to > >> >>> >>>> >> >> >>> >>> model > >> >>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling > >> this > >> >>> >>>> >> >> Participation: > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t)) > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> where .. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * ED are Endurants (continuants): > Endurants > >> do > >> >>> have > >> >>> >>>> an > >> >>> >>>> >> >> >>> identity so > >> >>> >>>> >> >> >>> >>> we > >> >>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities > >> >>> referenced > >> >>> >>>> by a > >> >>> >>>> >> >> >>> setting. > >> >>> >>>> >> >> >>> >>> >> >> Note that this includes physical, > >> non-physical as > >> >>> >>>> well as > >> >>> >>>> >> >> >>> >>> >> >> social-objects. > >> >>> >>>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): > Perdurants > >> >>> are > >> >>> >>>> >> entities > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events, > >> >>> Activities ... > >> >>> >>>> >> >> >>> >>> >> >> * PC are Participation: It is an time > indexed > >> >>> >>>> relation > >> >>> >>>> >> where > >> >>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define > some > >> >>> >>>> intermediate > >> >>> >>>> >> >> >>> resources > >> >>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary > >> relations. 
> >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really > handy > >> to > >> >>> >>>> define > >> >>> >>>> >> one > >> >>> >>>> >> >> >>> resource > >> >>> >>>> >> >> >>> >>> >> >> being the context for all described data. I > >> would > >> >>> >>>> call > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a > >> >>> >>>> sub-concept to > >> >>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement > >> about > >> >>> the > >> >>> >>>> >> extracted > >> >>> >>>> >> >> >>> >>> Setting > >> >>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation > to > >> it. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to > >> >>> annotate > >> >>> >>>> that > >> >>> >>>> >> >> >>> Endurant is > >> >>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting > >> >>> >>>> >> >> >>> fise:SettingAnnotation). > >> >>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by > existing > >> >>> >>>> >> >> fise:TextAnnotaion > >> >>> >>>> >> >> >>> (the > >> >>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation > (suggested > >> >>> >>>> Entities). > >> >>> >>>> >> >> >>> Basically > >> >>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow > an > >> >>> >>>> >> >> EnhancementEngine > >> >>> >>>> >> >> >>> to > >> >>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible > >> >>> different > >> >>> >>>> >> >> sentences) do > >> >>> >>>> >> >> >>> >>> >> >> represent the same Endurant as > participating > >> in > >> >>> the > >> >>> >>>> >> Setting. > >> >>> >>>> >> >> In > >> >>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the > >> dc:type > >> >>> >>>> property > >> >>> >>>> >> >> >>> (similar > >> >>> >>>> >> >> >>> >>> as > >> >>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the > >> role(s) > >> >>> of > >> >>> >>>> an > >> >>> >>>> >> >> >>> participant > >> >>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally > performs > >> an > >> >>> >>>> action) > >> >>> >>>> >> Cause > >> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), > Patient (a > >> >>> >>>> passive > >> >>> >>>> >> role > >> >>> >>>> >> >> in > >> >>> >>>> >> >> >>> an > >> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an > process)), > >> but > >> >>> I am > >> >>> >>>> >> >> wondering > >> >>> >>>> >> >> >>> if > >> >>> >>>> >> >> >>> >>> one > >> >>> >>>> >> >> >>> >>> >> >> could extract those information. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to > >> annotate a > >> >>> >>>> >> Perdurant > >> >>> >>>> >> >> in > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> context of the Setting. Also > >> >>> >>>> fise:OccurrentAnnotation can > >> >>> >>>> >> >> link > >> >>> >>>> >> >> >>> to > >> >>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the > >> text > >> >>> >>>> defining > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation > >> >>> >>>> suggesting > >> >>> >>>> >> well > >> >>> >>>> >> >> >>> known > >> >>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election > >> in a > >> >>> >>>> country, > >> >>> >>>> >> or > >> >>> >>>> >> >> an > >> >>> >>>> >> >> >>> >>> >> >> upraising ...). 
In addition > >> >>> fise:OccurrentAnnotation > >> >>> >>>> can > >> >>> >>>> >> >> define > >> >>> >>>> >> >> >>> >>> >> >> dc:has-participant links to > >> >>> >>>> fise:ParticipantAnnotation. In > >> >>> >>>> >> >> this > >> >>> >>>> >> >> >>> case > >> >>> >>>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant > (the > >> >>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in > this > >> >>> >>>> Perturant > >> >>> >>>> >> (the > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences > are > >> >>> >>>> temporal > >> >>> >>>> >> >> indexed > >> >>> >>>> >> >> >>> this > >> >>> >>>> >> >> >>> >>> >> >> annotation should also support properties > for > >> >>> >>>> defining the > >> >>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure > makes a > >> >>> lot of > >> >>> >>>> sense > >> >>> >>>> >> >> with > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> remark > >> >>> >>>> >> >> >>> >>> >> > that you probably won't be able to always > >> extract > >> >>> the > >> >>> >>>> date > >> >>> >>>> >> >> for a > >> >>> >>>> >> >> >>> >>> given > >> >>> >>>> >> >> >>> >>> >> > setting(situation). > >> >>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though. > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in > >> which > >> >>> the > >> >>> >>>> >> object > >> >>> >>>> >> >> upon > >> >>> >>>> >> >> >>> >>> which > >> >>> >>>> >> >> >>> >>> >> the > >> >>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a > >> >>> transitory > >> >>> >>>> >> object ( > >> >>> >>>> >> >> >>> such > >> >>> >>>> >> >> >>> >>> as an > >> >>> >>>> >> >> >>> >>> >> > event, activity ) but rather another > Endurant. > >> For > >> >>> >>>> example > >> >>> >>>> >> we > >> >>> >>>> >> >> can > >> >>> >>>> >> >> >>> >>> have > >> >>> >>>> >> >> >>> >>> >> the > >> >>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the > >> >>> Endurant > >> >>> >>>> ( > >> >>> >>>> >> >> Subject ) > >> >>> >>>> >> >> >>> >>> which > >> >>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another > >> >>> >>>> Eundurant, > >> >>> >>>> >> namely > >> >>> >>>> >> >> >>> >>> "Irak". > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq > >> the > >> >>> >>>> Patient. > >> >>> >>>> >> Both > >> >>> >>>> >> >> >>> are > >> >>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be > the > >> >>> >>>> Perdurant. 
So > >> >>> >>>> >> >> >>> ideally > >> >>> >>>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" > with: > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with > the > >> >>> dc:type > >> >>> >>>> >> >> caos:Agent, > >> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" > and a > >> >>> >>>> >> >> >>> fise:EntityAnnotation > >> >>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with > the > >> >>> dc:type > >> >>> >>>> >> >> >>> caos:Patient, > >> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" > and a > >> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq > >> >>> >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" > with > >> the > >> >>> >>>> dc:type > >> >>> >>>> >> >> >>> >>> >> caos:Activity, linking to a > fise:TextAnnotation > >> for > >> >>> >>>> "invades" > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the > Subject > >> >>> and > >> >>> >>>> the > >> >>> >>>> >> Object > >> >>> >>>> >> >> >>> come > >> >>> >>>> >> >> >>> >>> into > >> >>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would > have a > >> >>> >>>> >> dc:"property" > >> >>> >>>> >> >> >>> where > >> >>> >>>> >> >> >>> >>> the > >> >>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in > >> noun > >> >>> >>>> form. For > >> >>> >>>> >> >> >>> example > >> >>> >>>> >> >> >>> >>> take > >> >>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You > >> would > >> >>> have > >> >>> >>>> the > >> >>> >>>> >> >> "USA" > >> >>> >>>> >> >> >>> >>> Entity > >> >>> >>>> >> >> >>> >>> >> with > >> >>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object > "Irak". > >> The > >> >>> >>>> Endurant > >> >>> >>>> >> >> would > >> >>> >>>> >> >> >>> >>> have as > >> >>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are > verbs > >> >>> which > >> >>> >>>> link > >> >>> >>>> >> it > >> >>> >>>> >> >> to > >> >>> >>>> >> >> >>> an > >> >>> >>>> >> >> >>> >>> >> Object. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> As explained above you would have a > >> >>> >>>> fise:OccurrentAnnotation > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that > >> the > >> >>> >>>> activity > >> >>> >>>> >> >> >>> mention in > >> >>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a > >> >>> >>>> >> >> >>> fise:TextAnnotation. If > >> >>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks > that > >> >>> defines > >> >>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation > >> could > >> >>> >>>> also link > >> >>> >>>> >> >> to an > >> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> best > >> >>> >>>> >> >> >>> >>> >> Rupert > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > ### Consuming the data: > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for > >> >>> >>>> use-cases as > >> >>> >>>> >> >> >>> described > >> >>> >>>> >> >> >>> >>> by > >> >>> >>>> >> >> >>> >>> >> you. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the > >> >>> setting > >> >>> >>>> level. 
> >> >>> >>>> >> >> This > >> >>> >>>> >> >> >>> can > >> >>> >>>> >> >> >>> >>> be > >> >>> >>>> >> >> >>> >>> >> >> done my simple retrieving all > >> >>> >>>> fise:ParticipantAnnotation > >> >>> >>>> >> as > >> >>> >>>> >> >> >>> well as > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a > >> setting. > >> >>> BTW > >> >>> >>>> this > >> >>> >>>> >> was > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic > >> search. It > >> >>> >>>> allows > >> >>> >>>> >> >> >>> queries for > >> >>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities > e.g. > >> you > >> >>> >>>> could > >> >>> >>>> >> filter > >> >>> >>>> >> >> >>> for > >> >>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person}, > >> >>> >>>> activities:Arrested and > >> >>> >>>> >> a > >> >>> >>>> >> >> >>> specific > >> >>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this > >> approach > >> >>> >>>> you will > >> >>> >>>> >> >> get > >> >>> >>>> >> >> >>> >>> results > >> >>> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated > >> and > >> >>> an > >> >>> >>>> other > >> >>> >>>> >> >> person > >> >>> >>>> >> >> >>> was > >> >>> >>>> >> >> >>> >>> >> >> arrested. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> An other possibility would be to process > >> >>> enhancement > >> >>> >>>> >> results > >> >>> >>>> >> >> on > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow > to > >> a > >> >>> much > >> >>> >>>> >> higher > >> >>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to > >> >>> correctly > >> >>> >>>> answer > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> query > >> >>> >>>> >> >> >>> >>> >> >> used as an example above). But I am > wondering > >> if > >> >>> the > >> >>> >>>> >> quality > >> >>> >>>> >> >> of > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for > >> this. I > >> >>> >>>> have > >> >>> >>>> >> also > >> >>> >>>> >> >> >>> doubts > >> >>> >>>> >> >> >>> >>> if > >> >>> >>>> >> >> >>> >>> >> >> this can be still realized by using > semantic > >> >>> >>>> indexing to > >> >>> >>>> >> >> Apache > >> >>> >>>> >> >> >>> Solr > >> >>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store > >> >>> results > >> >>> >>>> in a > >> >>> >>>> >> >> >>> TripleStore > >> >>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The methodology and query language used by > >> YAGO > >> >>> [3] > >> >>> >>>> is > >> >>> >>>> >> also > >> >>> >>>> >> >> very > >> >>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter > 7 > >> >>> SPOTL(X) > >> >>> >>>> >> >> >>> >>> Representation). > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> An other related Topic is the enrichment of > >> >>> Entities > >> >>> >>>> >> >> (especially > >> >>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on > Settings > >> >>> >>>> extracted > >> >>> >>>> >> form > >> >>> >>>> >> >> >>> >>> Documents. > >> >>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants > are > >> >>> >>>> temporal > >> >>> >>>> >> >> indexed. 
> >> >>> >>>> >> >> >>> That > >> >>> >>>> >> >> >>> >>> >> >> means that at the time when added to a > >> knowledge > >> >>> >>>> base they > >> >>> >>>> >> >> might > >> >>> >>>> >> >> >>> >>> still > >> >>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching > and > >> >>> >>>> refinement > >> >>> >>>> >> of > >> >>> >>>> >> >> such > >> >>> >>>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to > be > >> >>> >>>> critical for > >> >>> >>>> >> a > >> >>> >>>> >> >> >>> System > >> >>> >>>> >> >> >>> >>> >> >> like described in your use-case. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian > >> >>> Petroaca > >> >>> >>>> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> >> >> > > >> >>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am > new > >> >>> in the > >> >>> >>>> >> field > >> >>> >>>> >> >> of > >> >>> >>>> >> >> >>> >>> semantic > >> >>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about > >> them > >> >>> in > >> >>> >>>> the > >> >>> >>>> >> last > >> >>> >>>> >> >> 4-5 > >> >>> >>>> >> >> >>> >>> >> >> months.Having > >> >>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of > >> what > >> >>> is > >> >>> >>>> a good > >> >>> >>>> >> >> >>> approach > >> >>> >>>> >> >> >>> >>> to > >> >>> >>>> >> >> >>> >>> >> >> solve > >> >>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of > papers > >> on > >> >>> the > >> >>> >>>> >> internet > >> >>> >>>> >> >> >>> which > >> >>> >>>> >> >> >>> >>> >> describe > >> >>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : > named > >> >>> entity > >> >>> >>>> >> >> >>> recognition, > >> >>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and > >> >>> others. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently > >> only > >> >>> >>>> supports > >> >>> >>>> >> >> >>> sentence > >> >>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, > >> Chunking, > >> >>> NER > >> >>> >>>> and > >> >>> >>>> >> >> lemma. > >> >>> >>>> >> >> >>> >>> support > >> >>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency > >> trees > >> >>> is > >> >>> >>>> >> currently > >> >>> >>>> >> >> >>> >>> missing. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with > >> Stanbol > >> >>> [4]. > >> >>> >>>> At > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> moment > >> >>> >>>> >> >> >>> >>> it > >> >>> >>>> >> >> >>> >>> >> >> only supports English, but I do already > work > >> to > >> >>> >>>> include > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> other > >> >>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP framework > that > >> is > >> >>> >>>> already > >> >>> >>>> >> >> >>> integrated > >> >>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane > >> [6]. > >> >>> But > >> >>> >>>> note > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> for > >> >>> >>>> >> >> >>> >>> all > >> >>> >>>> >> >> >>> >>> >> >> those the integration excludes support for > >> >>> >>>> co-reference > >> >>> >>>> >> and > >> >>> >>>> >> >> >>> >>> dependency > >> >>> >>>> >> >> >>> >>> >> >> trees. 
> >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can > implement > >> a > >> >>> first > >> >>> >>>> >> >> prototype > >> >>> >>>> >> >> >>> by > >> >>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if > >> >>> available > >> >>> >>>> - > >> >>> >>>> >> Chunks > >> >>> >>>> >> >> >>> (e.g. > >> >>> >>>> >> >> >>> >>> >> >> Noun phrases). > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a > feature > >> >>> like > >> >>> >>>> >> Relation > >> >>> >>>> >> >> >>> >>> extraction > >> >>> >>>> >> >> >>> >>> >> > would be implemented as an > EnhancementEngine? > >> >>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a > >> >>> >>>> co-reference > >> >>> >>>> >> >> >>> resolution > >> >>> >>>> >> >> >>> >>> tool > >> >>> >>>> >> >> >>> >>> >> > integration into Stanbol? > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Yes in the end it would be an > EnhancementEngine. > >> But > >> >>> >>>> before > >> >>> >>>> >> we > >> >>> >>>> >> >> can > >> >>> >>>> >> >> >>> >>> >> build such an engine we would need to > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with > >> >>> >>>> Annotations for > >> >>> >>>> >> >> >>> >>> co-reference > >> >>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing > for > >> >>> those > >> >>> >>>> >> >> annotation > >> >>> >>>> >> >> >>> so > >> >>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can > provide > >> >>> >>>> >> co-reference > >> >>> >>>> >> >> >>> >>> >> information > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 > aspects: > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to > >> >>> encapsulate > >> >>> >>>> the > >> >>> >>>> >> >> extracted > >> >>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at > Dolce. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Don't make to to complex. Defining a proper > >> >>> structure to > >> >>> >>>> >> >> represent > >> >>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also > >> successfully > >> >>> >>>> extract > >> >>> >>>> >> >> such > >> >>> >>>> >> >> >>> >>> >> information form processed texts. 
> >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> I would start with > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:SettingAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:suggestion {entityAnnotation} > >> (multiple > >> >>> if > >> >>> >>>> there > >> >>> >>>> >> are > >> >>> >>>> >> >> >>> more > >> >>> >>>> >> >> >>> >>> >> suggestions) > >> >>> >>>> >> >> >>> >>> >> * dc:type one of fise:Agent, fise:Patient, > >> >>> >>>> >> fise:Instrument, > >> >>> >>>> >> >> >>> >>> fise:Cause > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >> >>> >>>> >> >> >>> >>> >> * dc:type set to fise:Activity > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we > can > >> add > >> >>> >>>> more > >> >>> >>>> >> >> >>> structure to > >> >>> >>>> >> >> >>> >>> >> those annotations. We might also think about > >> using > >> >>> an > >> >>> >>>> own > >> >>> >>>> >> >> namespace > >> >>> >>>> >> >> >>> >>> >> for those extensions to the annotation > structure. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be > >> integrated > >> >>> into > >> >>> >>>> >> >> Stanbol. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and > >> configure a > >> >>> >>>> >> enhancement > >> >>> >>>> >> >> >>> chain > >> >>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> You should have a look at > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does > a > >> lot > >> >>> of > >> >>> >>>> things > >> >>> >>>> >> >> with > >> >>> >>>> >> >> >>> NLP > >> >>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives > >> (via > >> >>> >>>> verbs) to > >> >>> >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use > >> explicit > >> >>> >>>> dependency > >> >>> >>>> >> >> trees > >> >>> >>>> >> >> >>> >>> >> you code will need to do similar things with > >> Nouns, > >> >>> >>>> Pronouns > >> >>> >>>> >> and > >> >>> >>>> >> >> >>> >>> >> Verbs. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a > >> Java > >> >>> >>>> >> >> representation > >> >>> >>>> >> >> >>> of > >> >>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and > >> >>> fise:EntityAnnotation > >> >>> >>>> [2]. > >> >>> >>>> >> >> >>> Something > >> >>> >>>> >> >> >>> >>> >> similar will also be required by the > >> >>> >>>> EventExtractionEngine > >> >>> >>>> >> for > >> >>> >>>> >> >> fast > >> >>> >>>> >> >> >>> >>> >> access to such annotations while iterating > over > >> the > >> >>> >>>> >> Sentences of > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> text. 
> >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> best > >> >>> >>>> >> >> >>> >>> >> Rupert > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> [1] > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java > >> >>> >>>> >> >> >>> >>> >> [2] > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > Thanks > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion > >> >>> >>>> >> >> >>> >>> >> >> best > >> >>> >>>> >> >> >>> >>> >> >> Rupert > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> -- > >> >>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler > >> >>> >>>> >> >> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11 > >> >>> >>>> >> >> >>> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> -- > >> >>> >>>> >> >> >>> >>> >> | Rupert Westenthaler > >> >>> >>>> >> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> >> | Bodenlehenstraße 11 > >> >>> >>>> >> >> >>> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> -- > >> >>> >>>> >> >> >>> >>> | Rupert Westenthaler > >> >>> >>>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> | Bodenlehenstraße 11 > >> >>> >>>> >> >> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> -- > >> >>> >>>> >> >> >>> | Rupert Westenthaler > >> >>> >>>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> | Bodenlehenstraße 11 > >> >>> >>>> ++43-699-11108907 > >> >>> >>>> >> >> >>> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> -- > >> >>> >>>> >> >> | Rupert Westenthaler > >> >>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> | Bodenlehenstraße 11 > >> >>> >>>> ++43-699-11108907 > >> >>> >>>> >> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> -- > >> >>> >>>> >> | Rupert Westenthaler > >> rupert.westentha...@gmail.com > >> >>> >>>> >> | Bodenlehenstraße 11 > >> >>> ++43-699-11108907 > >> >>> >>>> >> | A-5500 Bischofshofen > >> >>> >>>> >> > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> -- > >> >>> >>>> | Rupert Westenthaler > rupert.westentha...@gmail.com > >> >>> >>>> | Bodenlehenstraße 11 > >> ++43-699-11108907 > >> 
>>> >>>> | A-5500 Bischofshofen > >> >>> >>>> > >> >>> >>> > >> >>> >>> > >> >>> >> > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> | Rupert Westenthaler rupert.westentha...@gmail.com > >> >>> | Bodenlehenstraße 11 ++43-699-11108907 > >> >>> | A-5500 Bischofshofen > >> >>> > >> >> > >> >> > >> > >> > >> > >> -- > >> | Rupert Westenthaler rupert.westentha...@gmail.com > >> | Bodenlehenstraße 11 ++43-699-11108907 > >> | A-5500 Bischofshofen > >> > > > > -- > | Rupert Westenthaler rupert.westentha...@gmail.com > | Bodenlehenstraße 11 ++43-699-11108907 > | A-5500 Bischofshofen >