Ok, then to sum it up we would have:

1. Coref

    "stanbol.enhancer.nlp.coref" : {
        "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
        "mentions" : [ {
            "type" : "Token",   // type of element which refers to this token/chunk
            "start" : 123,      // start index of the mentioning element
            "end" : 130         // end index of the mentioning element
        }, ... ],
        "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
    }

2. Dependency tree

    "stanbol.enhancer.nlp.dependency" : {
        "relations" : [ {
            "tag" : "nsubj",     // type of relation - Stanford NLP notation
            "dep" : 12,          // type of relation - Stanbol NLP mapped value - ordinal number in the Dependency enum
            "role" : "gov/dep",  // whether this token is the depender or the dependee
            "type" : "Token",    // type of element with which this token is in relation
            "start" : 123,       // start index of the related token
            "end" : 130          // end index of the related token
        }, ... ],
        "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
    }
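On the Java side, the data carried by the two tags could look roughly like the sketch below. All class and field names are placeholders made up for illustration only; the real classes would go into the o.a.s.enhancer.nlp module and extend Tag<{yourType}>, as described further down in the thread:

    // Sketch only: placeholder names, not the actual o.a.s.enhancer.nlp classes.
    import java.util.List;

    public class NlpAnnotationSketch {

        /** Reference to another span, serialized as the {"type","start","end"} triple. */
        public static class SpanReference {
            public final String type;  // "Sentence", "Chunk" or "Token"
            public final int start;    // start character offset of the referenced span
            public final int end;      // end character offset of the referenced span
            public SpanReference(String type, int start, int end) {
                this.type = type; this.start = start; this.end = end;
            }
        }

        /** Data of the proposed coreference annotation. */
        public static class CorefData {
            public final boolean representative;       // "isRepresentative"
            public final List<SpanReference> mentions; // the other mentions in the chain
            public CorefData(boolean representative, List<SpanReference> mentions) {
                this.representative = representative; this.mentions = mentions;
            }
        }

        /** One entry of the "relations" array of the proposed dependency annotation. */
        public static class DependencyRelationData {
            public enum Role { GOV, DEP }              // depender vs. dependee
            public final String tag;                   // tool-specific label, e.g. "nsubj"
            public final int dep;                      // ordinal of the mapped Dependency enum entry
            public final Role role;
            public final SpanReference partner;        // the token this token is in relation with
            public DependencyRelationData(String tag, int dep, Role role, SpanReference partner) {
                this.tag = tag; this.dep = dep; this.role = role; this.partner = partner;
            }
        }
    }

The SpanReference triple is the shared "type"/"start"/"end" element used both for the coref mentions and for the dependency partner, since those three values uniquely identify any span.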
"stanbol.enhancer.nlp.coref" { "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain "mentions" : [ { "type" : "Token", // type of element which refers to this token/chunk "start": 123 , // start index of the mentioning element "end": 130 // end index of the mentioning element }, ... ], "class" : ""class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag" } 2. Dependency tree "stanbol.enhancer.nlp.dependency" : { "relations" : [ { "tag" : "nsubj", //type of relation - Stanford NLP notation "dep" : 12, // type of relation - Stanbol NLP mapped value - ordinal number in enum Dependency "role" : "gov/dep", // whether this token is the depender or the dependee "type" : "Token", // type of element with which this token is in relation "start" : 123, // start index of the relating token "end" : 130 // end index of the relating token }, ... ] "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" } 2013/9/2 Rupert Westenthaler <rupert.westentha...@gmail.com> > Hi Cristian, > > let me provide some feedback to your proposals: > > ### Referring other Spans > > Both suggested annotations require to link other spans (Sentence, > Chunk or Token). For that we should introduce a JSON element used for > referring those elements and use it for all usages. > > In the java model this would allow you to have a reference to the > other Span (Sentence, Chunk, Token). In the serialized form you would > have JSON elements with the "type", "start" and "end" attributes as > those three uniquely identify any span. > > Here an example based on the "mention" attribute as defined by the > proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > > ... > "mentions" : [ { > "type" : "Token", > "start": 123 , > "end": 130 } ,{ > "type" : "Token", > "start": 157 , > "end": 165 }], > ... > > Similar token links in > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also > use this model. > > ### Usage of Controlled Vocabularies > > In addition the DependencyTag also seams to use a controlled > vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol > NLP module tries to define those in some kind of Ontology. For POS > tags we use OLIA ontology [1]. This is important as most NLP > frameworks will use different strings and we need to unify those to > commons IDs so that component that consume those data do not depend on > a specific NLP tool. > > Because the usage of Ontologies within Java is not well supported. The > Stanbol NLP module defines Java Enumerations for those Ontologies such > as the POS type enumeration [2]. > > Both the Java Model as well as the JSON serialization do support both > (1) the lexical tag as used by the NLP tool and (2) the mapped > concept. In the Java API via two different methods and in the JSON > serialization via two separate keys. > > To make this more clear here an example for a POS annotation of a proper > noun. > > "stanbol.enhancer.nlp.pos" : { > "tag" : "PN", > "pos" : 53, > "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag", > "prob" : 0.95 > } > > where > > "tag" : "PN" > > is the lexical form as used by the NLP tool and > > "pos" : 53 > > refers to the ordinal number of the entry "ProperNoun" in the POS > enumeration > > IMO the "type" property of DependencyTag should use a similar design. 
> > best > Rupert > > [1] http://olia.nlp2rdf.org/ > [2] > http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java > > On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca > <cristian.petro...@gmail.com> wrote: > > Sorry, pressed sent too soon :). > > > > Continued : > > > > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3), > > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)] > > > > Given this, we can have for each "Token" an additional dependency > > annotation : > > > > "stanbol.enhancer.nlp.dependency" : { > > "tag" : //is it necessary? > > "relations" : [ { "type" : "nsubj", //type of relation > > "role" : "gov/dep", //whether it is depender or the dependee > > "dependencyValue" : "met", // the word with which the token has a > relation > > "dependencyIndexInSentence" : "2" //the index of the dependency in the > > current sentence > > } > > ... > > ] > > "class" : > > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" > > } > > > > 2013/9/1 Cristian Petroaca <cristian.petro...@gmail.com> > > > >> Related to the Stanford Dependency Tree Feature, this is the way the > >> output from the tool looks like for this sentence : "Mary and Tom met > Danny > >> today" : > >> > >> > >> 2013/8/30 Cristian Petroaca <cristian.petro...@gmail.com> > >> > >>> Hi Rupert, > >>> > >>> Ok, so after looking at the JSON output from the Stanford NLP Server > and > >>> the coref module I'm thinking I can represent the coreference > information > >>> this way: > >>> Each "Token" or "Chunk" will contain an additional coref annotation > with > >>> the following structure : > >>> > >>> "stanbol.enhancer.nlp.coref" { > >>> "tag" : //does this need to exist? > >>> "isRepresentative" : true/false, // whether this token or chunk is > >>> the representative mention in the chain > >>> "mentions" : [ { "sentenceNo" : 1 //the sentence in which the > mention > >>> is found > >>> "startWord" : 2 //the first word making up > the > >>> mention > >>> "endWord" : 3 //the last word making up the > >>> mention > >>> }, ... > >>> ], > >>> "class" : ""class" : > "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >>> } > >>> > >>> The CorefTag should resemble this model. > >>> > >>> What do you think? > >>> > >>> Cristian > >>> > >>> > >>> 2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com> > >>> > >>>> Hi Cristian, > >>>> > >>>> you can not directly call StanfordNLP components from Stanbol, but you > >>>> have to extend the RESTful service to include the information you > >>>> need. The main reason for that is that the license of StanfordNLP is > >>>> not compatible with the Apache Software License. So Stanbol can not > >>>> directly link to the StanfordNLP API. > >>>> > >>>> You will need to > >>>> > >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class > >>>> in the o.a.s.enhancer.nlp module > >>>> 2. add JSON parsing and serialization support for this tag to the > >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example) > >>>> > >>>> As (1) would be necessary anyway the only additional thing you need to > >>>> develop is (2). After that you can add {yourTag} instance to the > >>>> AnalyzedText in the StanfornNLP integration. The > >>>> RestfulNlpAnalysisEngine will parse them from the response. All > >>>> engines executed after the RestfulNlpAnalysisEngine will have access > >>>> to your annotations. 
> >>>> > >>>> If you have a design for {yourTag} - the model you would like to use > >>>> to represent your data - I can help with (1) and (2). > >>>> > >>>> best > >>>> Rupert > >>>> > >>>> > >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca > >>>> <cristian.petro...@gmail.com> wrote: > >>>> > Hi Rupert, > >>>> > > >>>> > Thanks for the info. Looking at the standbol-stanfordnlp project I > see > >>>> that > >>>> > the stanford nlp is not implemented as an EnhancementEngine but > rather > >>>> it > >>>> > is used directly in a Jetty Server instance. How does that fit into > the > >>>> > Stanbol stack? For example how can I call the StanfordNlpAnalyzer's > >>>> routine > >>>> > from my TripleExtractionEnhancementEngine which lives in the Stanbol > >>>> stack? > >>>> > > >>>> > Thanks, > >>>> > Cristian > >>>> > > >>>> > > >>>> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com> > >>>> > > >>>> >> Hi Cristian, > >>>> >> > >>>> >> Sorry for the late response, but I was offline for the last two > weeks > >>>> >> > >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca > >>>> >> <cristian.petro...@gmail.com> wrote: > >>>> >> > Hi Rupert, > >>>> >> > > >>>> >> > After doing some tests it seems that the Stanford NLP coreference > >>>> module > >>>> >> is > >>>> >> > much more accurate than the Open NLP one.So I decided to extend > >>>> Stanford > >>>> >> > NLP to add coreference there. > >>>> >> > >>>> >> The Stanford NLP integration is not part of the Stanbol codebase > >>>> >> because the licenses are not compatible. > >>>> >> > >>>> >> You can find the Stanford NLP integration on > >>>> >> > >>>> >> https://github.com/westei/stanbol-stanfordnlp > >>>> >> > >>>> >> just create a fork and send pull requests. > >>>> >> > >>>> >> > >>>> >> > Could you add the necessary projects on the branch? And also > remove > >>>> the > >>>> >> > Open NLP ones? > >>>> >> > > >>>> >> > >>>> >> Currently the branch > >>>> >> > >>>> >> > >>>> >> > >>>> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >>>> >> > >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those > should > >>>> >> be enough for adding coreference support. > >>>> >> > >>>> >> IMO you will need to > >>>> >> > >>>> >> * add an model for representing coreference to the nlp module > >>>> >> * add parsing and serializing support to the nlp-json module > >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp > >>>> project > >>>> >> > >>>> >> best > >>>> >> Rupert > >>>> >> > >>>> >> > >>>> >> > >>>> >> > Thanks, > >>>> >> > Cristian > >>>> >> > > >>>> >> > > >>>> >> > 2013/7/5 Rupert Westenthaler <rupert.westentha...@gmail.com> > >>>> >> > > >>>> >> >> Hi Cristian, > >>>> >> >> > >>>> >> >> I created the branch at > >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> > >>>> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >>>> >> >> > >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. Let me > know > >>>> if > >>>> >> >> you would like to have more > >>>> >> >> > >>>> >> >> best > >>>> >> >> Rupert > >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca > >>>> >> >> <cristian.petro...@gmail.com> wrote: > >>>> >> >> > Hi Rupert, > >>>> >> >> > > >>>> >> >> > I created jiras : > >>>> https://issues.apache.org/jira/browse/STANBOL-1132and > >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The > >>>> original one > >>>> >> in > >>>> >> >> > dependent upon these. 
> >>>> >> >> > Please let me know when I can start using the branch. > >>>> >> >> > > >>>> >> >> > Thanks, > >>>> >> >> > Cristian > >>>> >> >> > > >>>> >> >> > > >>>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petro...@gmail.com> > >>>> >> >> > > >>>> >> >> >> > >>>> >> >> >> > >>>> >> >> >> > >>>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com > > > >>>> >> >> >> > >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca > >>>> >> >> >>> <cristian.petro...@gmail.com> wrote: > >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my > >>>> previous > >>>> >> >> e-mail. > >>>> >> >> >>> By > >>>> >> >> >>> > the way, does Open NLP have the ability to build > dependency > >>>> trees? > >>>> >> >> >>> > > >>>> >> >> >>> > >>>> >> >> >>> AFAIK OpenNLP does not provide this feature. > >>>> >> >> >>> > >>>> >> >> >> > >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into > >>>> Stanbol, > >>>> >> I'll > >>>> >> >> >> take a look at how I can extend its integration to include > the > >>>> >> >> dependency > >>>> >> >> >> tree feature. > >>>> >> >> >> > >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >> > > >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com> > >>>> >> >> >>> > > >>>> >> >> >>> >> Hi Rupert, > >>>> >> >> >>> >> > >>>> >> >> >>> >> I created jira > >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121. > >>>> >> >> >>> >> As you suggested I would start with extending the > Stanford > >>>> NLP > >>>> >> with > >>>> >> >> >>> >> co-reference resolution but I think also with dependency > >>>> trees > >>>> >> >> because > >>>> >> >> >>> I > >>>> >> >> >>> >> also need to know the Subject of the sentence and the > object > >>>> >> that it > >>>> >> >> >>> >> affects, right? > >>>> >> >> >>> >> > >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in > Stanbol > >>>> for > >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with > >>>> this? > >>>> >> Do I > >>>> >> >> >>> create > >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that > can I > >>>> >> start > >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm > done > >>>> I'll > >>>> >> send > >>>> >> >> >>> you > >>>> >> >> >>> >> guys the patch fo review? > >>>> >> >> >>> >> > >>>> >> >> >>> > >>>> >> >> >>> I would create two "New Feature" type Issues one for adding > >>>> support > >>>> >> >> >>> for "dependency trees" and the other for "co-reference" > >>>> support. You > >>>> >> >> >>> should also define "depends on" relations between > STANBOL-1121 > >>>> and > >>>> >> >> >>> those two new issues. > >>>> >> >> >>> > >>>> >> >> >>> Sub-task could also work, but as adding those features would > >>>> be also > >>>> >> >> >>> interesting for other things I would rather define them as > >>>> separate > >>>> >> >> >>> issues. > >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >> 2 New Features connected with the original jira it is then. > >>>> >> >> >> > >>>> >> >> >> > >>>> >> >> >>> If you would prefer to work in an own branch please tell me. > >>>> This > >>>> >> >> >>> could have the advantage that patches would not be affected > by > >>>> >> changes > >>>> >> >> >>> in the trunk. > >>>> >> >> >>> > >>>> >> >> >>> Yes, a separate branch sounds good. 
> >>>> >> >> >> > >>>> >> >> >> best > >>>> >> >> >>> Rupert > >>>> >> >> >>> > >>>> >> >> >>> >> Regards, > >>>> >> >> >>> >> Cristian > >>>> >> >> >>> >> > >>>> >> >> >>> >> > >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler < > >>>> rupert.westentha...@gmail.com> > >>>> >> >> >>> >> > >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca > >>>> >> >> >>> >>> <cristian.petro...@gmail.com> wrote: > >>>> >> >> >>> >>> > Hi Rupert, > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> > Agreed on the > >>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation > >>>> >> >> >>> >>> > data structure. > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to > >>>> >> encapsulate > >>>> >> >> this > >>>> >> >> >>> >>> > information and establish the goals and these initial > >>>> steps > >>>> >> >> towards > >>>> >> >> >>> >>> these > >>>> >> >> >>> >>> > goals? > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great. > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > How should I proceed further? Should I create some > design > >>>> >> >> documents > >>>> >> >> >>> that > >>>> >> >> >>> >>> > need to be reviewed? > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> Usually it is the best to write design related text > >>>> directly in > >>>> >> >> JIRA > >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later > to > >>>> use > >>>> >> this > >>>> >> >> >>> >>> text directly for the documentation on the Stanbol > Webpage. > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> best > >>>> >> >> >>> >>> Rupert > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/ > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> > Regards, > >>>> >> >> >>> >>> > Cristian > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler < > >>>> rupert.westentha...@gmail.com> > >>>> >> >> >>> >>> > > >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca > >>>> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote: > >>>> >> >> >>> >>> >> > HI Rupert, > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions. > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler < > >>>> >> rupert.westentha...@gmail.com> > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> >> Hi Cristian, all > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> really interesting use case! > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions > on > >>>> how > >>>> >> this > >>>> >> >> >>> could > >>>> >> >> >>> >>> >> >> work out. This suggestions are mainly based on > >>>> experiences > >>>> >> >> and > >>>> >> >> >>> >>> lessons > >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an > >>>> >> information > >>>> >> >> >>> system > >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this > Project > >>>> >> excluded > >>>> >> >> the > >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text > (because > >>>> the > >>>> >> >> Olympic > >>>> >> >> >>> >>> >> >> Information System was already providing event > data > >>>> as XML > >>>> >> >> >>> messages) > >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system > >>>> where very > >>>> >> >> >>> similar > >>>> >> >> >>> >>> as > >>>> >> >> >>> >>> >> >> the one described by your use case. 
> >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, > >>>> but a > >>>> >> >> formal > >>>> >> >> >>> >>> >> >> representation of the situation described by the > >>>> text. So > >>>> >> >> lets > >>>> >> >> >>> >>> assume > >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or > Situation) > >>>> >> >> described > >>>> >> >> >>> in > >>>> >> >> >>> >>> the > >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some > >>>> advices on > >>>> >> >> how to > >>>> >> >> >>> >>> model > >>>> >> >> >>> >>> >> >> those. The important relation for modeling this > >>>> >> >> Participation: > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t)) > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> where .. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> * ED are Endurants (continuants): Endurants do > have > >>>> an > >>>> >> >> >>> identity so > >>>> >> >> >>> >>> we > >>>> >> >> >>> >>> >> >> would typically refer to them as Entities > referenced > >>>> by a > >>>> >> >> >>> setting. > >>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as > >>>> well as > >>>> >> >> >>> >>> >> >> social-objects. > >>>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): Perdurants are > >>>> >> entities > >>>> >> >> that > >>>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities > ... > >>>> >> >> >>> >>> >> >> * PC are Participation: It is an time indexed > >>>> relation > >>>> >> where > >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some > >>>> intermediate > >>>> >> >> >>> resources > >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really handy to > >>>> define > >>>> >> one > >>>> >> >> >>> resource > >>>> >> >> >>> >>> >> >> being the context for all described data. I would > >>>> call > >>>> >> this > >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a > >>>> sub-concept to > >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about > the > >>>> >> extracted > >>>> >> >> >>> >>> Setting > >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to annotate > >>>> that > >>>> >> >> >>> Endurant is > >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting > >>>> >> >> >>> fise:SettingAnnotation). > >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing > >>>> >> >> fise:TextAnnotaion > >>>> >> >> >>> (the > >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested > >>>> Entities). > >>>> >> >> >>> Basically > >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an > >>>> >> >> EnhancementEngine > >>>> >> >> >>> to > >>>> >> >> >>> >>> >> >> state that several mentions (in possible different > >>>> >> >> sentences) do > >>>> >> >> >>> >>> >> >> represent the same Endurant as participating in > the > >>>> >> Setting. 
> >>>> >> >> In > >>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type > >>>> property > >>>> >> >> >>> (similar > >>>> >> >> >>> >>> as > >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) > of > >>>> an > >>>> >> >> >>> participant > >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an > >>>> action) > >>>> >> Cause > >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a > >>>> passive > >>>> >> role > >>>> >> >> in > >>>> >> >> >>> an > >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but > I am > >>>> >> >> wondering > >>>> >> >> >>> if > >>>> >> >> >>> >>> one > >>>> >> >> >>> >>> >> >> could extract those information. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a > >>>> >> Perdurant > >>>> >> >> in > >>>> >> >> >>> the > >>>> >> >> >>> >>> >> >> context of the Setting. Also > >>>> fise:OccurrentAnnotation can > >>>> >> >> link > >>>> >> >> >>> to > >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text > >>>> defining > >>>> >> the > >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation > >>>> suggesting > >>>> >> well > >>>> >> >> >>> known > >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a > >>>> country, > >>>> >> or > >>>> >> >> an > >>>> >> >> >>> >>> >> >> upraising ...). In addition > fise:OccurrentAnnotation > >>>> can > >>>> >> >> define > >>>> >> >> >>> >>> >> >> dc:has-participant links to > >>>> fise:ParticipantAnnotation. In > >>>> >> >> this > >>>> >> >> >>> case > >>>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant (the > >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in this > >>>> Perturant > >>>> >> (the > >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are > >>>> temporal > >>>> >> >> indexed > >>>> >> >> >>> this > >>>> >> >> >>> >>> >> >> annotation should also support properties for > >>>> defining the > >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a lot > of > >>>> sense > >>>> >> >> with > >>>> >> >> >>> the > >>>> >> >> >>> >>> >> remark > >>>> >> >> >>> >>> >> > that you probably won't be able to always extract > the > >>>> date > >>>> >> >> for a > >>>> >> >> >>> >>> given > >>>> >> >> >>> >>> >> > setting(situation). > >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though. > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which > the > >>>> >> object > >>>> >> >> upon > >>>> >> >> >>> >>> which > >>>> >> >> >>> >>> >> the > >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a > transitory > >>>> >> object ( > >>>> >> >> >>> such > >>>> >> >> >>> >>> as an > >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For > >>>> example > >>>> >> we > >>>> >> >> can > >>>> >> >> >>> >>> have > >>>> >> >> >>> >>> >> the > >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the > Endurant > >>>> ( > >>>> >> >> Subject ) > >>>> >> >> >>> >>> which > >>>> >> >> >>> >>> >> > performs the action of "invading" on another > >>>> Eundurant, > >>>> >> namely > >>>> >> >> >>> >>> "Irak". > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the > >>>> Patient. 
> >>>> >> Both > >>>> >> >> >>> are > >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the > >>>> Perdurant. So > >>>> >> >> >>> ideally > >>>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" with: > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with the > dc:type > >>>> >> >> caos:Agent, > >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a > >>>> >> >> >>> fise:EntityAnnotation > >>>> >> >> >>> >>> >> linking to dbpedia:United_States > >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with the > dc:type > >>>> >> >> >>> caos:Patient, > >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a > >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq > >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" with the > >>>> dc:type > >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for > >>>> "invades" > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and > >>>> the > >>>> >> Object > >>>> >> >> >>> come > >>>> >> >> >>> >>> into > >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a > >>>> >> dc:"property" > >>>> >> >> >>> where > >>>> >> >> >>> >>> the > >>>> >> >> >>> >>> >> > property = verb which links to the Object in noun > >>>> form. For > >>>> >> >> >>> example > >>>> >> >> >>> >>> take > >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would > have > >>>> the > >>>> >> >> "USA" > >>>> >> >> >>> >>> Entity > >>>> >> >> >>> >>> >> with > >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The > >>>> Endurant > >>>> >> >> would > >>>> >> >> >>> >>> have as > >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs > which > >>>> link > >>>> >> it > >>>> >> >> to > >>>> >> >> >>> an > >>>> >> >> >>> >>> >> Object. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> As explained above you would have a > >>>> fise:OccurrentAnnotation > >>>> >> >> that > >>>> >> >> >>> >>> >> represents the Perdurant. The information that the > >>>> activity > >>>> >> >> >>> mention in > >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a > >>>> >> >> >>> fise:TextAnnotation. If > >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that > defines > >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could > >>>> also link > >>>> >> >> to an > >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> best > >>>> >> >> >>> >>> >> Rupert > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > ### Consuming the data: > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> I think this model should be sufficient for > >>>> use-cases as > >>>> >> >> >>> described > >>>> >> >> >>> >>> by > >>>> >> >> >>> >>> >> you. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> Users would be able to consume data on the setting > >>>> level. > >>>> >> >> This > >>>> >> >> >>> can > >>>> >> >> >>> >>> be > >>>> >> >> >>> >>> >> >> done my simple retrieving all > >>>> fise:ParticipantAnnotation > >>>> >> as > >>>> >> >> >>> well as > >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. > BTW > >>>> this > >>>> >> was > >>>> >> >> the > >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It > >>>> allows > >>>> >> >> >>> queries for > >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. 
you > >>>> could > >>>> >> filter > >>>> >> >> >>> for > >>>> >> >> >>> >>> >> >> Settings that involve a {Person}, > >>>> activities:Arrested and > >>>> >> a > >>>> >> >> >>> specific > >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach > >>>> you will > >>>> >> >> get > >>>> >> >> >>> >>> results > >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated and an > >>>> other > >>>> >> >> person > >>>> >> >> >>> was > >>>> >> >> >>> >>> >> >> arrested. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> An other possibility would be to process > enhancement > >>>> >> results > >>>> >> >> on > >>>> >> >> >>> the > >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow to a > much > >>>> >> higher > >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to > correctly > >>>> answer > >>>> >> >> the > >>>> >> >> >>> query > >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if > the > >>>> >> quality > >>>> >> >> of > >>>> >> >> >>> the > >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I > >>>> have > >>>> >> also > >>>> >> >> >>> doubts > >>>> >> >> >>> >>> if > >>>> >> >> >>> >>> >> >> this can be still realized by using semantic > >>>> indexing to > >>>> >> >> Apache > >>>> >> >> >>> Solr > >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store > results > >>>> in a > >>>> >> >> >>> TripleStore > >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO > [3] > >>>> is > >>>> >> also > >>>> >> >> very > >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7 > SPOTL(X) > >>>> >> >> >>> >>> Representation). > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> An other related Topic is the enrichment of > Entities > >>>> >> >> (especially > >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings > >>>> extracted > >>>> >> form > >>>> >> >> >>> >>> Documents. > >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are > >>>> temporal > >>>> >> >> indexed. > >>>> >> >> >>> That > >>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge > >>>> base they > >>>> >> >> might > >>>> >> >> >>> >>> still > >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and > >>>> refinement > >>>> >> of > >>>> >> >> such > >>>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to be > >>>> critical for > >>>> >> a > >>>> >> >> >>> System > >>>> >> >> >>> >>> >> >> like described in your use-case. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca > >>>> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote: > >>>> >> >> >>> >>> >> >> > > >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new in > the > >>>> >> field > >>>> >> >> of > >>>> >> >> >>> >>> semantic > >>>> >> >> >>> >>> >> >> > technologies, I've started to read about them in > >>>> the > >>>> >> last > >>>> >> >> 4-5 > >>>> >> >> >>> >>> >> >> months.Having > >>>> >> >> >>> >>> >> >> > said that I have a high level overview of what > is > >>>> a good > >>>> >> >> >>> approach > >>>> >> >> >>> >>> to > >>>> >> >> >>> >>> >> >> solve > >>>> >> >> >>> >>> >> >> > this problem. 
There are a number of papers on > the > >>>> >> internet > >>>> >> >> >>> which > >>>> >> >> >>> >>> >> describe > >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named > entity > >>>> >> >> >>> recognition, > >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and others. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only > >>>> supports > >>>> >> >> >>> sentence > >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, > NER > >>>> and > >>>> >> >> lemma. > >>>> >> >> >>> >>> support > >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees > is > >>>> >> currently > >>>> >> >> >>> >>> missing. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol > [4]. > >>>> At > >>>> >> the > >>>> >> >> >>> moment > >>>> >> >> >>> >>> it > >>>> >> >> >>> >>> >> >> only supports English, but I do already work to > >>>> include > >>>> >> the > >>>> >> >> >>> other > >>>> >> >> >>> >>> >> >> supported languages. Other NLP framework that is > >>>> already > >>>> >> >> >>> integrated > >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. > But > >>>> note > >>>> >> >> that > >>>> >> >> >>> for > >>>> >> >> >>> >>> all > >>>> >> >> >>> >>> >> >> those the integration excludes support for > >>>> co-reference > >>>> >> and > >>>> >> >> >>> >>> dependency > >>>> >> >> >>> >>> >> >> trees. > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a > first > >>>> >> >> prototype > >>>> >> >> >>> by > >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if > available > >>>> - > >>>> >> Chunks > >>>> >> >> >>> (e.g. > >>>> >> >> >>> >>> >> >> Noun phrases). > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature > like > >>>> >> Relation > >>>> >> >> >>> >>> extraction > >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine? > >>>> >> >> >>> >>> >> > What kind of effort would be required for a > >>>> co-reference > >>>> >> >> >>> resolution > >>>> >> >> >>> >>> tool > >>>> >> >> >>> >>> >> > integration into Stanbol? > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But > >>>> before > >>>> >> we > >>>> >> >> can > >>>> >> >> >>> >>> >> build such an engine we would need to > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with > >>>> Annotations for > >>>> >> >> >>> >>> co-reference > >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for > those > >>>> >> >> annotation > >>>> >> >> >>> so > >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide > >>>> >> co-reference > >>>> >> >> >>> >>> >> information > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects: > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate > >>>> the > >>>> >> >> extracted > >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> Don't make to to complex. Defining a proper > structure to > >>>> >> >> represent > >>>> >> >> >>> >>> >> Events will only pay-off if we can also successfully > >>>> extract > >>>> >> >> such > >>>> >> >> >>> >>> >> information form processed texts. 
> >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> I would start with > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * fise:SettingAnnotation > >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation > >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >>>> >> >> >>> >>> >> * fise:suggestion {entityAnnotation} (multiple if > >>>> there > >>>> >> are > >>>> >> >> >>> more > >>>> >> >> >>> >>> >> suggestions) > >>>> >> >> >>> >>> >> * dc:type one of fise:Agent, fise:Patient, > >>>> >> fise:Instrument, > >>>> >> >> >>> >>> fise:Cause > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation > >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >>>> >> >> >>> >>> >> * dc:type set to fise:Activity > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add > >>>> more > >>>> >> >> >>> structure to > >>>> >> >> >>> >>> >> those annotations. We might also think about using an > >>>> own > >>>> >> >> namespace > >>>> >> >> >>> >>> >> for those extensions to the annotation structure. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated > into > >>>> >> >> Stanbol. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure a > >>>> >> enhancement > >>>> >> >> >>> chain > >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> You should have a look at > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot > of > >>>> things > >>>> >> >> with > >>>> >> >> >>> NLP > >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via > >>>> verbs) to > >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use explicit > >>>> dependency > >>>> >> >> trees > >>>> >> >> >>> >>> >> you code will need to do similar things with Nouns, > >>>> Pronouns > >>>> >> and > >>>> >> >> >>> >>> >> Verbs. > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a Java > >>>> >> >> representation > >>>> >> >> >>> of > >>>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation > >>>> [2]. > >>>> >> >> >>> Something > >>>> >> >> >>> >>> >> similar will also be required by the > >>>> EventExtractionEngine > >>>> >> for > >>>> >> >> fast > >>>> >> >> >>> >>> >> access to such annotations while iterating over the > >>>> >> Sentences of > >>>> >> >> >>> the > >>>> >> >> >>> >>> >> text. 
> >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> best > >>>> >> >> >>> >>> >> Rupert > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> [1] > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> > >>>> >> >> >>> > >>>> >> >> > >>>> >> > >>>> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java > >>>> >> >> >>> >>> >> [2] > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> > >>>> >> >> >>> > >>>> >> >> > >>>> >> > >>>> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > Thanks > >>>> >> >> >>> >>> >> > > >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion > >>>> >> >> >>> >>> >> >> best > >>>> >> >> >>> >>> >> >> Rupert > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> >> -- > >>>> >> >> >>> >>> >> >> | Rupert Westenthaler > >>>> >> >> rupert.westentha...@gmail.com > >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11 > >>>> >> >> >>> ++43-699-11108907 > >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen > >>>> >> >> >>> >>> >> >> > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> >> -- > >>>> >> >> >>> >>> >> | Rupert Westenthaler > >>>> >> rupert.westentha...@gmail.com > >>>> >> >> >>> >>> >> | Bodenlehenstraße 11 > >>>> >> >> >>> ++43-699-11108907 > >>>> >> >> >>> >>> >> | A-5500 Bischofshofen > >>>> >> >> >>> >>> >> > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> > >>>> >> >> >>> >>> -- > >>>> >> >> >>> >>> | Rupert Westenthaler > >>>> rupert.westentha...@gmail.com > >>>> >> >> >>> >>> | Bodenlehenstraße 11 > >>>> >> >> ++43-699-11108907 > >>>> >> >> >>> >>> | A-5500 Bischofshofen > >>>> >> >> >>> >>> > >>>> >> >> >>> >> > >>>> >> >> >>> >> > >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >>> > >>>> >> >> >>> -- > >>>> >> >> >>> | Rupert Westenthaler > >>>> rupert.westentha...@gmail.com > >>>> >> >> >>> | Bodenlehenstraße 11 > >>>> ++43-699-11108907 > >>>> >> >> >>> | A-5500 Bischofshofen > >>>> >> >> >>> > >>>> >> >> >> > >>>> >> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> > >>>> >> >> -- > >>>> >> >> | Rupert Westenthaler rupert.westentha...@gmail.com > >>>> >> >> | Bodenlehenstraße 11 > >>>> ++43-699-11108907 > >>>> >> >> | A-5500 Bischofshofen > >>>> >> >> > >>>> >> > >>>> >> > >>>> >> > >>>> >> -- > >>>> >> | Rupert Westenthaler rupert.westentha...@gmail.com > >>>> >> | Bodenlehenstraße 11 ++43-699-11108907 > >>>> >> | A-5500 Bischofshofen > >>>> >> > >>>> > >>>> > >>>> > >>>> -- > >>>> | Rupert Westenthaler rupert.westentha...@gmail.com > >>>> | Bodenlehenstraße 11 ++43-699-11108907 > >>>> | A-5500 Bischofshofen > >>>> > >>> > >>> > >> > > > > -- > | Rupert Westenthaler rupert.westentha...@gmail.com > | Bodenlehenstraße 11 ++43-699-11108907 > | A-5500 Bischofshofen >