Ok. This means that I'll need to do a little refactoring of the ValueTypeParser.parse() method so that it also gets a reference to the AnalysedText object coming from the AnalyzedTextParser.
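
Something along these lines is what I have in mind for the coreference case. It is only a rough sketch assuming parse() is extended with an AnalysedText parameter; the CorefTag constructor, the package names and the Jackson imports are written from memory and are placeholders until the model is final:

    // Rough sketch only - not the final API. Package names are written from
    // memory and the CorefTag constructor is a placeholder.
    import java.util.LinkedHashSet;
    import java.util.Set;

    import org.apache.stanbol.enhancer.nlp.model.AnalysedText;
    import org.apache.stanbol.enhancer.nlp.model.Span;

    // Jackson types as used by the nlp-json module (adjust the package if it differs)
    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.node.ObjectNode;

    public class CorefTagSupport /* would implement ValueTypeParser<CorefTag> */ {

        // parse() with the additional AnalysedText parameter, so that mention
        // references can be resolved against the parsed AnalysedText instead of
        // creating new, unconnected Span instances
        public CorefTag parse(ObjectNode jCoref, AnalysedText at) {
            boolean representative = jCoref.path("isRepresentative").asBoolean(false);
            Set<Span> mentions = new LinkedHashSet<Span>();
            JsonNode jMentions = jCoref.path("mentions");
            if (jMentions.isArray()) {
                for (int i = 0; i < jMentions.size(); i++) {
                    JsonNode jMention = jMentions.get(i);
                    String type = jMention.path("type").asText();
                    int start = jMention.path("start").asInt(-1);
                    int end = jMention.path("end").asInt(-1);
                    if (start < 0 || end < 0) {
                        continue; // incomplete mention reference, skip it
                    }
                    Span mention;
                    if ("Token".equals(type)) {
                        // returns the existing Token or creates it (forward reference)
                        mention = at.addToken(start, end);
                    } else if ("Chunk".equals(type)) {
                        mention = at.addChunk(start, end);
                    } else {
                        continue; // other span types are not expected as mentions for now
                    }
                    mentions.add(mention);
                }
            }
            // placeholder constructor - the final CorefTag model is still open
            return new CorefTag(representative, mentions);
        }
    }

With the AnalysedText at hand the forward references are no longer a problem, because addToken()/addChunk() either return the already existing Span or create a new one, exactly as you describe below, so the ChunkImpl/TokenImpl constructors can stay non-public.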
2013/9/16 Rupert Westenthaler <rupert.westentha...@gmail.com> > Hi Cristian > > If you have start/end and type of the referenced Span you can use the > according > > AnalysedText#add** > > e.g. > > AnalysedText#addToken(start, end) > AnalysedText#addChunk(start, end) > > method and just use the returned instance. Those methods do all the > magic. Meaning if the referenced Span does not yet exist (forward > reference) it will create a new instance. If the Span already exists > (backward reference) you will get the existing instance including all > the other annotations already parsed from the JSON. In case of a > forward reference the Span created by you (for forward references) > other annotations will be added by the same way. > > This behavior is also the reason why the constructors of the TokenImpl > and ChunkImpl (and all other **Impl) are not public. > > A similar code can be found in the > > AnalyzedTextParser#parseSpan(AnalysedText at, JsonNode node) > > method (o.a.s.enhancer.nlp.json module) > > > So if you have a reference to a Span in your Java API: > > (1) parse the start/end/type of the reference > (2) call add**(start, end) on the AnalysedText > (3) add the returned Span to your set with references > > If you want your references to be sorted you should use NavigableSet > instead of Set. > > best > Rupert > > On Sun, Sep 15, 2013 at 2:32 PM, Cristian Petroaca > <cristian.petro...@gmail.com> wrote: > > I've already started to implement the Coreference bit first in the nlp > and > > nlp-json projects. There's one thing that I don't know how to implement. > > The CorefTag class contains a Set<Span> mentions member (represents the > > "mentions" array defined in an earlier mail) and in the > > CorefTagSupport.parse() method I need to reconstuct the CorefTag object > > from json. I can't figure out how can I construct the aforementioned > member > > which should contain the references to mentions whch are Span objects > found > > in the AnalyzedTextImpl. One problem is I don't have access to the > > AnalyzedTextImpl object and even if I did there could be situations in > > which I am constructing a CorefTag for a Span which contains mentions to > > other Spans which have not been parsed yet and they don't exist in the > > AnalyzedTextImpl. > > > > One solution would be not to link with the actual Span references from > the > > AnalyzedTextImpl but to create new Span Objects (ChunkImpl, TokenImpl). > > That would need the ChunkImpl and TokenImpl constructors to be changed > from > > protected to public. > > > > > > 2013/9/12 Rupert Westenthaler <rupert.westentha...@gmail.com> > > > >> Hi Cristian, > >> > >> In fact I missed it. Sorry for that. > >> > >> I think the revised proposal looks like a good start. Usually one > >> needs make some adaptions when writing the actual code. > >> > >> If you have a first version attach it to an issue and I will commit it > >> to the branch. > >> > >> best > >> Rupert > >> > >> > >> On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca > >> <cristian.petro...@gmail.com> wrote: > >> > Hi Rupert, > >> > > >> > This is a reminder in case you missed this e-mail. > >> > > >> > Cristian > >> > > >> > > >> > 2013/9/3 Cristian Petroaca <cristian.petro...@gmail.com> > >> > > >> >> Ok, then to sum it up we would have : > >> >> > >> >> 1. 
Coref > >> >> > >> >> "stanbol.enhancer.nlp.coref" { > >> >> "isRepresentative" : true/false, // whether this token or chunk > is > >> the > >> >> representative mention in the chain > >> >> "mentions" : [ { "type" : "Token", // type of element which > refers > >> to > >> >> this token/chunk > >> >> "start": 123 , // start index of the mentioning element > >> >> "end": 130 // end index of the mentioning element > >> >> }, ... > >> >> ], > >> >> "class" : ""class" : > >> "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >> } > >> >> > >> >> > >> >> 2. Dependency tree > >> >> > >> >> "stanbol.enhancer.nlp.dependency" : { > >> >> "relations" : [ { "tag" : "nsubj", //type of relation - Stanford NLP > >> >> notation > >> >> "dep" : 12, // type of relation - Stanbol NLP > >> >> mapped value - ordinal number in enum Dependency > >> >> "role" : "gov/dep", // whether this token is the depender or the > >> dependee > >> >> "type" : "Token", // type of element with which this token is in > >> relation > >> >> "start" : 123, // start index of the relating token > >> >> "end" : 130 // end index of the relating token > >> >> }, > >> >> ... > >> >> ] > >> >> "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" > >> >> } > >> >> > >> >> > >> >> 2013/9/2 Rupert Westenthaler <rupert.westentha...@gmail.com> > >> >> > >> >>> Hi Cristian, > >> >>> > >> >>> let me provide some feedback to your proposals: > >> >>> > >> >>> ### Referring other Spans > >> >>> > >> >>> Both suggested annotations require to link other spans (Sentence, > >> >>> Chunk or Token). For that we should introduce a JSON element used > for > >> >>> referring those elements and use it for all usages. > >> >>> > >> >>> In the java model this would allow you to have a reference to the > >> >>> other Span (Sentence, Chunk, Token). In the serialized form you > would > >> >>> have JSON elements with the "type", "start" and "end" attributes as > >> >>> those three uniquely identify any span. > >> >>> > >> >>> Here an example based on the "mention" attribute as defined by the > >> >>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >>> > >> >>> ... > >> >>> "mentions" : [ { > >> >>> "type" : "Token", > >> >>> "start": 123 , > >> >>> "end": 130 } ,{ > >> >>> "type" : "Token", > >> >>> "start": 157 , > >> >>> "end": 165 }], > >> >>> ... > >> >>> > >> >>> Similar token links in > >> >>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should > also > >> >>> use this model. > >> >>> > >> >>> ### Usage of Controlled Vocabularies > >> >>> > >> >>> In addition the DependencyTag also seams to use a controlled > >> >>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol > >> >>> NLP module tries to define those in some kind of Ontology. For POS > >> >>> tags we use OLIA ontology [1]. This is important as most NLP > >> >>> frameworks will use different strings and we need to unify those to > >> >>> commons IDs so that component that consume those data do not depend > on > >> >>> a specific NLP tool. > >> >>> > >> >>> Because the usage of Ontologies within Java is not well supported. > The > >> >>> Stanbol NLP module defines Java Enumerations for those Ontologies > such > >> >>> as the POS type enumeration [2]. > >> >>> > >> >>> Both the Java Model as well as the JSON serialization do support > both > >> >>> (1) the lexical tag as used by the NLP tool and (2) the mapped > >> >>> concept. In the Java API via two different methods and in the JSON > >> >>> serialization via two separate keys. 
> >> >>> > >> >>> To make this more clear here an example for a POS annotation of a > >> proper > >> >>> noun. > >> >>> > >> >>> "stanbol.enhancer.nlp.pos" : { > >> >>> "tag" : "PN", > >> >>> "pos" : 53, > >> >>> "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag", > >> >>> "prob" : 0.95 > >> >>> } > >> >>> > >> >>> where > >> >>> > >> >>> "tag" : "PN" > >> >>> > >> >>> is the lexical form as used by the NLP tool and > >> >>> > >> >>> "pos" : 53 > >> >>> > >> >>> refers to the ordinal number of the entry "ProperNoun" in the POS > >> >>> enumeration > >> >>> > >> >>> IMO the "type" property of DependencyTag should use a similar > design. > >> >>> > >> >>> best > >> >>> Rupert > >> >>> > >> >>> [1] http://olia.nlp2rdf.org/ > >> >>> [2] > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java > >> >>> > >> >>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca > >> >>> <cristian.petro...@gmail.com> wrote: > >> >>> > Sorry, pressed sent too soon :). > >> >>> > > >> >>> > Continued : > >> >>> > > >> >>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, > Tom-3), > >> >>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)] > >> >>> > > >> >>> > Given this, we can have for each "Token" an additional dependency > >> >>> > annotation : > >> >>> > > >> >>> > "stanbol.enhancer.nlp.dependency" : { > >> >>> > "tag" : //is it necessary? > >> >>> > "relations" : [ { "type" : "nsubj", //type of relation > >> >>> > "role" : "gov/dep", //whether it is depender or the dependee > >> >>> > "dependencyValue" : "met", // the word with which the token has > a > >> >>> relation > >> >>> > "dependencyIndexInSentence" : "2" //the index of the dependency > in > >> the > >> >>> > current sentence > >> >>> > } > >> >>> > ... > >> >>> > ] > >> >>> > "class" : > >> >>> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" > >> >>> > } > >> >>> > > >> >>> > 2013/9/1 Cristian Petroaca <cristian.petro...@gmail.com> > >> >>> > > >> >>> >> Related to the Stanford Dependency Tree Feature, this is the way > the > >> >>> >> output from the tool looks like for this sentence : "Mary and Tom > >> met > >> >>> Danny > >> >>> >> today" : > >> >>> >> > >> >>> >> > >> >>> >> 2013/8/30 Cristian Petroaca <cristian.petro...@gmail.com> > >> >>> >> > >> >>> >>> Hi Rupert, > >> >>> >>> > >> >>> >>> Ok, so after looking at the JSON output from the Stanford NLP > >> Server > >> >>> and > >> >>> >>> the coref module I'm thinking I can represent the coreference > >> >>> information > >> >>> >>> this way: > >> >>> >>> Each "Token" or "Chunk" will contain an additional coref > annotation > >> >>> with > >> >>> >>> the following structure : > >> >>> >>> > >> >>> >>> "stanbol.enhancer.nlp.coref" { > >> >>> >>> "tag" : //does this need to exist? > >> >>> >>> "isRepresentative" : true/false, // whether this token or > >> chunk is > >> >>> >>> the representative mention in the chain > >> >>> >>> "mentions" : [ { "sentenceNo" : 1 //the sentence in which > the > >> >>> mention > >> >>> >>> is found > >> >>> >>> "startWord" : 2 //the first word > making > >> up > >> >>> the > >> >>> >>> mention > >> >>> >>> "endWord" : 3 //the last word making > up > >> the > >> >>> >>> mention > >> >>> >>> }, ... > >> >>> >>> ], > >> >>> >>> "class" : ""class" : > >> >>> "org.apache.stanbol.enhancer.nlp.coref.CorefTag" > >> >>> >>> } > >> >>> >>> > >> >>> >>> The CorefTag should resemble this model. > >> >>> >>> > >> >>> >>> What do you think? 
> >> >>> >>> > >> >>> >>> Cristian > >> >>> >>> > >> >>> >>> > >> >>> >>> 2013/8/24 Rupert Westenthaler <rupert.westentha...@gmail.com> > >> >>> >>> > >> >>> >>>> Hi Cristian, > >> >>> >>>> > >> >>> >>>> you can not directly call StanfordNLP components from Stanbol, > but > >> >>> you > >> >>> >>>> have to extend the RESTful service to include the information > you > >> >>> >>>> need. The main reason for that is that the license of > StanfordNLP > >> is > >> >>> >>>> not compatible with the Apache Software License. So Stanbol can > >> not > >> >>> >>>> directly link to the StanfordNLP API. > >> >>> >>>> > >> >>> >>>> You will need to > >> >>> >>>> > >> >>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> > >> class > >> >>> >>>> in the o.a.s.enhancer.nlp module > >> >>> >>>> 2. add JSON parsing and serialization support for this tag to > the > >> >>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an > >> example) > >> >>> >>>> > >> >>> >>>> As (1) would be necessary anyway the only additional thing you > >> need > >> >>> to > >> >>> >>>> develop is (2). After that you can add {yourTag} instance to > the > >> >>> >>>> AnalyzedText in the StanfornNLP integration. The > >> >>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All > >> >>> >>>> engines executed after the RestfulNlpAnalysisEngine will have > >> access > >> >>> >>>> to your annotations. > >> >>> >>>> > >> >>> >>>> If you have a design for {yourTag} - the model you would like > to > >> use > >> >>> >>>> to represent your data - I can help with (1) and (2). > >> >>> >>>> > >> >>> >>>> best > >> >>> >>>> Rupert > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca > >> >>> >>>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> > Hi Rupert, > >> >>> >>>> > > >> >>> >>>> > Thanks for the info. Looking at the standbol-stanfordnlp > >> project I > >> >>> see > >> >>> >>>> that > >> >>> >>>> > the stanford nlp is not implemented as an EnhancementEngine > but > >> >>> rather > >> >>> >>>> it > >> >>> >>>> > is used directly in a Jetty Server instance. How does that > fit > >> >>> into the > >> >>> >>>> > Stanbol stack? For example how can I call the > >> StanfordNlpAnalyzer's > >> >>> >>>> routine > >> >>> >>>> > from my TripleExtractionEnhancementEngine which lives in the > >> >>> Stanbol > >> >>> >>>> stack? > >> >>> >>>> > > >> >>> >>>> > Thanks, > >> >>> >>>> > Cristian > >> >>> >>>> > > >> >>> >>>> > > >> >>> >>>> > 2013/8/12 Rupert Westenthaler <rupert.westentha...@gmail.com > > > >> >>> >>>> > > >> >>> >>>> >> Hi Cristian, > >> >>> >>>> >> > >> >>> >>>> >> Sorry for the late response, but I was offline for the last > two > >> >>> weeks > >> >>> >>>> >> > >> >>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca > >> >>> >>>> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> > Hi Rupert, > >> >>> >>>> >> > > >> >>> >>>> >> > After doing some tests it seems that the Stanford NLP > >> >>> coreference > >> >>> >>>> module > >> >>> >>>> >> is > >> >>> >>>> >> > much more accurate than the Open NLP one.So I decided to > >> extend > >> >>> >>>> Stanford > >> >>> >>>> >> > NLP to add coreference there. > >> >>> >>>> >> > >> >>> >>>> >> The Stanford NLP integration is not part of the Stanbol > >> codebase > >> >>> >>>> >> because the licenses are not compatible. 
> >> >>> >>>> >> > >> >>> >>>> >> You can find the Stanford NLP integration on > >> >>> >>>> >> > >> >>> >>>> >> https://github.com/westei/stanbol-stanfordnlp > >> >>> >>>> >> > >> >>> >>>> >> just create a fork and send pull requests. > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > Could you add the necessary projects on the branch? And > also > >> >>> remove > >> >>> >>>> the > >> >>> >>>> >> > Open NLP ones? > >> >>> >>>> >> > > >> >>> >>>> >> > >> >>> >>>> >> Currently the branch > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >> >>> >>>> >> > >> >>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO > those > >> >>> should > >> >>> >>>> >> be enough for adding coreference support. > >> >>> >>>> >> > >> >>> >>>> >> IMO you will need to > >> >>> >>>> >> > >> >>> >>>> >> * add an model for representing coreference to the nlp > module > >> >>> >>>> >> * add parsing and serializing support to the nlp-json module > >> >>> >>>> >> * add the implementation to your fork of the > >> stanbol-stanfordnlp > >> >>> >>>> project > >> >>> >>>> >> > >> >>> >>>> >> best > >> >>> >>>> >> Rupert > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > Thanks, > >> >>> >>>> >> > Cristian > >> >>> >>>> >> > > >> >>> >>>> >> > > >> >>> >>>> >> > 2013/7/5 Rupert Westenthaler < > rupert.westentha...@gmail.com> > >> >>> >>>> >> > > >> >>> >>>> >> >> Hi Cristian, > >> >>> >>>> >> >> > >> >>> >>>> >> >> I created the branch at > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/ > >> >>> >>>> >> >> > >> >>> >>>> >> >> ATM in contains only the "nlp" and "nlp-json" module. > Let me > >> >>> know > >> >>> >>>> if > >> >>> >>>> >> >> you would like to have more > >> >>> >>>> >> >> > >> >>> >>>> >> >> best > >> >>> >>>> >> >> Rupert > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca > >> >>> >>>> >> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> > Hi Rupert, > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > I created jiras : > >> >>> >>>> https://issues.apache.org/jira/browse/STANBOL-1132and > >> >>> >>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. > The > >> >>> >>>> original one > >> >>> >>>> >> in > >> >>> >>>> >> >> > dependent upon these. > >> >>> >>>> >> >> > Please let me know when I can start using the branch. > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > Thanks, > >> >>> >>>> >> >> > Cristian > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > > >> >>> >>>> >> >> > 2013/6/27 Cristian Petroaca < > cristian.petro...@gmail.com> > >> >>> >>>> >> >> > > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler < > >> >>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca > >> >>> >>>> >> >> >>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford > in my > >> >>> >>>> previous > >> >>> >>>> >> >> e-mail. > >> >>> >>>> >> >> >>> By > >> >>> >>>> >> >> >>> > the way, does Open NLP have the ability to build > >> >>> dependency > >> >>> >>>> trees? > >> >>> >>>> >> >> >>> > > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature. 
> >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated > into > >> >>> >>>> Stanbol, > >> >>> >>>> >> I'll > >> >>> >>>> >> >> >> take a look at how I can extend its integration to > >> include > >> >>> the > >> >>> >>>> >> >> dependency > >> >>> >>>> >> >> >> tree feature. > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > > >> >>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca < > >> cristian.petro...@gmail.com > >> >>> > > >> >>> >>>> >> >> >>> > > >> >>> >>>> >> >> >>> >> Hi Rupert, > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> I created jira > >> >>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121. > >> >>> >>>> >> >> >>> >> As you suggested I would start with extending the > >> >>> Stanford > >> >>> >>>> NLP > >> >>> >>>> >> with > >> >>> >>>> >> >> >>> >> co-reference resolution but I think also with > >> dependency > >> >>> >>>> trees > >> >>> >>>> >> >> because > >> >>> >>>> >> >> >>> I > >> >>> >>>> >> >> >>> >> also need to know the Subject of the sentence and > the > >> >>> object > >> >>> >>>> >> that it > >> >>> >>>> >> >> >>> >> affects, right? > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API > in > >> >>> Stanbol > >> >>> >>>> for > >> >>> >>>> >> >> >>> >> co-reference and dependency trees, how do I > proceed > >> with > >> >>> >>>> this? > >> >>> >>>> >> Do I > >> >>> >>>> >> >> >>> create > >> >>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After > >> that > >> >>> can I > >> >>> >>>> >> start > >> >>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when > I'm > >> >>> done > >> >>> >>>> I'll > >> >>> >>>> >> send > >> >>> >>>> >> >> >>> you > >> >>> >>>> >> >> >>> >> guys the patch fo review? > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> I would create two "New Feature" type Issues one for > >> adding > >> >>> >>>> support > >> >>> >>>> >> >> >>> for "dependency trees" and the other for > "co-reference" > >> >>> >>>> support. You > >> >>> >>>> >> >> >>> should also define "depends on" relations between > >> >>> STANBOL-1121 > >> >>> >>>> and > >> >>> >>>> >> >> >>> those two new issues. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> Sub-task could also work, but as adding those > features > >> >>> would > >> >>> >>>> be also > >> >>> >>>> >> >> >>> interesting for other things I would rather define > them > >> as > >> >>> >>>> separate > >> >>> >>>> >> >> >>> issues. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> 2 New Features connected with the original jira it is > >> then. > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >>> If you would prefer to work in an own branch please > tell > >> >>> me. > >> >>> >>>> This > >> >>> >>>> >> >> >>> could have the advantage that patches would not be > >> >>> affected by > >> >>> >>>> >> changes > >> >>> >>>> >> >> >>> in the trunk. > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> Yes, a separate branch sounds good. 
> >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> best > >> >>> >>>> >> >> >>> Rupert > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> >> Regards, > >> >>> >>>> >> >> >>> >> Cristian > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler < > >> >>> >>>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian > Petroaca > >> >>> >>>> >> >> >>> >>> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> > Hi Rupert, > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Agreed on the > >> >>> >>>> >> >> >>> > >> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation > >> >>> >>>> >> >> >>> >>> > data structure. > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in > order > >> to > >> >>> >>>> >> encapsulate > >> >>> >>>> >> >> this > >> >>> >>>> >> >> >>> >>> > information and establish the goals and these > >> initial > >> >>> >>>> steps > >> >>> >>>> >> >> towards > >> >>> >>>> >> >> >>> >>> these > >> >>> >>>> >> >> >>> >>> > goals? > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be > >> great. > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > How should I proceed further? Should I create > some > >> >>> design > >> >>> >>>> >> >> documents > >> >>> >>>> >> >> >>> that > >> >>> >>>> >> >> >>> >>> > need to be reviewed? > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> Usually it is the best to write design related > text > >> >>> >>>> directly in > >> >>> >>>> >> >> JIRA > >> >>> >>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us > >> later > >> >>> to > >> >>> >>>> use > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> >>> text directly for the documentation on the > Stanbol > >> >>> Webpage. > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> best > >> >>> >>>> >> >> >>> >>> Rupert > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/ > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > Regards, > >> >>> >>>> >> >> >>> >>> > Cristian > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler < > >> >>> >>>> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >>> > > >> >>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian > >> Petroaca > >> >>> >>>> >> >> >>> >>> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> >> > HI Rupert, > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed > >> suggestions. > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler < > >> >>> >>>> >> rupert.westentha...@gmail.com> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> >> Hi Cristian, all > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> really interesting use case! > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some > >> suggestions > >> >>> on > >> >>> >>>> how > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> could > >> >>> >>>> >> >> >>> >>> >> >> work out. 
This suggestions are mainly > based on > >> >>> >>>> experiences > >> >>> >>>> >> >> and > >> >>> >>>> >> >> >>> >>> lessons > >> >>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we > >> built an > >> >>> >>>> >> information > >> >>> >>>> >> >> >>> system > >> >>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this > >> >>> Project > >> >>> >>>> >> excluded > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text > >> >>> (because > >> >>> >>>> the > >> >>> >>>> >> >> Olympic > >> >>> >>>> >> >> >>> >>> >> >> Information System was already providing > event > >> >>> data > >> >>> >>>> as XML > >> >>> >>>> >> >> >>> messages) > >> >>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this > >> system > >> >>> >>>> where very > >> >>> >>>> >> >> >>> similar > >> >>> >>>> >> >> >>> >>> as > >> >>> >>>> >> >> >>> >>> >> >> the one described by your use case. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract > >> >>> relations, > >> >>> >>>> but a > >> >>> >>>> >> >> formal > >> >>> >>>> >> >> >>> >>> >> >> representation of the situation described > by > >> the > >> >>> >>>> text. So > >> >>> >>>> >> >> lets > >> >>> >>>> >> >> >>> >>> assume > >> >>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or > >> >>> Situation) > >> >>> >>>> >> >> described > >> >>> >>>> >> >> >>> in > >> >>> >>>> >> >> >>> >>> the > >> >>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives > some > >> >>> >>>> advices on > >> >>> >>>> >> >> how to > >> >>> >>>> >> >> >>> >>> model > >> >>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling > >> this > >> >>> >>>> >> >> Participation: > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t)) > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> where .. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * ED are Endurants (continuants): > Endurants > >> do > >> >>> have > >> >>> >>>> an > >> >>> >>>> >> >> >>> identity so > >> >>> >>>> >> >> >>> >>> we > >> >>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities > >> >>> referenced > >> >>> >>>> by a > >> >>> >>>> >> >> >>> setting. > >> >>> >>>> >> >> >>> >>> >> >> Note that this includes physical, > >> non-physical as > >> >>> >>>> well as > >> >>> >>>> >> >> >>> >>> >> >> social-objects. > >> >>> >>>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): > Perdurants > >> >>> are > >> >>> >>>> >> entities > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events, > >> >>> Activities ... > >> >>> >>>> >> >> >>> >>> >> >> * PC are Participation: It is an time > indexed > >> >>> >>>> relation > >> >>> >>>> >> where > >> >>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define > some > >> >>> >>>> intermediate > >> >>> >>>> >> >> >>> resources > >> >>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary > >> relations. 
> >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really > handy > >> to > >> >>> >>>> define > >> >>> >>>> >> one > >> >>> >>>> >> >> >>> resource > >> >>> >>>> >> >> >>> >>> >> >> being the context for all described data. I > >> would > >> >>> >>>> call > >> >>> >>>> >> this > >> >>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a > >> >>> >>>> sub-concept to > >> >>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement > >> about > >> >>> the > >> >>> >>>> >> extracted > >> >>> >>>> >> >> >>> >>> Setting > >> >>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation > to > >> it. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to > >> >>> annotate > >> >>> >>>> that > >> >>> >>>> >> >> >>> Endurant is > >> >>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting > >> >>> >>>> >> >> >>> fise:SettingAnnotation). > >> >>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by > existing > >> >>> >>>> >> >> fise:TextAnnotaion > >> >>> >>>> >> >> >>> (the > >> >>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation > (suggested > >> >>> >>>> Entities). > >> >>> >>>> >> >> >>> Basically > >> >>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow > an > >> >>> >>>> >> >> EnhancementEngine > >> >>> >>>> >> >> >>> to > >> >>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible > >> >>> different > >> >>> >>>> >> >> sentences) do > >> >>> >>>> >> >> >>> >>> >> >> represent the same Endurant as > participating > >> in > >> >>> the > >> >>> >>>> >> Setting. > >> >>> >>>> >> >> In > >> >>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the > >> dc:type > >> >>> >>>> property > >> >>> >>>> >> >> >>> (similar > >> >>> >>>> >> >> >>> >>> as > >> >>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the > >> role(s) > >> >>> of > >> >>> >>>> an > >> >>> >>>> >> >> >>> participant > >> >>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally > performs > >> an > >> >>> >>>> action) > >> >>> >>>> >> Cause > >> >>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), > Patient (a > >> >>> >>>> passive > >> >>> >>>> >> role > >> >>> >>>> >> >> in > >> >>> >>>> >> >> >>> an > >> >>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an > process)), > >> but > >> >>> I am > >> >>> >>>> >> >> wondering > >> >>> >>>> >> >> >>> if > >> >>> >>>> >> >> >>> >>> one > >> >>> >>>> >> >> >>> >>> >> >> could extract those information. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to > >> annotate a > >> >>> >>>> >> Perdurant > >> >>> >>>> >> >> in > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> context of the Setting. Also > >> >>> >>>> fise:OccurrentAnnotation can > >> >>> >>>> >> >> link > >> >>> >>>> >> >> >>> to > >> >>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the > >> text > >> >>> >>>> defining > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation > >> >>> >>>> suggesting > >> >>> >>>> >> well > >> >>> >>>> >> >> >>> known > >> >>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election > >> in a > >> >>> >>>> country, > >> >>> >>>> >> or > >> >>> >>>> >> >> an > >> >>> >>>> >> >> >>> >>> >> >> upraising ...). 
In addition > >> >>> fise:OccurrentAnnotation > >> >>> >>>> can > >> >>> >>>> >> >> define > >> >>> >>>> >> >> >>> >>> >> >> dc:has-participant links to > >> >>> >>>> fise:ParticipantAnnotation. In > >> >>> >>>> >> >> this > >> >>> >>>> >> >> >>> case > >> >>> >>>> >> >> >>> >>> >> >> it is explicitly stated hat an Endurant > (the > >> >>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) involved in > this > >> >>> >>>> Perturant > >> >>> >>>> >> (the > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences > are > >> >>> >>>> temporal > >> >>> >>>> >> >> indexed > >> >>> >>>> >> >> >>> this > >> >>> >>>> >> >> >>> >>> >> >> annotation should also support properties > for > >> >>> >>>> defining the > >> >>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure > makes a > >> >>> lot of > >> >>> >>>> sense > >> >>> >>>> >> >> with > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> remark > >> >>> >>>> >> >> >>> >>> >> > that you probably won't be able to always > >> extract > >> >>> the > >> >>> >>>> date > >> >>> >>>> >> >> for a > >> >>> >>>> >> >> >>> >>> given > >> >>> >>>> >> >> >>> >>> >> > setting(situation). > >> >>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though. > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in > >> which > >> >>> the > >> >>> >>>> >> object > >> >>> >>>> >> >> upon > >> >>> >>>> >> >> >>> >>> which > >> >>> >>>> >> >> >>> >>> >> the > >> >>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a > >> >>> transitory > >> >>> >>>> >> object ( > >> >>> >>>> >> >> >>> such > >> >>> >>>> >> >> >>> >>> as an > >> >>> >>>> >> >> >>> >>> >> > event, activity ) but rather another > Endurant. > >> For > >> >>> >>>> example > >> >>> >>>> >> we > >> >>> >>>> >> >> can > >> >>> >>>> >> >> >>> >>> have > >> >>> >>>> >> >> >>> >>> >> the > >> >>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the > >> >>> Endurant > >> >>> >>>> ( > >> >>> >>>> >> >> Subject ) > >> >>> >>>> >> >> >>> >>> which > >> >>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another > >> >>> >>>> Eundurant, > >> >>> >>>> >> namely > >> >>> >>>> >> >> >>> >>> "Irak". > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq > >> the > >> >>> >>>> Patient. > >> >>> >>>> >> Both > >> >>> >>>> >> >> >>> are > >> >>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be > the > >> >>> >>>> Perdurant. 
So > >> >>> >>>> >> >> >>> ideally > >> >>> >>>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" > with: > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with > the > >> >>> dc:type > >> >>> >>>> >> >> caos:Agent, > >> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" > and a > >> >>> >>>> >> >> >>> fise:EntityAnnotation > >> >>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with > the > >> >>> dc:type > >> >>> >>>> >> >> >>> caos:Patient, > >> >>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" > and a > >> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq > >> >>> >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" > with > >> the > >> >>> >>>> dc:type > >> >>> >>>> >> >> >>> >>> >> caos:Activity, linking to a > fise:TextAnnotation > >> for > >> >>> >>>> "invades" > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the > Subject > >> >>> and > >> >>> >>>> the > >> >>> >>>> >> Object > >> >>> >>>> >> >> >>> come > >> >>> >>>> >> >> >>> >>> into > >> >>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would > have a > >> >>> >>>> >> dc:"property" > >> >>> >>>> >> >> >>> where > >> >>> >>>> >> >> >>> >>> the > >> >>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in > >> noun > >> >>> >>>> form. For > >> >>> >>>> >> >> >>> example > >> >>> >>>> >> >> >>> >>> take > >> >>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You > >> would > >> >>> have > >> >>> >>>> the > >> >>> >>>> >> >> "USA" > >> >>> >>>> >> >> >>> >>> Entity > >> >>> >>>> >> >> >>> >>> >> with > >> >>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object > "Irak". > >> The > >> >>> >>>> Endurant > >> >>> >>>> >> >> would > >> >>> >>>> >> >> >>> >>> have as > >> >>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are > verbs > >> >>> which > >> >>> >>>> link > >> >>> >>>> >> it > >> >>> >>>> >> >> to > >> >>> >>>> >> >> >>> an > >> >>> >>>> >> >> >>> >>> >> Object. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> As explained above you would have a > >> >>> >>>> fise:OccurrentAnnotation > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that > >> the > >> >>> >>>> activity > >> >>> >>>> >> >> >>> mention in > >> >>> >>>> >> >> >>> >>> >> the text is "invades" would be by linking to a > >> >>> >>>> >> >> >>> fise:TextAnnotation. If > >> >>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks > that > >> >>> defines > >> >>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation > >> could > >> >>> >>>> also link > >> >>> >>>> >> >> to an > >> >>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> best > >> >>> >>>> >> >> >>> >>> >> Rupert > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > ### Consuming the data: > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for > >> >>> >>>> use-cases as > >> >>> >>>> >> >> >>> described > >> >>> >>>> >> >> >>> >>> by > >> >>> >>>> >> >> >>> >>> >> you. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the > >> >>> setting > >> >>> >>>> level. 
> >> >>> >>>> >> >> This > >> >>> >>>> >> >> >>> can > >> >>> >>>> >> >> >>> >>> be > >> >>> >>>> >> >> >>> >>> >> >> done my simple retrieving all > >> >>> >>>> fise:ParticipantAnnotation > >> >>> >>>> >> as > >> >>> >>>> >> >> >>> well as > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a > >> setting. > >> >>> BTW > >> >>> >>>> this > >> >>> >>>> >> was > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic > >> search. It > >> >>> >>>> allows > >> >>> >>>> >> >> >>> queries for > >> >>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities > e.g. > >> you > >> >>> >>>> could > >> >>> >>>> >> filter > >> >>> >>>> >> >> >>> for > >> >>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person}, > >> >>> >>>> activities:Arrested and > >> >>> >>>> >> a > >> >>> >>>> >> >> >>> specific > >> >>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this > >> approach > >> >>> >>>> you will > >> >>> >>>> >> >> get > >> >>> >>>> >> >> >>> >>> results > >> >>> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated > >> and > >> >>> an > >> >>> >>>> other > >> >>> >>>> >> >> person > >> >>> >>>> >> >> >>> was > >> >>> >>>> >> >> >>> >>> >> >> arrested. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> An other possibility would be to process > >> >>> enhancement > >> >>> >>>> >> results > >> >>> >>>> >> >> on > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow > to > >> a > >> >>> much > >> >>> >>>> >> higher > >> >>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to > >> >>> correctly > >> >>> >>>> answer > >> >>> >>>> >> >> the > >> >>> >>>> >> >> >>> query > >> >>> >>>> >> >> >>> >>> >> >> used as an example above). But I am > wondering > >> if > >> >>> the > >> >>> >>>> >> quality > >> >>> >>>> >> >> of > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for > >> this. I > >> >>> >>>> have > >> >>> >>>> >> also > >> >>> >>>> >> >> >>> doubts > >> >>> >>>> >> >> >>> >>> if > >> >>> >>>> >> >> >>> >>> >> >> this can be still realized by using > semantic > >> >>> >>>> indexing to > >> >>> >>>> >> >> Apache > >> >>> >>>> >> >> >>> Solr > >> >>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store > >> >>> results > >> >>> >>>> in a > >> >>> >>>> >> >> >>> TripleStore > >> >>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The methodology and query language used by > >> YAGO > >> >>> [3] > >> >>> >>>> is > >> >>> >>>> >> also > >> >>> >>>> >> >> very > >> >>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter > 7 > >> >>> SPOTL(X) > >> >>> >>>> >> >> >>> >>> Representation). > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> An other related Topic is the enrichment of > >> >>> Entities > >> >>> >>>> >> >> (especially > >> >>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on > Settings > >> >>> >>>> extracted > >> >>> >>>> >> form > >> >>> >>>> >> >> >>> >>> Documents. > >> >>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants > are > >> >>> >>>> temporal > >> >>> >>>> >> >> indexed. 
> >> >>> >>>> >> >> >>> That > >> >>> >>>> >> >> >>> >>> >> >> means that at the time when added to a > >> knowledge > >> >>> >>>> base they > >> >>> >>>> >> >> might > >> >>> >>>> >> >> >>> >>> still > >> >>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching > and > >> >>> >>>> refinement > >> >>> >>>> >> of > >> >>> >>>> >> >> such > >> >>> >>>> >> >> >>> >>> >> >> Entities in a the knowledge base seams to > be > >> >>> >>>> critical for > >> >>> >>>> >> a > >> >>> >>>> >> >> >>> System > >> >>> >>>> >> >> >>> >>> >> >> like described in your use-case. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian > >> >>> Petroaca > >> >>> >>>> >> >> >>> >>> >> >> <cristian.petro...@gmail.com> wrote: > >> >>> >>>> >> >> >>> >>> >> >> > > >> >>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am > new > >> >>> in the > >> >>> >>>> >> field > >> >>> >>>> >> >> of > >> >>> >>>> >> >> >>> >>> semantic > >> >>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about > >> them > >> >>> in > >> >>> >>>> the > >> >>> >>>> >> last > >> >>> >>>> >> >> 4-5 > >> >>> >>>> >> >> >>> >>> >> >> months.Having > >> >>> >>>> >> >> >>> >>> >> >> > said that I have a high level overview of > >> what > >> >>> is > >> >>> >>>> a good > >> >>> >>>> >> >> >>> approach > >> >>> >>>> >> >> >>> >>> to > >> >>> >>>> >> >> >>> >>> >> >> solve > >> >>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of > papers > >> on > >> >>> the > >> >>> >>>> >> internet > >> >>> >>>> >> >> >>> which > >> >>> >>>> >> >> >>> >>> >> describe > >> >>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : > named > >> >>> entity > >> >>> >>>> >> >> >>> recognition, > >> >>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and > >> >>> others. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently > >> only > >> >>> >>>> supports > >> >>> >>>> >> >> >>> sentence > >> >>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, > >> Chunking, > >> >>> NER > >> >>> >>>> and > >> >>> >>>> >> >> lemma. > >> >>> >>>> >> >> >>> >>> support > >> >>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency > >> trees > >> >>> is > >> >>> >>>> >> currently > >> >>> >>>> >> >> >>> >>> missing. > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with > >> Stanbol > >> >>> [4]. > >> >>> >>>> At > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> moment > >> >>> >>>> >> >> >>> >>> it > >> >>> >>>> >> >> >>> >>> >> >> only supports English, but I do already > work > >> to > >> >>> >>>> include > >> >>> >>>> >> the > >> >>> >>>> >> >> >>> other > >> >>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP framework > that > >> is > >> >>> >>>> already > >> >>> >>>> >> >> >>> integrated > >> >>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane > >> [6]. > >> >>> But > >> >>> >>>> note > >> >>> >>>> >> >> that > >> >>> >>>> >> >> >>> for > >> >>> >>>> >> >> >>> >>> all > >> >>> >>>> >> >> >>> >>> >> >> those the integration excludes support for > >> >>> >>>> co-reference > >> >>> >>>> >> and > >> >>> >>>> >> >> >>> >>> dependency > >> >>> >>>> >> >> >>> >>> >> >> trees. 
> >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can > implement > >> a > >> >>> first > >> >>> >>>> >> >> prototype > >> >>> >>>> >> >> >>> by > >> >>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if > >> >>> available > >> >>> >>>> - > >> >>> >>>> >> Chunks > >> >>> >>>> >> >> >>> (e.g. > >> >>> >>>> >> >> >>> >>> >> >> Noun phrases). > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a > feature > >> >>> like > >> >>> >>>> >> Relation > >> >>> >>>> >> >> >>> >>> extraction > >> >>> >>>> >> >> >>> >>> >> > would be implemented as an > EnhancementEngine? > >> >>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a > >> >>> >>>> co-reference > >> >>> >>>> >> >> >>> resolution > >> >>> >>>> >> >> >>> >>> tool > >> >>> >>>> >> >> >>> >>> >> > integration into Stanbol? > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Yes in the end it would be an > EnhancementEngine. > >> But > >> >>> >>>> before > >> >>> >>>> >> we > >> >>> >>>> >> >> can > >> >>> >>>> >> >> >>> >>> >> build such an engine we would need to > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with > >> >>> >>>> Annotations for > >> >>> >>>> >> >> >>> >>> co-reference > >> >>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing > for > >> >>> those > >> >>> >>>> >> >> annotation > >> >>> >>>> >> >> >>> so > >> >>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can > provide > >> >>> >>>> >> co-reference > >> >>> >>>> >> >> >>> >>> >> information > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 > aspects: > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to > >> >>> encapsulate > >> >>> >>>> the > >> >>> >>>> >> >> extracted > >> >>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at > Dolce. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Don't make to to complex. Defining a proper > >> >>> structure to > >> >>> >>>> >> >> represent > >> >>> >>>> >> >> >>> >>> >> Events will only pay-off if we can also > >> successfully > >> >>> >>>> extract > >> >>> >>>> >> >> such > >> >>> >>>> >> >> >>> >>> >> information form processed texts. 
> >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> I would start with > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:SettingAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:ParticipantAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:suggestion {entityAnnotation} > >> (multiple > >> >>> if > >> >>> >>>> there > >> >>> >>>> >> are > >> >>> >>>> >> >> >>> more > >> >>> >>>> >> >> >>> >>> >> suggestions) > >> >>> >>>> >> >> >>> >>> >> * dc:type one of fise:Agent, fise:Patient, > >> >>> >>>> >> fise:Instrument, > >> >>> >>>> >> >> >>> >>> fise:Cause > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * fise:OccurrentAnnotation > >> >>> >>>> >> >> >>> >>> >> * {fise:Enhancement} metadata > >> >>> >>>> >> >> >>> >>> >> * fise:inSetting {settingAnnotation} > >> >>> >>>> >> >> >>> >>> >> * fise:hasMention {textAnnotation} > >> >>> >>>> >> >> >>> >>> >> * dc:type set to fise:Activity > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we > can > >> add > >> >>> >>>> more > >> >>> >>>> >> >> >>> structure to > >> >>> >>>> >> >> >>> >>> >> those annotations. We might also think about > >> using > >> >>> an > >> >>> >>>> own > >> >>> >>>> >> >> namespace > >> >>> >>>> >> >> >>> >>> >> for those extensions to the annotation > structure. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be > >> integrated > >> >>> into > >> >>> >>>> >> >> Stanbol. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and > >> configure a > >> >>> >>>> >> enhancement > >> >>> >>>> >> >> >>> chain > >> >>> >>>> >> >> >>> >>> >> that does NLP processing and EntityLinking. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> You should have a look at > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does > a > >> lot > >> >>> of > >> >>> >>>> things > >> >>> >>>> >> >> with > >> >>> >>>> >> >> >>> NLP > >> >>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives > >> (via > >> >>> >>>> verbs) to > >> >>> >>>> >> >> >>> >>> >> nouns/pronouns. So as long we can not use > >> explicit > >> >>> >>>> dependency > >> >>> >>>> >> >> trees > >> >>> >>>> >> >> >>> >>> >> you code will need to do similar things with > >> Nouns, > >> >>> >>>> Pronouns > >> >>> >>>> >> and > >> >>> >>>> >> >> >>> >>> >> Verbs. > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> * Disambigutation-MLT engine, as it creates a > >> Java > >> >>> >>>> >> >> representation > >> >>> >>>> >> >> >>> of > >> >>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and > >> >>> fise:EntityAnnotation > >> >>> >>>> [2]. > >> >>> >>>> >> >> >>> Something > >> >>> >>>> >> >> >>> >>> >> similar will also be required by the > >> >>> >>>> EventExtractionEngine > >> >>> >>>> >> for > >> >>> >>>> >> >> fast > >> >>> >>>> >> >> >>> >>> >> access to such annotations while iterating > over > >> the > >> >>> >>>> >> Sentences of > >> >>> >>>> >> >> >>> the > >> >>> >>>> >> >> >>> >>> >> text. 
> >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> best > >> >>> >>>> >> >> >>> >>> >> Rupert > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> [1] > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java > >> >>> >>>> >> >> >>> >>> >> [2] > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> > >> >>> > >> > https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > Thanks > >> >>> >>>> >> >> >>> >>> >> > > >> >>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion > >> >>> >>>> >> >> >>> >>> >> >> best > >> >>> >>>> >> >> >>> >>> >> >> Rupert > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> >> -- > >> >>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler > >> >>> >>>> >> >> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11 > >> >>> >>>> >> >> >>> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> >> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> >> -- > >> >>> >>>> >> >> >>> >>> >> | Rupert Westenthaler > >> >>> >>>> >> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> >> | Bodenlehenstraße 11 > >> >>> >>>> >> >> >>> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> >> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >>> -- > >> >>> >>>> >> >> >>> >>> | Rupert Westenthaler > >> >>> >>>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> >>> | Bodenlehenstraße 11 > >> >>> >>>> >> >> ++43-699-11108907 > >> >>> >>>> >> >> >>> >>> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> >>> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> >> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >>> -- > >> >>> >>>> >> >> >>> | Rupert Westenthaler > >> >>> >>>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> >>> | Bodenlehenstraße 11 > >> >>> >>>> ++43-699-11108907 > >> >>> >>>> >> >> >>> | A-5500 Bischofshofen > >> >>> >>>> >> >> >>> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> > >> >>> >>>> >> >> -- > >> >>> >>>> >> >> | Rupert Westenthaler > >> >>> rupert.westentha...@gmail.com > >> >>> >>>> >> >> | Bodenlehenstraße 11 > >> >>> >>>> ++43-699-11108907 > >> >>> >>>> >> >> | A-5500 Bischofshofen > >> >>> >>>> >> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> > >> >>> >>>> >> -- > >> >>> >>>> >> | Rupert Westenthaler > >> rupert.westentha...@gmail.com > >> >>> >>>> >> | Bodenlehenstraße 11 > >> >>> ++43-699-11108907 > >> >>> >>>> >> | A-5500 Bischofshofen > >> >>> >>>> >> > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> > >> >>> >>>> -- > >> >>> >>>> | Rupert Westenthaler > rupert.westentha...@gmail.com > >> >>> >>>> | Bodenlehenstraße 11 > >> ++43-699-11108907 > >> 
>>> >>>> | A-5500 Bischofshofen > >> >>> >>>> > >> >>> >>> > >> >>> >>> > >> >>> >> > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> | Rupert Westenthaler rupert.westentha...@gmail.com > >> >>> | Bodenlehenstraße 11 ++43-699-11108907 > >> >>> | A-5500 Bischofshofen > >> >>> > >> >> > >> >> > >> > >> > >> > >> -- > >> | Rupert Westenthaler rupert.westentha...@gmail.com > >> | Bodenlehenstraße 11 ++43-699-11108907 > >> | A-5500 Bischofshofen > >> > > > > -- > | Rupert Westenthaler rupert.westentha...@gmail.com > | Bodenlehenstraße 11 ++43-699-11108907 > | A-5500 Bischofshofen >