2013/6/27 Rupert Westenthaler <rupert.westentha...@gmail.com>

> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> > Sorry, I meant the Stanbol NLP API, not Stanford in my previous e-mail.
> > By the way, does OpenNLP have the ability to build dependency trees?
> >
>
> AFAIK OpenNLP does not provide this feature.
>
Then, since the Stanford NLP lib is also integrated into Stanbol, I'll take
a look at how I can extend its integration to include the dependency tree
feature.

> >
> > 2013/6/23 Cristian Petroaca <cristian.petro...@gmail.com>
> >
> >> Hi Rupert,
> >>
> >> I created the JIRA issue
> >> https://issues.apache.org/jira/browse/STANBOL-1121.
> >> As you suggested, I would start with extending the Stanford NLP
> >> integration with co-reference resolution, but I think also with
> >> dependency trees, because I also need to know the Subject of the
> >> sentence and the object that it affects, right?
> >>
> >> Given that I need to extend the Stanford NLP API in Stanbol for
> >> co-reference and dependency trees, how do I proceed with this? Do I
> >> create 2 new sub-tasks under the already opened JIRA issue? After
> >> that, can I start implementing on my local copy of Stanbol and, when
> >> I'm done, send you guys the patch for review?
> >>
>
> I would create two "New Feature" type issues, one for adding support
> for "dependency trees" and the other for "co-reference" support. You
> should also define "depends on" relations between STANBOL-1121 and
> those two new issues.
>
> Sub-tasks could also work, but as adding those features would also be
> interesting for other things, I would rather define them as separate
> issues.
>

2 New Features connected with the original JIRA issue it is then.

> If you would prefer to work in your own branch, please tell me. This
> could have the advantage that patches would not be affected by changes
> in the trunk.
>

Yes, a separate branch sounds good.

> best
> Rupert
>
> >> Regards,
> >> Cristian
> >>
> >>
> >> 2013/6/18 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >>
> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
> >>> <cristian.petro...@gmail.com> wrote:
> >>> > Hi Rupert,
> >>> >
> >>> > Agreed on the
> >>> > SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
> >>> > data structure.
> >>> >
> >>> > Should I open up a JIRA issue for all of this in order to encapsulate
> >>> > this information and establish the goals and the initial steps
> >>> > towards these goals?
> >>>
> >>> Yes please. A JIRA issue for this work would be great.
> >>>
> >>> > How should I proceed further? Should I create some design documents
> >>> > that need to be reviewed?
> >>>
> >>> Usually it is best to write design-related text directly in JIRA
> >>> by using Markdown [1] syntax. This will allow us later to use this
> >>> text directly for the documentation on the Stanbol webpage.
> >>>
> >>> best
> >>> Rupert
> >>>
> >>>
> >>> [1] http://daringfireball.net/projects/markdown/
> >>>
> >>> >
> >>> > Regards,
> >>> > Cristian
> >>> >
> >>> >
> >>> > 2013/6/17 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >>> >
> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
> >>> >> <cristian.petro...@gmail.com> wrote:
> >>> >> > Hi Rupert,
> >>> >> >
> >>> >> > First of all, thanks for the detailed suggestions.
> >>> >> >
> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westentha...@gmail.com>
> >>> >> >
> >>> >> >> Hi Cristian, all
> >>> >> >>
> >>> >> >> really interesting use case!
> >>> >> >>
> >>> >> >> In this mail I will try to give some suggestions on how this
> >>> >> >> could work out. These suggestions are mainly based on
> >>> >> >> experiences and lessons learned in the LIVE [2] project, where
> >>> >> >> we built an information system for the Olympic Games in Peking.
> >>> >> >> While this project excluded the extraction of Events from
> >>> >> >> unstructured text (because the Olympic Information System was
> >>> >> >> already providing event data as XML messages), the semantic
> >>> >> >> search capabilities of this system were very similar to the
> >>> >> >> ones described by your use case.
> >>> >> >>
> >>> >> >> IMHO you are not only trying to extract relations, but a formal
> >>> >> >> representation of the situation described by the text.
> >>> >> >> So let's assume that the goal is to annotate a Setting (or
> >>> >> >> Situation) described in the text - a fise:SettingAnnotation.
> >>> >> >>
> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to
> >>> >> >> model those. The important relation for modeling this is
> >>> >> >> Participation:
> >>> >> >>
> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
> >>> >> >>
> >>> >> >> where ..
> >>> >> >>
> >>> >> >> * ED are Endurants (continuants): Endurants do have an identity,
> >>> >> >> so we would typically refer to them as Entities referenced by a
> >>> >> >> setting. Note that this includes physical, non-physical as well
> >>> >> >> as social objects.
> >>> >> >> * PD are Perdurants (occurrents): Perdurants are entities that
> >>> >> >> happen in time. This refers to Events, Activities ...
> >>> >> >> * PC is Participation: It is a time-indexed relation where
> >>> >> >> Endurants participate in Perdurants
> >>> >> >>
> >>> >> >> Modeling this in RDF requires defining some intermediate
> >>> >> >> resources, because RDF does not allow for n-ary relations.
> >>> >> >>
> >>> >> >> * fise:SettingAnnotation: It is really handy to define one
> >>> >> >> resource being the context for all described data. I would call
> >>> >> >> this "fise:SettingAnnotation" and define it as a sub-concept of
> >>> >> >> fise:Enhancement. All further enhancements about the extracted
> >>> >> >> Setting would define a "fise:in-setting" relation to it.
> >>> >> >>
> >>> >> >> * fise:ParticipantAnnotation: Is used to annotate that an
> >>> >> >> Endurant is participating in a setting (fise:in-setting
> >>> >> >> fise:SettingAnnotation). The Endurant itself is described by
> >>> >> >> existing fise:TextAnnotation (the mentions) and
> >>> >> >> fise:EntityAnnotation (suggested Entities).
> >>> >> >> Basically the fise:ParticipantAnnotation will allow an
> >>> >> >> EnhancementEngine to state that several mentions (in possibly
> >>> >> >> different sentences) do represent the same Endurant as
> >>> >> >> participating in the Setting. In addition it would be possible
> >>> >> >> to use the dc:type property (similar as for fise:TextAnnotation)
> >>> >> >> to refer to the role(s) of a participant (e.g. the set: Agent
> >>> >> >> (intentionally performs an action), Cause (unintentionally, e.g.
> >>> >> >> a mud slide), Patient (a passive role in an activity) and
> >>> >> >> Instrument (aids a process)), but I am wondering if one could
> >>> >> >> extract that information.
> >>> >> >>
> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a Perdurant in
> >>> >> >> the context of the Setting. Also fise:OccurrentAnnotation can
> >>> >> >> link to fise:TextAnnotation (typically verbs in the text
> >>> >> >> defining the perdurant) as well as fise:EntityAnnotation
> >>> >> >> suggesting well known Events in a knowledge base (e.g. an
> >>> >> >> election in a country, or an uprising ...). In addition
> >>> >> >> fise:OccurrentAnnotation can define dc:has-participant links to
> >>> >> >> fise:ParticipantAnnotation. In this case it is explicitly stated
> >>> >> >> that an Endurant (the fise:ParticipantAnnotation) is involved in
> >>> >> >> this Perdurant (the fise:OccurrentAnnotation). As Occurrences
> >>> >> >> are temporally indexed, this annotation should also support
> >>> >> >> properties for defining the xsd:dateTime for the start/end.
> >>> >> >>
> >>> >> >>
> >>> >> > Indeed, an event-based data structure makes a lot of sense, with
> >>> >> > the remark that you probably won't always be able to extract the
> >>> >> > date for a given setting (situation).
> >>> >> > There are 2 things which are unclear though.
> >>> >> >
> >>> >> > 1. Perdurant: You could have situations in which the object upon
> >>> >> > which the Subject (or Endurant) is acting is not a transitory
> >>> >> > object (such as an event or activity) but rather another
> >>> >> > Endurant. For example, we can have the phrase "USA invades Irak",
> >>> >> > where "USA" is the Endurant (Subject) which performs the action
> >>> >> > of "invading" on another Endurant, namely "Irak".
> >>> >> >
> >>> >>
> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both
> >>> >> are Endurants. The activity "invading" would be the Perdurant. So
> >>> >> ideally you would have a "fise:SettingAnnotation" with:
> >>> >>
> >>> >> * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
> >>> >> linking to a fise:TextAnnotation for "USA" and a
> >>> >> fise:EntityAnnotation linking to dbpedia:United_States
> >>> >> * fise:ParticipantAnnotation for Iraq with the dc:type
> >>> >> caos:Patient, linking to a fise:TextAnnotation for "Irak" and a
> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq
> >>> >> * fise:OccurrentAnnotation for "invades" with the dc:type
> >>> >> caos:Activity, linking to a fise:TextAnnotation for "invades"
> >>> >>
> >>> >> > 2. Where does the verb, which links the Subject and the Object,
> >>> >> > come into this? I imagined that the Endurant would have a
> >>> >> > dc:"property" where the property = verb which links to the Object
> >>> >> > in noun form. For example, take again the sentence "USA invades
> >>> >> > Irak". You would have the "USA" Entity with dc:invader which
> >>> >> > points to the Object "Irak". The Endurant would have as many
> >>> >> > dc:"property" elements as there are verbs which link it to an
> >>> >> > Object.
> >>> >>
> >>> >> As explained above you would have a fise:OccurrentAnnotation that
> >>> >> represents the Perdurant.
> >>> >> The information that the activity mentioned in the text is
> >>> >> "invades" would be provided by linking to a fise:TextAnnotation.
> >>> >> If you can also provide an Ontology for Tasks that defines
> >>> >> "myTasks:invade", the fise:OccurrentAnnotation could also link to
> >>> >> a fise:EntityAnnotation for this concept.
> >>> >>
> >>> >> best
> >>> >> Rupert
> >>> >>
> >>> >> >
> >>> >> >> ### Consuming the data:
> >>> >> >>
> >>> >> >> I think this model should be sufficient for use-cases as
> >>> >> >> described by you.
> >>> >> >>
> >>> >> >> Users would be able to consume data on the setting level. This
> >>> >> >> can be done by simply retrieving all fise:ParticipantAnnotation
> >>> >> >> as well as fise:OccurrentAnnotation linked with a setting. BTW
> >>> >> >> this was the approach used in LIVE [2] for semantic search. It
> >>> >> >> allows queries for Settings that involve specific Entities, e.g.
> >>> >> >> you could filter for Settings that involve a {Person},
> >>> >> >> activities:Arrested and a specific {Uprising}. However, note
> >>> >> >> that with this approach you will also get results for Settings
> >>> >> >> where the {Person} participated and another person was arrested.
> >>> >> >>
> >>> >> >> Another possibility would be to process enhancement results on
> >>> >> >> the fise:OccurrentAnnotation level. This would allow a much
> >>> >> >> higher level of granularity (e.g. it would allow to correctly
> >>> >> >> answer the query used as an example above). But I am wondering
> >>> >> >> if the quality of the Setting extraction will be sufficient for
> >>> >> >> this. I also have doubts whether this can still be realized by
> >>> >> >> using semantic indexing to Apache Solr, or if it would be
> >>> >> >> better/necessary to store results in a TripleStore and use
> >>> >> >> SPARQL for retrieval.
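
The difference between the two ways of consuming the data can be illustrated with a small, self-contained sketch. It is not Stanbol code: the `Occurrent` and `Setting` classes, the two query functions and the `person:A` / `person:B` identifiers are all hypothetical names chosen for this example. It only shows why a setting-level query can return a Setting in which the person took part while somebody else was arrested, and why an occurrent-level query avoids that.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrent:
    """One activity in a setting plus the entities taking part in it."""
    activity: str
    participants: frozenset


@dataclass
class Setting:
    """Groups all occurrents extracted for one described situation."""
    name: str
    occurrents: list = field(default_factory=list)

    def entities(self):
        # Union of all participants of all occurrents in this setting.
        found = set()
        for occurrent in self.occurrents:
            found |= occurrent.participants
        return found

    def activities(self):
        return {occurrent.activity for occurrent in self.occurrents}


def match_setting_level(settings, entity, activity):
    """Entity and activity only need to co-occur somewhere in the setting."""
    return [s.name for s in settings
            if entity in s.entities() and activity in s.activities()]


def match_occurrent_level(settings, entity, activity):
    """The entity must participate in that very activity."""
    return [s.name for s in settings
            if any(o.activity == activity and entity in o.participants
                   for o in s.occurrents)]


# Hypothetical data: in s1 person A is arrested while person B only protests.
settings = [
    Setting("s1", [Occurrent("arrested", frozenset({"person:A"})),
                   Occurrent("protested", frozenset({"person:B"}))]),
    Setting("s2", [Occurrent("arrested", frozenset({"person:B"}))]),
]

print(match_setting_level(settings, "person:B", "arrested"))    # ['s1', 's2']
print(match_occurrent_level(settings, "person:B", "arrested"))  # ['s2']
```

The setting-level query is cheaper to index (one flat bag of entities and activities per Setting), which is presumably why it maps well to a Solr-style index, whereas the occurrent-level query needs the participant-to-activity links to be kept.
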
> >>> >> >>
> >>> >> >> The methodology and query language used by YAGO [3] is also very
> >>> >> >> relevant for this (especially note chapter 7, SPOTL(X)
> >>> >> >> Representation).
> >>> >> >>
> >>> >> >> Another related topic is the enrichment of Entities (especially
> >>> >> >> Events) in knowledge bases based on Settings extracted from
> >>> >> >> Documents. As per definition - in DOLCE - Perdurants are
> >>> >> >> temporally indexed. That means that at the time when added to a
> >>> >> >> knowledge base they might still be in process. So the creation,
> >>> >> >> enriching and refinement of such Entities in the knowledge base
> >>> >> >> seems to be critical for a system like the one described in your
> >>> >> >> use-case.
> >>> >> >>
> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
> >>> >> >> <cristian.petro...@gmail.com> wrote:
> >>> >> >> >
> >>> >> >> > First of all I have to mention that I am new in the field of
> >>> >> >> > semantic technologies, I've started to read about them in the
> >>> >> >> > last 4-5 months. Having said that, I have a high level
> >>> >> >> > overview of what is a good approach to solve this problem.
> >>> >> >> > There are a number of papers on the internet which describe
> >>> >> >> > what steps need to be taken, such as: named entity
> >>> >> >> > recognition, co-reference resolution, POS tagging and others.
> >>> >> >>
> >>> >> >> The Stanbol NLP processing module currently only supports
> >>> >> >> sentence detection, tokenization, POS tagging, chunking, NER and
> >>> >> >> lemma. Support for co-reference resolution and dependency trees
> >>> >> >> is currently missing.
> >>> >> >>
> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the
> >>> >> >> moment it only supports English, but I am already working to
> >>> >> >> include the other supported languages.
> >>> >> >> Other NLP frameworks that are already integrated with Stanbol
> >>> >> >> are Freeling [5] and Talismane [6]. But note that for all those
> >>> >> >> the integration excludes support for co-reference and dependency
> >>> >> >> trees.
> >>> >> >>
> >>> >> >> Anyways, I am confident that one can implement a first prototype
> >>> >> >> by only using Sentences and POS tags and - if available - Chunks
> >>> >> >> (e.g. Noun phrases).
> >>> >> >>
> >>> >> >>
> >>> >> > I assume that in the Stanbol context, a feature like Relation
> >>> >> > extraction would be implemented as an EnhancementEngine?
> >>> >> > What kind of effort would be required for a co-reference
> >>> >> > resolution tool integration into Stanbol?
> >>> >> >
> >>> >>
> >>> >> Yes, in the end it would be an EnhancementEngine. But before we
> >>> >> can build such an engine we would need to
> >>> >>
> >>> >> * extend the Stanbol NLP processing API with Annotations for
> >>> >> co-reference
> >>> >> * add support for JSON Serialisation/Parsing for those annotations
> >>> >> so that the RESTful NLP Analysis Service can provide co-reference
> >>> >> information
> >>> >>
> >>> >> > At this moment I'll be focusing on 2 aspects:
> >>> >> >
> >>> >> > 1. Determine the best data structure to encapsulate the extracted
> >>> >> > information. I'll take a closer look at Dolce.
> >>> >>
> >>> >> Don't make it too complex. Defining a proper structure to represent
> >>> >> Events will only pay off if we can also successfully extract such
> >>> >> information from processed texts.
> >>> >>
> >>> >> I would start with
> >>> >>
> >>> >> * fise:SettingAnnotation
> >>> >>     * {fise:Enhancement} metadata
> >>> >>
> >>> >> * fise:ParticipantAnnotation
> >>> >>     * {fise:Enhancement} metadata
> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there are
> >>> >>       more suggestions)
> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
> >>> >>       fise:Cause
> >>> >>
> >>> >> * fise:OccurrentAnnotation
> >>> >>     * {fise:Enhancement} metadata
> >>> >>     * fise:inSetting {settingAnnotation}
> >>> >>     * fise:hasMention {textAnnotation}
> >>> >>     * dc:type set to fise:Activity
> >>> >>
> >>> >> If it turns out that we can extract more, we can add more structure
> >>> >> to those annotations. We might also think about using an own
> >>> >> namespace for those extensions to the annotation structure.
> >>> >>
> >>> >> > 2. Determine how should all of this be integrated into Stanbol.
> >>> >>
> >>> >> Just create an EventExtractionEngine and configure an enhancement
> >>> >> chain that does NLP processing and EntityLinking.
> >>> >>
> >>> >> You should have a look at
> >>> >>
> >>> >> * SentimentSummarizationEngine [1], as it does a lot of things with
> >>> >> NLP processing results (e.g. connecting adjectives (via verbs) to
> >>> >> nouns/pronouns). So as long as we cannot use explicit dependency
> >>> >> trees, your code will need to do similar things with Nouns,
> >>> >> Pronouns and Verbs.
> >>> >>
> >>> >> * Disambiguation-MLT engine, as it creates a Java representation of
> >>> >> present fise:TextAnnotation and fise:EntityAnnotation [2].
> >>> >> Something similar will also be required by the
> >>> >> EventExtractionEngine for fast access to such annotations while
> >>> >> iterating over the Sentences of the text.
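
To make the proposed starting structure more concrete, here is a minimal in-memory sketch of it, applied to the "USA invades Irak" example from earlier in the thread. This is only an illustration under assumptions: it is written in Python rather than against the Stanbol Java API, the class and field names are hypothetical mirrors of the fise: properties listed above, the `urn:example:` URIs are made up, and the `suggestions` list holds entity URIs directly instead of pointing at separate fise:EntityAnnotation resources.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Role values taken from the dc:type list proposed in the thread.
PARTICIPANT_ROLES = {"fise:Agent", "fise:Patient", "fise:Instrument", "fise:Cause"}


@dataclass
class ParticipantAnnotation:
    uri: str
    in_setting: str                                        # fise:inSetting
    role: str                                              # dc:type
    mentions: List[str] = field(default_factory=list)      # fise:hasMention
    suggestions: List[str] = field(default_factory=list)   # fise:suggestion

    def __post_init__(self):
        # Reject roles outside the small set suggested as a starting point.
        if self.role not in PARTICIPANT_ROLES:
            raise ValueError(f"unknown participant role: {self.role}")


@dataclass
class OccurrentAnnotation:
    uri: str
    in_setting: str                                        # fise:inSetting
    mentions: List[str] = field(default_factory=list)      # fise:hasMention
    dc_type: str = "fise:Activity"                         # dc:type


def index_by_setting(annotations) -> Dict[str, list]:
    """Group annotations by the setting they belong to, for fast lookup."""
    index: Dict[str, list] = {}
    for annotation in annotations:
        index.setdefault(annotation.in_setting, []).append(annotation)
    return index


# Hypothetical URIs for the sentence "USA invades Irak".
setting = "urn:example:setting-1"
annotations = [
    ParticipantAnnotation("urn:example:participant-usa", setting, "fise:Agent",
                          mentions=["urn:example:text-usa"],
                          suggestions=["dbpedia:United_States"]),
    ParticipantAnnotation("urn:example:participant-iraq", setting, "fise:Patient",
                          mentions=["urn:example:text-irak"],
                          suggestions=["dbpedia:Iraq"]),
    OccurrentAnnotation("urn:example:occurrent-invades", setting,
                        mentions=["urn:example:text-invades"]),
]

by_setting = index_by_setting(annotations)
for annotation in by_setting[setting]:
    print(type(annotation).__name__, annotation.uri)
```

An engine along the lines discussed above would presumably build such an index once per document and then consult it while iterating over sentences, similar in spirit to what the Disambiguation-MLT data class does for text and entity annotations.
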
> >>> >>
> >>> >>
> >>> >> best
> >>> >> Rupert
> >>> >>
> >>> >> [1]
> >>> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
> >>> >> [2]
> >>> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
> >>> >>
> >>> >> >
> >>> >> > Thanks
> >>> >> >
> >>> >> >> Hope this helps to bootstrap this discussion
> >>> >> >> best
> >>> >> >> Rupert
> >>> >> >>
> >>> >> >> --
> >>> >> >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> >>> >> >> | Bodenlehenstraße 11             ++43-699-11108907
> >>> >> >> | A-5500 Bischofshofen