Hi Rupert,

First of all, thanks for the detailed suggestions.
2013/6/12 Rupert Westenthaler <rupert.westentha...@gmail.com>

> Hi Cristian, all
>
> really interesting use case!
>
> In this mail I will try to give some suggestions on how this could
> work out. These suggestions are mainly based on experiences and lessons
> learned in the LIVE [2] project, where we built an information system
> for the Olympic Games in Peking. While this project excluded the
> extraction of Events from unstructured text (because the Olympic
> Information System was already providing event data as XML messages),
> the semantic search capabilities of this system were very similar to
> the ones described by your use case.
>
> IMHO you are not only trying to extract relations, but a formal
> representation of the situation described by the text. So let's assume
> that the goal is to annotate a Setting (or Situation) described in the
> text - a fise:SettingAnnotation.
>
> The DOLCE foundational ontology [1] gives some advice on how to model
> those. The important relation for modeling this is Participation:
>
>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>
> where ..
>
>  * ED are Endurants (continuants): Endurants do have an identity, so we
>    would typically refer to them as Entities referenced by a setting.
>    Note that this includes physical, non-physical as well as
>    social objects.
>  * PD are Perdurants (occurrents): Perdurants are entities that
>    happen in time. This refers to Events, Activities ...
>  * PC is Participation: it is a time-indexed relation where
>    Endurants participate in Perdurants
>
> Modeling this in RDF requires defining some intermediate resources
> because RDF does not allow for n-ary relations.
>
>  * fise:SettingAnnotation: It is really handy to define one resource
>    being the context for all described data. I would call this
>    "fise:SettingAnnotation" and define it as a sub-concept of
>    fise:Enhancement. All further enhancements about the extracted Setting
>    would define a "fise:in-setting" relation to it.
>
>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant is
>    participating in a setting (fise:in-setting fise:SettingAnnotation).
>    The Endurant itself is described by existing fise:TextAnnotation (the
>    mentions) and fise:EntityAnnotation (suggested Entities). Basically
>    the fise:ParticipantAnnotation will allow an EnhancementEngine to
>    state that several mentions (in possibly different sentences) do
>    represent the same Endurant as participating in the Setting. In
>    addition it would be possible to use the dc:type property (similar as
>    for fise:TextAnnotation) to refer to the role(s) of a participant
>    (e.g. the set: Agent (intentionally performs an action), Cause
>    (unintentionally, e.g. a mud slide), Patient (a passive role in an
>    activity) and Instrument (aids a process)), but I am wondering if one
>    could extract this information.
>
>  * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
>    context of the Setting. Also fise:OccurrentAnnotation can link to
>    fise:TextAnnotation (typically verbs in the text defining the
>    perdurant) as well as fise:EntityAnnotation suggesting well known
>    Events in a knowledge base (e.g. an Election in a country, or an
>    uprising ...). In addition fise:OccurrentAnnotation can define
>    dc:has-participant links to fise:ParticipantAnnotation. In this case
>    it is explicitly stated that an Endurant (the
>    fise:ParticipantAnnotation) is involved in this Perdurant (the
>    fise:OccurrentAnnotation). As Occurrences are temporally indexed, this
>    annotation should also support properties for defining the
>    xsd:dateTime for the start/end.

Indeed, an event-based data structure makes a lot of sense, with the remark that you probably won't always be able to extract the date for a given setting (situation).

There are 2 things which are unclear though.

1. Perdurant: You could have situations in which the object upon which the Subject (or Endurant) is acting is not a transitory object (such as an event or activity) but rather another Endurant. For example, take the phrase "USA invades Iraq", where "USA" is the Endurant (Subject) which performs the action of "invading" on another Endurant, namely "Iraq".

2. Where does the verb which links the Subject and the Object come into this? I imagined that the Endurant would have a dc:"property", where the property is the verb, in noun form, which links to the Object. For example, take again the sentence "USA invades Iraq". You would have the "USA" Entity with dc:invader, which points to the Object "Iraq". The Endurant would have as many dc:"property" elements as there are verbs which link it to an Object.

> ### Consuming the data:
>
> I think this model should be sufficient for use-cases as described by you.
>
> Users would be able to consume data on the setting level. This can be
> done by simply retrieving all fise:ParticipantAnnotation as well as
> fise:OccurrentAnnotation linked with a setting. BTW this was the
> approach used in LIVE [2] for semantic search. It allows queries for
> Settings that involve specific Entities, e.g. you could filter for
> Settings that involve a {Person}, activities:Arrested and a specific
> {Uprising}. However, note that with this approach you will also get
> results for Settings where the {Person} participated and another person
> was arrested.
>
> Another possibility would be to process enhancement results on the
> fise:OccurrentAnnotation. This would allow for a much higher
> granularity level (e.g. it would allow to correctly answer the query
> used as an example above). But I am wondering if the quality of the
> Setting extraction will be sufficient for this.
> I also have doubts whether
> this can still be realized by using semantic indexing to Apache Solr,
> or if it would be better/necessary to store results in a TripleStore
> and use SPARQL for retrieval.
>
> The methodology and query language used by YAGO [3] is also very
> relevant for this (especially note chapter 7 SPOTL(X) Representation).
>
> Another related topic is the enrichment of Entities (especially
> Events) in knowledge bases based on Settings extracted from Documents.
> As per definition - in DOLCE - Perdurants are temporally indexed. That
> means that at the time when added to a knowledge base they might still
> be in process. So the creation, enriching and refinement of such
> Entities in the knowledge base seems to be critical for a System
> like the one described in your use-case.
>
> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
> <cristian.petro...@gmail.com> wrote:
> >
> > First of all I have to mention that I am new in the field of semantic
> > technologies, I've started to read about them in the last 4-5 months.
> > Having said that, I have a high level overview of what is a good
> > approach to solve this problem. There are a number of papers on the
> > internet which describe what steps need to be taken, such as: named
> > entity recognition, co-reference resolution, POS tagging and others.
>
> The Stanbol NLP processing module currently only supports sentence
> detection, tokenization, POS tagging, Chunking, NER and lemma. Support
> for co-reference resolution and dependency trees is currently missing.
>
> Stanford NLP is already integrated with Stanbol [4]. At the moment it
> only supports English, but I am already working to include the other
> supported languages. Other NLP frameworks that are already integrated
> with Stanbol are Freeling [5] and Talismane [6]. But note that for all
> of those the integration excludes support for co-reference and
> dependency trees.
>
> Anyways I am confident that one can implement a first prototype by
> only using Sentences and POS tags and - if available - Chunks (e.g.
> Noun phrases).

I assume that in the Stanbol context, a feature like Relation extraction would be implemented as an EnhancementEngine?
What kind of effort would be required to integrate a co-reference resolution tool into Stanbol?

At this moment I'll be focusing on 2 aspects:

1. Determine the best data structure to encapsulate the extracted information. I'll take a closer look at DOLCE.
2. Determine how all of this should be integrated into Stanbol.

Thanks

> Hope this helps to bootstrap this discussion
> best
> Rupert
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
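
To make the difference between setting-level and occurrent-level consumption concrete, here is a minimal sketch in plain Python. It is not Stanbol code and not an RDF serialization: the three classes are hypothetical stand-ins for the proposed fise:SettingAnnotation, fise:ParticipantAnnotation and fise:OccurrentAnnotation, and the names "Alice" and "Bob" are made up for illustration. The sketch assumes that roles are attached to the participant (as with dc:type) and that an occurrent lists its participants explicitly (as with dc:has-participant).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ParticipantAnnotation:
    """Hypothetical stand-in for fise:ParticipantAnnotation (an Endurant)."""
    entity: str                                   # suggested entity, e.g. "Alice"
    roles: Set[str] = field(default_factory=set)  # e.g. {"Agent"} or {"Patient"}


@dataclass
class OccurrentAnnotation:
    """Hypothetical stand-in for fise:OccurrentAnnotation (a Perdurant)."""
    activity: str                                 # e.g. "activities:Arrested"
    participants: List[ParticipantAnnotation] = field(default_factory=list)


@dataclass
class SettingAnnotation:
    """Hypothetical stand-in for fise:SettingAnnotation (the shared context)."""
    participants: List[ParticipantAnnotation] = field(default_factory=list)
    occurrents: List[OccurrentAnnotation] = field(default_factory=list)


def setting_level_match(setting: SettingAnnotation, entity: str, activity: str) -> bool:
    """Coarse query: entity and activity both appear somewhere in the setting."""
    has_entity = any(p.entity == entity for p in setting.participants)
    has_activity = any(o.activity == activity for o in setting.occurrents)
    return has_entity and has_activity


def occurrent_level_match(setting: SettingAnnotation, entity: str, activity: str) -> bool:
    """Fine query: entity is an explicit participant of the matching occurrent."""
    return any(
        o.activity == activity and any(p.entity == entity for p in o.participants)
        for o in setting.occurrents
    )


# Alice takes part in the setting, but only Bob is linked to the arrest.
alice = ParticipantAnnotation("Alice", {"Agent"})
bob = ParticipantAnnotation("Bob", {"Patient"})
arrest = OccurrentAnnotation("activities:Arrested", [bob])
setting = SettingAnnotation(participants=[alice, bob], occurrents=[arrest])

print(setting_level_match(setting, "Alice", "activities:Arrested"))
print(occurrent_level_match(setting, "Alice", "activities:Arrested"))
print(occurrent_level_match(setting, "Bob", "activities:Arrested"))
```

With this data the setting-level query matches Alice even though she was not the one arrested, which is the false positive mentioned above, while the occurrent-level query matches only Bob. Whether extraction quality is good enough to populate the participant links reliably remains the open question.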