Hi Cristian, all

really interesting use case!
In this mail I will try to give some suggestions on how this could work out. These suggestions are mainly based on experiences and lessons learned in the LIVE [2] project, where we built an information system for the Olympic Games in Peking. While that project excluded the extraction of Events from unstructured text (because the Olympic Information System was already providing event data as XML messages), the semantic search capabilities of this system were very similar to the ones described by your use case.

IMHO you are not only trying to extract relations, but a formal representation of the situation described by the text. So let's assume that the goal is to annotate a Setting (or Situation) described in the text - a fise:SettingAnnotation.

The DOLCE foundational ontology [1] gives some advice on how to model those. The important relation for modeling this is Participation:

    PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))

where ..

* ED are Endurants (continuants): Endurants do have an identity, so we would typically refer to them as Entities referenced by a setting. Note that this includes physical, non-physical as well as social objects.
* PD are Perdurants (occurrents): Perdurants are entities that happen in time. This refers to Events, Activities ...
* PC is Participation: a time-indexed relation where Endurants participate in Perdurants.

Modeling this in RDF requires defining some intermediate resources, because RDF does not allow for n-ary relations.

* fise:SettingAnnotation: It is really handy to define one resource as the context for all described data. I would call this "fise:SettingAnnotation" and define it as a sub-concept of fise:Enhancement. All further enhancements about the extracted Setting would define a "fise:in-setting" relation to it.
* fise:ParticipantAnnotation: Used to annotate that an Endurant participates in a Setting (fise:in-setting fise:SettingAnnotation). The Endurant itself is described by existing fise:TextAnnotation (the mentions) and fise:EntityAnnotation (suggested Entities). Basically, the fise:ParticipantAnnotation allows an EnhancementEngine to state that several mentions (possibly in different sentences) represent the same Endurant participating in the Setting. In addition, it would be possible to use the dc:type property (as for fise:TextAnnotation) to refer to the role(s) of a participant (e.g. the set: Agent (intentionally performs an action), Cause (unintentionally, e.g. a mud slide), Patient (a passive role in an activity) and Instrument (aids a process)), but I am wondering if one could extract this information.
* fise:OccurrentAnnotation: Used to annotate a Perdurant in the context of the Setting. A fise:OccurrentAnnotation can also link to fise:TextAnnotation (typically verbs in the text defining the Perdurant) as well as to fise:EntityAnnotation suggesting well-known Events in a knowledge base (e.g. an election in a country, or an uprising ...). In addition, a fise:OccurrentAnnotation can define dc:has-participant links to fise:ParticipantAnnotation. In this case it is explicitly stated that an Endurant (the fise:ParticipantAnnotation) is involved in this Perdurant (the fise:OccurrentAnnotation). As Occurrences are temporally indexed, this annotation should also support properties for defining the xsd:dateTime of the start/end.

### Consuming the data:

I think this model should be sufficient for use cases like the one you describe. Users would be able to consume data on the setting level. This can be done by simply retrieving all fise:ParticipantAnnotation as well as fise:OccurrentAnnotation linked with a setting. BTW this was the approach used in LIVE [2] for semantic search. It allows queries for Settings that involve specific Entities, e.g. you could filter for Settings that involve a {Person}, activities:Arrested and a specific {Uprising}.
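
A minimal sketch of this setting-level filtering, in plain Python over an in-memory set of triples rather than a real Stanbol enhancement graph. fise:in-setting is the property proposed above; ex:refers-to, the urn:... identifiers and the example entities are made up for illustration only.

```python
from collections import defaultdict

IN_SETTING = "fise:in-setting"  # property proposed in this mail
REFERS_TO = "ex:refers-to"      # hypothetical shortcut: annotation -> suggested entity

# Toy enhancement results as (subject, predicate, object) triples.
TRIPLES = {
    # Setting 1: a person, an arrest and an uprising
    ("urn:participant:1", "rdf:type", "fise:ParticipantAnnotation"),
    ("urn:participant:1", IN_SETTING, "urn:setting:1"),
    ("urn:participant:1", REFERS_TO, "ex:PersonA"),
    ("urn:occurrent:1", "rdf:type", "fise:OccurrentAnnotation"),
    ("urn:occurrent:1", IN_SETTING, "urn:setting:1"),
    ("urn:occurrent:1", REFERS_TO, "activities:Arrested"),
    ("urn:occurrent:2", "rdf:type", "fise:OccurrentAnnotation"),
    ("urn:occurrent:2", IN_SETTING, "urn:setting:1"),
    ("urn:occurrent:2", REFERS_TO, "ex:SomeUprising"),
    # Setting 2: the same person, but no arrest and no uprising
    ("urn:participant:2", "rdf:type", "fise:ParticipantAnnotation"),
    ("urn:participant:2", IN_SETTING, "urn:setting:2"),
    ("urn:participant:2", REFERS_TO, "ex:PersonA"),
}


def entities_by_setting(triples):
    """Collect, per setting, every entity referenced by an annotation linked to it."""
    setting_of = {s: o for s, p, o in triples if p == IN_SETTING}
    entities = defaultdict(set)
    for s, p, o in triples:
        if p == REFERS_TO and s in setting_of:
            entities[setting_of[s]].add(o)
    return dict(entities)


def settings_involving(triples, required):
    """Return the settings whose annotations reference all required entities."""
    required = set(required)
    return sorted(
        setting
        for setting, found in entities_by_setting(triples).items()
        if required <= found
    )


query = {"ex:PersonA", "activities:Arrested", "ex:SomeUprising"}
print(settings_involving(TRIPLES, query))  # ['urn:setting:1']
```

The match is on the setting as a whole: the function only checks that all requested entities occur somewhere in the same setting.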
However, note that with this approach you will also get results for Settings where the {Person} participated and another person was arrested.

Another possibility would be to process enhancement results on the fise:OccurrentAnnotation level. This would allow a much higher level of granularity (e.g. it would allow the query used as an example above to be answered correctly). But I am wondering if the quality of the Setting extraction will be sufficient for this. I also have doubts whether this can still be realized by using semantic indexing to Apache Solr, or if it would be better/necessary to store results in a TripleStore and use SPARQL for retrieval.

The methodology and query language used by YAGO [3] is also very relevant for this (especially note chapter 7, SPOTL(X) Representation).

Another related topic is the enrichment of Entities (especially Events) in knowledge bases based on Settings extracted from documents. By definition - in DOLCE - Perdurants are temporally indexed. That means that at the time they are added to a knowledge base they might still be in progress. So the creation, enrichment and refinement of such Entities in the knowledge base seems to be critical for a system like the one described in your use case.

On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
> First of all I have to mention that I am new in the field of semantic
> technologies, I've started to read about them in the last 4-5 months. Having
> said that I have a high level overview of what is a good approach to solve
> this problem. There are a number of papers on the internet which describe
> what steps need to be taken such as : named entity recognition,
> co-reference resolution, pos tagging and others.

The Stanbol NLP processing module currently only supports sentence detection, tokenization, POS tagging, chunking, NER and lemmas. Support for co-reference resolution and dependency trees is currently missing.

Stanford NLP is already integrated with Stanbol [4]. At the moment it only supports English, but I am already working on including the other supported languages. Other NLP frameworks that are already integrated with Stanbol are Freeling [5] and Talismane [6]. But note that for all of those the integration excludes support for co-reference and dependency trees.

Anyway, I am confident that one can implement a first prototype by only using sentences and POS tags and - if available - chunks (e.g. noun phrases).

Hope this helps to bootstrap this discussion

best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen