Hi Cristian, all

really interesting use case!

In this mail I will try to give some suggestions on how this could
work out. These suggestions are mainly based on experiences and lessons
learned in the LIVE [2] project, where we built an information system
for the Olympic Games in Peking. While this project excluded the
extraction of Events from unstructured text (because the Olympic
Information System was already providing event data as XML messages),
the semantic search capabilities of this system were very similar to
the ones described by your use case.

IMHO you are not only trying to extract relations, but a formal
representation of the situation described by the text. So let's assume
that the goal is to annotate a Setting (or Situation) described in the
text - a fise:SettingAnnotation.

The DOLCE foundational ontology [1] gives some advice on how to model
those. The important relation for modeling this is Participation:

    PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))

where ..

 * ED are Endurants (continuants): Endurants have an identity, so we
would typically refer to them as Entities referenced by a setting.
Note that this includes physical, non-physical as well as
social objects.
 * PD are Perdurants (occurrents): Perdurants are entities that
happen in time. This refers to Events, Activities ...
 * PC is Participation: a time-indexed relation in which
Endurants participate in Perdurants
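
To make the participation relation a bit more concrete, here is a
minimal Python sketch. It is only an illustration of PC(x, y, t), not
code from DOLCE or Stanbol; the class names and the example values are
my own assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Endurant:
    """ED: an entity with identity (physical, non-physical or social object)."""
    name: str


@dataclass(frozen=True)
class Perdurant:
    """PD: something that happens in time (event, activity, ...)."""
    name: str


@dataclass(frozen=True)
class Participation:
    """PC(x, y, t): endurant x participates in perdurant y at time t."""
    endurant: Endurant
    perdurant: Perdurant
    time: datetime

    def __post_init__(self):
        # Enforce the right-hand side of the axiom: ED(x), PD(y), T(t).
        if not isinstance(self.endurant, Endurant):
            raise TypeError("x must be an Endurant")
        if not isinstance(self.perdurant, Perdurant):
            raise TypeError("y must be a Perdurant")
        if not isinstance(self.time, datetime):
            raise TypeError("t must be a time value")


# Hypothetical example values, for illustration only.
person = Endurant("some person")
arrest = Perdurant("arrest")
pc = Participation(person, arrest, datetime(2013, 6, 11))
```

Swapping the first two arguments raises a TypeError, which is the
programmatic counterpart of the typing constraint in the formula.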

Modeling this in RDF requires defining some intermediate resources
because RDF does not allow for n-ary relations.

 * fise:SettingAnnotation: It is really handy to define one resource
that is the context for all described data. I would call this
"fise:SettingAnnotation" and define it as a sub-concept of
fise:Enhancement. All further enhancements about the extracted Setting
would define a "fise:in-setting" relation to it.

 * fise:ParticipantAnnotation: is used to annotate that an Endurant is
participating in a setting (fise:in-setting fise:SettingAnnotation).
The Endurant itself is described by existing fise:TextAnnotation (the
mentions) and fise:EntityAnnotation (suggested Entities). Basically
the fise:ParticipantAnnotation will allow an EnhancementEngine to
state that several mentions (possibly in different sentences)
represent the same Endurant as participating in the Setting. In
addition it would be possible to use the dc:type property (similar as
for fise:TextAnnotation) to refer to the role(s) of a participant
(e.g. the set: Agent (intentionally performs an action), Cause
(acts unintentionally, e.g. a mud slide), Patient (a passive role in
an activity) and Instrument (aids a process)), but I am wondering if
one could extract that information.

 * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
context of the Setting. A fise:OccurrentAnnotation can also link to
fise:TextAnnotation (typically verbs in the text defining the
Perdurant) as well as fise:EntityAnnotation suggesting well-known
Events in a knowledge base (e.g. an election in a country, or an
uprising ...). In addition fise:OccurrentAnnotation can define
dc:has-participant links to fise:ParticipantAnnotation. In this case
it is explicitly stated that an Endurant (the
fise:ParticipantAnnotation) is involved in this Perdurant (the
fise:OccurrentAnnotation). As Occurrences are temporally indexed, this
annotation should also support properties for defining the
xsd:dateTime for the start/end.
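
As a rough illustration of how the three annotation types above could
hang together, here is a small sketch that writes them down as plain
(subject, predicate, object) tuples in Python. The fise:/dc: names are
the ones proposed in this mail; everything starting with "urn:ex:" is a
hypothetical identifier, using dc:relation to point at the text
annotations is an assumption, and "urn:ex:start" is only a placeholder
for a start-time property that still has to be named.

```python
# Triples as plain (subject, predicate, object) tuples.
# All "urn:ex:" identifiers are hypothetical and used for illustration only.
RDF_TYPE = "rdf:type"

triples = [
    # The setting: the shared context for everything else.
    ("urn:ex:setting-1", RDF_TYPE, "fise:SettingAnnotation"),

    # One participant, backed by two mentions in the text.
    ("urn:ex:participant-1", RDF_TYPE, "fise:ParticipantAnnotation"),
    ("urn:ex:participant-1", "fise:in-setting", "urn:ex:setting-1"),
    ("urn:ex:participant-1", "dc:relation", "urn:ex:text-annotation-1"),  # assumed link property
    ("urn:ex:participant-1", "dc:relation", "urn:ex:text-annotation-2"),
    ("urn:ex:participant-1", "dc:type", "urn:ex:role/Patient"),

    # One occurrent that the participant is involved in.
    ("urn:ex:occurrent-1", RDF_TYPE, "fise:OccurrentAnnotation"),
    ("urn:ex:occurrent-1", "fise:in-setting", "urn:ex:setting-1"),
    ("urn:ex:occurrent-1", "dc:relation", "urn:ex:text-annotation-3"),
    ("urn:ex:occurrent-1", "dc:has-participant", "urn:ex:participant-1"),
    ("urn:ex:occurrent-1", "urn:ex:start", "2013-06-11T00:00:00Z"),  # placeholder property
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# Everything that belongs to the setting:
members = subjects("fise:in-setting", "urn:ex:setting-1")
```

The point of the intermediate resources is visible here: the n-ary
"who took part in what, and when" statement is split into several
binary statements that all hang off the participant and occurrent
resources.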


### Consuming the data:

I think this model should be sufficient for use cases such as the one
you describe.

Users would be able to consume data on the setting level. This can be
done by simply retrieving all fise:ParticipantAnnotation as well as
fise:OccurrentAnnotation linked with a setting. BTW this was the
approach used in LIVE [2] for semantic search. It allows queries for
Settings that involve specific Entities, e.g. you could filter for
Settings that involve a {Person}, activities:Arrested and a specific
{Uprising}. However, note that with this approach you will also get
results for Settings where the {Person} participated and another
person was arrested.
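
A small sketch of the setting-level filter described above, including
the false positive. This is not the LIVE implementation; the data
model, the function name and all identifiers such as "person:A" are
assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Occurrent:
    activity: str        # e.g. "activities:Arrested"
    participants: set    # identifiers of the participating entities


@dataclass
class Setting:
    name: str
    occurrents: list

    def entities(self):
        """Flatten everything linked to the setting into one set."""
        found = set()
        for occ in self.occurrents:
            found.add(occ.activity)
            found |= occ.participants
        return found


def settings_involving(settings, required):
    """Setting-level filter: keep settings that mention all required entities."""
    return [s for s in settings if required <= s.entities()]


# Hypothetical data: in doc-2, person:B is arrested, not person:A.
settings = [
    Setting("doc-1", [Occurrent("activities:Arrested", {"person:A"}),
                      Occurrent("event:Uprising", {"person:A"})]),
    Setting("doc-2", [Occurrent("activities:Arrested", {"person:B"}),
                      Occurrent("event:Uprising", {"person:A", "person:B"})]),
]

hits = settings_involving(
    settings, {"person:A", "activities:Arrested", "event:Uprising"})
print([s.name for s in hits])  # ['doc-1', 'doc-2']
```

doc-2 is returned although person:A was not the one arrested there,
because the filter only checks that all three entities occur somewhere
in the same Setting.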

Another possibility would be to process enhancement results on the
fise:OccurrentAnnotation level. This would allow a much finer level
of granularity (e.g. it would allow to correctly answer the query
used as an example above). But I am wondering if the quality of the
Setting extraction will be sufficient for this. I also have doubts
whether this can still be realized by using semantic indexing to
Apache Solr, or if it would be better/necessary to store results in a
TripleStore and use SPARQL for retrieval.
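
To illustrate the difference in granularity, here is a minimal Python
sketch of a query evaluated per occurrent instead of per Setting. The
tuple layout, the function names and identifiers such as "person:A"
are assumptions for this example; it says nothing about whether Solr
or a TripleStore would be used underneath.

```python
# Each row: (setting, activity, participants of that occurrent).
# All identifiers are hypothetical.
occurrents = [
    ("doc-1", "activities:Arrested", {"person:A"}),
    ("doc-1", "event:Uprising", {"person:A"}),
    ("doc-2", "activities:Arrested", {"person:B"}),
    ("doc-2", "event:Uprising", {"person:A", "person:B"}),
]


def settings_where(rows, activity, participant):
    """Settings that contain an occurrent of `activity` with `participant`."""
    return {setting for setting, act, parts in rows
            if act == activity and participant in parts}


def arrested_during_uprising(rows, person):
    """Settings where `person` took part in the uprising AND was arrested."""
    arrested = settings_where(rows, "activities:Arrested", person)
    uprising = settings_where(rows, "event:Uprising", person)
    return sorted(arrested & uprising)


print(arrested_during_uprising(occurrents, "person:A"))  # ['doc-1']
print(arrested_during_uprising(occurrents, "person:B"))  # ['doc-2']
```

Because the participant check happens on each occurrent, a Setting in
which a different person was arrested no longer matches.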

The methodology and query language used by YAGO [3] is also very
relevant for this (especially note chapter 7 SPOTL(X) Representation).

Another related topic is the enrichment of Entities (especially
Events) in knowledge bases based on Settings extracted from documents.
By definition - in DOLCE - Perdurants are temporally indexed. That
means that at the time when they are added to a knowledge base they
might still be in progress. So the creation, enrichment and refinement
of such Entities in the knowledge base seems to be critical for a
system like the one described in your use case.

On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
<cristian.petro...@gmail.com> wrote:
>
> First of all I have to mention that I am new in the field of semantic
> technologies, I've started to read about them in the last 4-5 months. Having
> said that I have a high level overview of what is a good approach to solve
> this problem. There are a number of papers on the internet which describe
> what steps need to be taken such as : named entity recognition,
> co-reference resolution, pos tagging and others.

The Stanbol NLP processing module currently only supports sentence
detection, tokenization, POS tagging, chunking, NER and lemmatization.
Support for co-reference resolution and dependency trees is currently
missing.

Stanford NLP is already integrated with Stanbol [4]. At the moment it
only supports English, but I am already working on including the other
supported languages. Other NLP frameworks that are already integrated
with Stanbol are Freeling [5] and Talismane [6]. But note that for all
of those the integration excludes support for co-reference and
dependency trees.

Anyway, I am confident that one can implement a first prototype by
only using Sentences and POS tags and - if available - Chunks (e.g.
Noun phrases).
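
To give an idea of what such a first prototype could look like, here
is a very naive Python sketch that works on one POS-tagged sentence. It
does not use the Stanbol API; the input format (token, tag pairs with
Penn-Treebank-style tags), the function name and the example sentence
are assumptions. Runs of proper nouns become participant candidates
and verbs become occurrent candidates.

```python
def candidates(tagged_sentence):
    """Split one POS-tagged sentence into participant and occurrent candidates.

    tagged_sentence: list of (token, pos) pairs, Penn-Treebank-style tags.
    Returns (participants, occurrents) as two lists of strings.
    """
    participants, occurrents = [], []
    current = []  # consecutive proper-noun tokens form one mention
    for token, pos in tagged_sentence:
        if pos.startswith("NNP"):
            current.append(token)
            continue
        if current:
            # A non-proper-noun token ends the current mention.
            participants.append(" ".join(current))
            current = []
        if pos.startswith("VB"):
            occurrents.append(token)
    if current:
        participants.append(" ".join(current))
    return participants, occurrents


# Hypothetical, hand-tagged example sentence.
sentence = [("The", "DT"), ("police", "NN"), ("arrested", "VBD"),
            ("John", "NNP"), ("Smith", "NNP"), ("in", "IN"),
            ("Cairo", "NNP"), (".", ".")]

print(candidates(sentence))  # (['John Smith', 'Cairo'], ['arrested'])
```

Note that the common noun "police" is missed here; this is where
Chunks (noun phrases) would help, and roles such as Agent or Patient
are not assigned at all.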

Hope this helps to bootstrap this discussion
best
Rupert

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
