Re: Event Extraction Engine

Cristian Petroaca Wed, 18 Nov 2015 12:37:08 -0800

I created a git repository which contains the event extraction engine here:
https://github.com/cpetroaca/stanbol-event-extraction-engine. I've started
working on an event rule schema that will also incorporate a generic
ontology definition schema, so that one can say that
#Person=http://dbpedia.org/Person and then use #Person in the rules. I think
the fact that Stanbol has access to a DBpedia or YAGO index will be of great
value when we want to define events with specific object classes.
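The following is a minimal sketch of how alias definitions like `#Person=http://dbpedia.org/Person` could be expanded inside rules before matching. The one-definition-per-line syntax and the helper names `parse_aliases` and `expand_aliases` are assumptions for illustration. This is not the schema in the linked repository.

```python
import re

# Assumed alias syntax: one "#Name=URI" definition per line.
ALIAS_LINE = re.compile(r"^\s*#(\w+)\s*=\s*(\S+)\s*$")
# A "#Name" reference used inside a rule.
ALIAS_REF = re.compile(r"#(\w+)")


def parse_aliases(text: str) -> dict:
    """Collect alias definitions such as '#Person=http://dbpedia.org/Person'."""
    aliases = {}
    for line in text.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            aliases[match.group(1)] = match.group(2)
    return aliases


def expand_aliases(rule: str, aliases: dict) -> str:
    """Replace every '#Name' reference in a rule with its class URI."""
    def substitute(match):
        name = match.group(1)
        if name not in aliases:
            raise KeyError(f"undefined alias #{name}")
        return aliases[name]

    return ALIAS_REF.sub(substitute, rule)


aliases = parse_aliases("#Person=http://dbpedia.org/Person")
print(expand_aliases("agent=dependency_type:nsubj&&entity_type=#Person", aliases))
```

One design caveat: a plain `#Name` pattern would also match the fragment of a URI such as `...#Thing`. A real schema would therefore need some escaping rule.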


Dileepa, if you still want to get involved, you can take a look at the
Stanbol Stanford NLP project here:
https://github.com/westei/stanbol-stanfordnlp and figure out how to add
collapsed dependencies
(http://nlp.stanford.edu/software/dependencies_manual.pdf) to it. We'll
need them to identify the subject, verb and objects.
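The sketch below groups each verb's dependents by relation, which shows why collapsed dependencies help here. The edge list is hand-written to resemble collapsed output for the thread's "Google buys Yahoo" example. It is not actual parser output. `Dep` and `verb_frames` are hypothetical names, and keying tokens by surface string rather than by token index is a simplification.

```python
from collections import defaultdict
from typing import NamedTuple


class Dep(NamedTuple):
    """One typed dependency, written relation(governor, dependent)."""
    relation: str
    governor: str
    dependent: str


def verb_frames(deps: list) -> dict:
    """Group the dependents of every verb that has a nominal subject (nsubj).

    With collapsed dependencies the preposition is folded into the relation
    name (e.g. prep_for), so a prepositional argument is a single edge from
    the verb instead of a prep -> pobj chain.
    """
    verbs = {d.governor for d in deps if d.relation == "nsubj"}
    frames = defaultdict(lambda: defaultdict(list))
    for d in deps:
        if d.governor in verbs:
            frames[d.governor][d.relation].append(d.dependent)
    return {verb: dict(args) for verb, args in frames.items()}


# Hand-written edges shaped like collapsed output; a real parse uses token
# indices and may differ in detail.
edges = [
    Dep("nsubj", "buys", "Google"),
    Dep("dobj", "buys", "Yahoo"),
    Dep("prep_for", "buys", "$"),
    Dep("num", "$", "100"),
]
print(verb_frames(edges))
# {'buys': {'nsubj': ['Google'], 'dobj': ['Yahoo'], 'prep_for': ['$']}}
```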

Thanks,
Cristian

On Mon, Oct 12, 2015 at 3:31 PM, Cristian Petroaca <
[email protected]> wrote:

> Can we get a separate branch where we can start developing the Event
> Extraction engine?
>
> Thanks
>
> On Sun, Sep 20, 2015 at 4:26 PM, Cristian Petroaca <
> [email protected]> wrote:
>
>> Sorry, hit send before finishing the mail :).
>>
>> So, you would disambiguate it using WordNet like this:
>>
>> http://wordnetweb.princeton.edu/perl/webwn?s=attack&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=000000
>>
>> And then you would have a rule file which would contain something like:
>> event name= "attack"
>> event trigger= wordnet class of type = wordnet id && pos=verb
>> agent=dependency_type:nsubj&&entity_type=Person||Location
>> patient=dependency_type:dobj&&entity_type=Person||Location
>>
>> The dependency type points to the Stanford NLP dependency tree relation
>> types described here:
>> http://nlp.stanford.edu/software/stanford-dependencies.shtml
>> The entity_type points to either the NER class or the wordnet class for
>> the noun in the noun phrase.
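A rule like the `attack` example above might be represented and applied as in the sketch below. It assumes that WSD has already attached a sense id to each verb, and that NER or WordNet has attached an entity type to each noun. The `Token` and `EventRule` classes, the `apply_rule` helper and the sense id `attack.v.01` are all assumptions. None of them is an existing Stanbol API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    text: str
    pos: str
    sense: Optional[str] = None        # WordNet sense id assigned by WSD (verbs)
    entity_type: Optional[str] = None  # NER class or WordNet class (nouns)


@dataclass
class EventRule:
    name: str
    trigger_sense: str
    # role name -> (dependency relation, allowed entity types)
    roles: dict = field(default_factory=dict)


def apply_rule(rule, tokens, deps):
    """Return the filled event if the rule fires, else None.

    deps holds (relation, governor_index, dependent_index) triples.
    """
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB" or tok.sense != rule.trigger_sense:
            continue
        event = {"event": rule.name, "trigger": tok.text}
        for role, (relation, allowed) in rule.roles.items():
            for rel, gov, dep in deps:
                if rel == relation and gov == i and tokens[dep].entity_type in allowed:
                    event[role] = tokens[dep].text
                    break
        if all(role in event for role in rule.roles):  # every role filled
            return event
    return None


attack = EventRule(
    name="attack",
    trigger_sense="attack.v.01",  # assumed sense id
    roles={
        "agent": ("nsubj", {"Person", "Location"}),
        "patient": ("dobj", {"Person", "Location"}),
    },
)
tokens = [
    Token("Rebels", "NOUN", entity_type="Person"),
    Token("attacked", "VERB", sense="attack.v.01"),
    Token("the", "DET"),
    Token("village", "NOUN", entity_type="Location"),
]
deps = [("nsubj", 1, 0), ("dobj", 1, 3), ("det", 3, 2)]
print(apply_rule(attack, tokens, deps))
# {'event': 'attack', 'trigger': 'attacked', 'agent': 'Rebels', 'patient': 'village'}
```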
>>
>> This approach was inspired by this paper:
>> http://www.surdeanu.info/mihai/papers/acl2015.pdf with the difference
>> that I'm using WSD to disambiguate the event trigger.
>>
>> I'll start doing some experiments with this approach.
>>
>> On Sun, Sep 20, 2015 at 4:14 PM, Cristian Petroaca <
>> [email protected]> wrote:
>>
>>> Hi Dileepa,
>>>
>>> I've been thinking more about the approach using a Word Sense
>>> Disambiguation tool to classify the verb in the sentence, and I think it
>>> may be a good approach. The verb seems to be the event trigger, and once
>>> you know its actual meaning (by assigning it a WordNet class or a class
>>> from some other database used for WSD), I think it's quite
>>> straightforward to identify the actors in the event (agent, patient,
>>> instrument, etc.) by applying some user-defined rules for that verb class.
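The thread does not name a specific WSD tool. One simple option is the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the sentence. It is swapped in here purely as an illustration. The sense ids and glosses below are a made-up toy inventory standing in for WordNet.

```python
import re
from typing import Optional

# Toy sense inventory for illustration; sense ids and glosses are made up.
# A real engine would take senses and glosses from WordNet.
SENSES = {
    "attack": {
        "attack#assault": "launch an armed assault on enemy troops or a position",
        "attack#criticize": "criticize strongly in speech or in writing",
    },
}
STOPWORDS = {"a", "an", "the", "on", "in", "or", "of", "to", "and", "at"}


def content_words(text: str) -> set:
    """Lowercased alphabetic words, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> Optional[str]:
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, 0
    for sense, gloss in SENSES.get(word, {}).items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("attack", "The troops attack the enemy position at dawn"))
print(simplified_lesk("attack", "Critics attack the senator in speech and in writing"))
```

The chosen sense id would then select which rule set applies to the verb.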
>>>
>>> For example, if you have the verb "attack", which can have multiple
>>> meanings depending on the context, you would disambiguate it using
>>> WordNet like this:
>>>
>>> On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <
>>> [email protected]> wrote:
>>>
>>>> Hi Cristian,
>>>>
>>>> Interesting ideas. Let me do some background reading on this, so I can
>>>> also participate in the discussion better.
>>>>
>>>> Thanks,
>>>> Dileepa
>>>>
>>>> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <
>>>> [email protected]> wrote:
>>>>
>>>> > Another approach to this would be to use a semantic role labeling tool
>>>> > [1] to determine the type of relation between the subject and object.
>>>> >
>>>> > Or we could use Word Sense Disambiguation to determine the WordNet
>>>> > class of the verb (this way we have a standard relation definition),
>>>> > and based on what relation type it is, we can search for the subject
>>>> > and object using dependency tree parsing in Stanford NLP.
>>>> >
>>>> > Both of these options should give us much higher recall, but I'm not
>>>> > sure about the precision...
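The recall/precision trade-off can be made concrete with the usual definitions: precision = TP / |predicted| and recall = TP / |gold|. The triples below are invented to show a high-recall, low-precision outcome.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision = TP / |predicted|, recall = TP / |gold| over exact-match items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (event type, agent, patient) triples.
gold = {("Acquisition", "Google", "Yahoo"), ("Employment", "Google", "Smith")}
predicted = {
    ("Acquisition", "Google", "Yahoo"),
    ("Employment", "Google", "Smith"),
    ("Attack", "Google", "Yahoo"),
    ("Acquisition", "Yahoo", "Google"),
}
print(precision_recall(predicted, gold))  # (0.5, 1.0)
```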
>>>> >
>>>> > So I think we'll need to settle on how to implement this engine
>>>> > before starting anything.
>>>> >
>>>> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
>>>> >
>>>> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <
>>>> > [email protected]> wrote:
>>>> >
>>>> > > Hi Dileepa,
>>>> > >
>>>> > > Unfortunately I did not have the time to work on this at all, so
>>>> > > there is no code base. But I'd be happy to start contributing to this
>>>> > > engine, and I think it would also be very helpful if you were able to
>>>> > > contribute as well.
>>>> > > I did get a chance to test the Stanford relation extractor, which
>>>> > > works fine but is limited to a handful of relation types (live_in,
>>>> > > located_in, org_based_in, work_for). So we would need to train other
>>>> > > models if we want to increase the number of relation types.
>>>> > > I also think that the Event Extraction Engine should work in
>>>> > > conjunction with any coreference and co-mention engines we have, to
>>>> > > increase the relation count.
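A coreference engine could raise the relation count as sketched below. Arguments that are pronouns or nominal mentions are rewritten to the representative mention of their chain, so each relation attaches to the named entity. The input shapes and `resolve_arguments` are hypothetical. A real engine would key mentions by text span rather than by string.

```python
from collections import Counter


def resolve_arguments(relations, coref_chains):
    """Rewrite relation arguments to the representative mention of their chain.

    relations: (relation, arg1, arg2) triples over mention strings.
    coref_chains: lists of mentions; the first mention is the representative.
    """
    canonical = {}
    for chain in coref_chains:
        for mention in chain:
            canonical[mention] = chain[0]
    return [(rel, canonical.get(a, a), canonical.get(b, b)) for rel, a, b in relations]


# Hypothetical output of the relation extractor and a coreference engine.
relations = [
    ("work_for", "Smith", "Google"),
    ("live_in", "He", "London"),
    ("located_in", "The company", "California"),
]
chains = [["Smith", "He"], ["Google", "The company"]]

resolved = resolve_arguments(relations, chains)
# Count how many relation arguments each entity now fills.
mentions_per_entity = Counter(arg for _, a, b in resolved for arg in (a, b))
print(resolved)
print(mentions_per_entity["Smith"], mentions_per_entity["Google"])  # 2 2
```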
>>>> > >
>>>> > > Regards,
>>>> > > Cristian
>>>> > >
>>>> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <
>>>> > > [email protected]> wrote:
>>>> > >
>>>> > >> Hi Cristian and all,
>>>> > >>
>>>> > >> Could you please tell me the status of this event extraction engine?
>>>> > >> Event extraction is a really useful feature for semantic enhancements,
>>>> > >> and I am interested in collaborating on this work.
>>>> > >> Is there any code base you are currently working on for this engine?
>>>> > >>
>>>> > >> Thanks,
>>>> > >> Dileepa
>>>> > >>
>>>> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <
>>>> > >> [email protected]> wrote:
>>>> > >>
>>>> > >> > Hi Edi,
>>>> > >> >
>>>> > >> > Thanks for the info. Stanford Relation Extractor sounds very
>>>> > >> > interesting. I'll give it a try.
>>>> > >> >
>>>> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <[email protected]>:
>>>> > >> >
>>>> > >> > > Hi Cristian,
>>>> > >> > > Here are a few more resources on Semantic Role/Relationship
>>>> > >> > > Labeling:
>>>> > >> > > 1. FrameNet, VerbNet and WordNet on the data side
>>>> > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the
>>>> > >> > > code side
>>>> > >> > > The last one links to a great paper which I believe holds great
>>>> > >> > > potential for Stanbol:
>>>> > >> > > A Linear Programming Formulation for Global Inference in Natural
>>>> > >> > > Language Tasks
>>>> > >> > >
>>>> > >> > > Edi
>>>> > >> > >       From: Cristian Petroaca <[email protected]>
>>>> > >> > >  To: [email protected]
>>>> > >> > >  Sent: Sunday, February 15, 2015 6:34 AM
>>>> > >> > >  Subject: Event Extraction Engine
>>>> > >> > >
>>>> > >> > > Hi All,
>>>> > >> > >
>>>> > >> > > Quite a while ago I started a discussion on this list about Event
>>>> > >> > > Extraction from text. See
>>>> > >> > > https://issues.apache.org/jira/browse/STANBOL-1121.
>>>> > >> > >
>>>> > >> > > I'd like to get started on the actual work. I have been thinking
>>>> > >> > > about how best to approach this, and there are some things that I
>>>> > >> > > would do differently from what the JIRA describes. I'd like to get
>>>> > >> > > your feedback on it.
>>>> > >> > >
>>>> > >> > > Basically the main approach would be:
>>>> > >> > >
>>>> > >> > > 1. Detect all named entities and their co-references.
>>>> > >> > >
>>>> > >> > > 2. Apply semantic role labeling to the sentences in which the
>>>> > >> > > above-mentioned named entities occur.
>>>> > >> > > I found some interesting semantic role labeling libraries, such as
>>>> > >> > > https://code.google.com/p/mate-tools/ or
>>>> > >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
>>>> > >> > > With these I'll be able to detect the agent, the verb (action), the
>>>> > >> > > patient and the instruments.
>>>> > >> > >
>>>> > >> > > This could be a minimal implementation of the engine. After that I
>>>> > >> > > can simply create the event data model as described in the JIRA
>>>> > >> > > and annotate the text.
>>>> > >> > > But this does not actually detect what kind of event it is, or
>>>> > >> > > what event-specific roles the entities have in the relation.
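The JIRA's event data model is not reproduced in this thread, so the following is only a guess at its shape. It is a minimal sketch of an event annotation produced by the SRL-only step, with the event type left unset. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


@dataclass
class Event:
    trigger: Span
    agent: Optional[Span] = None
    patient: Optional[Span] = None
    instruments: list = field(default_factory=list)
    # The minimal SRL-only engine leaves this empty; a later step would
    # fill in the event type and event-specific roles.
    event_type: Optional[str] = None


def span_of(text: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase (good enough for a sketch)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), phrase)


sentence = "Google buys Yahoo for $100 million"
event = Event(
    trigger=span_of(sentence, "buys"),
    agent=span_of(sentence, "Google"),
    patient=span_of(sentence, "Yahoo"),
)
print(event.trigger, event.event_type)
```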
>>>> > >> > >
>>>> > >> > > For example, take the sentence "Google buys Yahoo for $100
>>>> > >> > > million". There is a lot more to be said about this sentence than
>>>> > >> > > simply that "Google" is the agent and "Yahoo" is the patient. This
>>>> > >> > > is actually an acquisition event, in which "Google" is the buyer
>>>> > >> > > and "Yahoo" the acquired entity.
>>>> > >> > > We would also need to map synonymous phrases such as "buy" and
>>>> > >> > > "acquire" to a common ontology, so that we know that both refer to
>>>> > >> > > the same Acquisition event.
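The synonym alignment and role specialization for the Google/Yahoo example could look roughly like the following sketch. The lexicon, the role names `buyer` and `acquired`, and the `specialize` helper are all hypothetical.

```python
from typing import Optional

# Hypothetical lexicon: synonymous trigger lemmas share one ontology event type.
TRIGGER_TO_EVENT = {"buy": "Acquisition", "acquire": "Acquisition"}

# Hypothetical mapping from generic semantic roles to event-specific roles.
EVENT_ROLES = {"Acquisition": {"agent": "buyer", "patient": "acquired"}}


def specialize(lemma: str, generic_roles: dict) -> Optional[dict]:
    """Turn an agent/patient frame into a typed event, or None if the verb is unknown."""
    event_type = TRIGGER_TO_EVENT.get(lemma)
    if event_type is None:
        return None
    role_map = EVENT_ROLES.get(event_type, {})
    event = {"type": event_type}
    for role, filler in generic_roles.items():
        event[role_map.get(role, role)] = filler
    return event


print(specialize("buy", {"agent": "Google", "patient": "Yahoo"}))
# {'type': 'Acquisition', 'buyer': 'Google', 'acquired': 'Yahoo'}
```

Because "buy" and "acquire" resolve to the same type, both sentences yield the same Acquisition event.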
>>>> > >> > >
>>>> > >> > > Having said that, we would add a new step:
>>>> > >> > > 3. Try to detect the event type and event details.
>>>> > >> > >
>>>> > >> > > This can be done by either:
>>>> > >> > >
>>>> > >> > > 3.1 Rule-based: hand-written rules which would map a certain
>>>> > >> > > sentence structure, such as the verb and the types of the agent
>>>> > >> > > and patient entities, to a certain event type.
>>>> > >> > > This has the benefit of being easy to build, but it is quite
>>>> > >> > > inflexible.
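Below is a minimal sketch of the rule-based option, keyed on the verb and on the agent and patient entity types as described above. The rule table and type names are invented for illustration. Any combination not listed in the table simply falls through, which is the inflexibility mentioned above.

```python
from typing import Optional

# Hypothetical hand-written rules:
# (verb lemma, agent entity type, patient entity type) -> event type.
RULES = {
    ("buy", "Organisation", "Organisation"): "Acquisition",
    ("acquire", "Organisation", "Organisation"): "Acquisition",
    ("buy", "Person", "Product"): "Purchase",
}


def classify_event(verb: str, agent_type: str, patient_type: str) -> Optional[str]:
    """Exact-match lookup: any combination nobody wrote a rule for yields None."""
    return RULES.get((verb, agent_type, patient_type))


print(classify_event("buy", "Organisation", "Organisation"))       # Acquisition
print(classify_event("buy", "Person", "Product"))                  # Purchase
print(classify_event("purchase", "Organisation", "Organisation"))  # None
```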
>>>> > >> > >
>>>> > >> > > 3.2 Statistical: train a model which would be able to classify an
>>>> > >> > > event type based on features of the sentence such as verb type,
>>>> > >> > > entity type, role type, etc. This is the approach described here:
>>>> > >> > > http://web.stanford.edu/~jurafsky/mintz.pdf.
>>>> > >> > > This would be quite hard to build, but quite flexible.
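The statistical option can be illustrated with a tiny multinomial Naive Bayes classifier over categorical sentence features. This is a generic feature-based classifier swapped in for illustration. It is not the distant-supervision method of the linked paper, and the training data is invented.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over categorical features, add-one smoothing."""

    def fit(self, examples, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # label -> feature counts
        self.vocab = set()
        for features, label in zip(examples, labels):
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            seen = self.feature_counts[label]
            denominator = sum(seen.values()) + len(self.vocab)
            score = math.log(count / total)  # log prior
            for f in features:
                score += math.log((seen[f] + 1) / denominator)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: one feature list per sentence.
X = [
    ["verb=buy", "agent=ORG", "patient=ORG"],
    ["verb=acquire", "agent=ORG", "patient=ORG"],
    ["verb=hire", "agent=ORG", "patient=PER"],
    ["verb=employ", "agent=ORG", "patient=PER"],
]
y = ["Acquisition", "Acquisition", "Employment", "Employment"]

model = NaiveBayes().fit(X, y)
print(model.predict(["verb=purchase", "agent=ORG", "patient=ORG"]))  # Acquisition
```

Exact-match rules in 3.1 would reject the unseen verb `purchase`. Here it is still classified from the entity-type features.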
>>>> > >> > >
>>>> > >> > > I think this 3rd step of detecting event types and details would
>>>> > >> > > be most effective for domain-specific events. We would have
>>>> > >> > > configs with several models for several domains available, and
>>>> > >> > > the user could either use one of the pre-existing models or
>>>> > >> > > create a new one.
>>>> > >> > >
>>>> > >> > > I don't have any practical experience with training models or
>>>> > >> > > with feature-based text classification (but I've been doing a lot
>>>> > >> > > of reading on it), so I'm not sure exactly how feasible step 3
>>>> > >> > > actually is.
>>>> > >> > >
>>>> > >> > > Regards,
>>>> > >> > > Cristian
>>>> > >> > >
>>>> > >> > >
>>>> > >> > >
>>>> > >> > >
>>>> > >> >
>>>> > >>
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
