Re: Event Extraction Engine

Cristian Petroaca Mon, 12 Oct 2015 05:31:58 -0700

Can we get a separate branch where we can start developing the Event
Extraction engine?


Thanks

On Sun, Sep 20, 2015 at 4:26 PM, Cristian Petroaca <
[email protected]> wrote:

> Sorry, hit send before finishing the mail :).
>
> So, you will disambiguate it using WordNet like this:
>
> http://wordnetweb.princeton.edu/perl/webwn?s=attack&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=000000
>
> And then you would have a rule file which would contain something like :
> event name= "attack"
> event trigger= wordnet class of type = wordnet id && pos=verb
> agent=dependency_type:nsubj&&entity_type=Person||Location
> patient=dependency_type:dobj&&entity_type=Person||Location
>
> The dependency type points to the Stanford NLP dependency tree relation
> types described here:
> http://nlp.stanford.edu/software/stanford-dependencies.shtml
> The entity_type points to either the NER class or the wordnet class for
> the noun in the noun phrase.
>
> This approach was inspired by this paper :
> http://www.surdeanu.info/mihai/papers/acl2015.pdf with the difference
> that I'm using WSD to disambiguate the event trigger.
>
> I'll start doing some experiments with this approach.
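A minimal sketch of how a rule like the "attack" rule above could be applied to one parsed sentence. It assumes the sentence has already been tokenized, sense-tagged and dependency-parsed by other tools; the `Token` class, the `ATTACK_RULE` dictionary layout, the sense label `attack.v.01` and the `extract_event` function are hypothetical names chosen for illustration, not part of Stanbol or Stanford NLP.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str                # coarse part of speech, e.g. "verb" or "noun"
    sense: str = ""         # WSD result for the token (hypothetical label)
    entity_type: str = ""   # NER class or WordNet class of the noun


# Hypothetical in-memory form of the "attack" rule from the mail.
ATTACK_RULE = {
    "name": "attack",
    "trigger": {"sense": "attack.v.01", "pos": "verb"},
    "roles": {
        "agent": {"dependency_type": "nsubj",
                  "entity_types": {"Person", "Location"}},
        "patient": {"dependency_type": "dobj",
                    "entity_types": {"Person", "Location"}},
    },
}


def extract_event(rule, tokens, dependencies):
    """Apply one rule to one sentence.

    dependencies is a list of (relation, governor_index, dependent_index)
    triples. Returns a dict describing the first matching event, or None
    if no token matches the trigger.
    """
    trigger = rule["trigger"]
    for idx, tok in enumerate(tokens):
        if tok.pos != trigger["pos"] or tok.sense != trigger["sense"]:
            continue
        event = {"event": rule["name"], "trigger": tok.text}
        for role, cond in rule["roles"].items():
            for rel, gov, dep in dependencies:
                if (gov == idx
                        and rel == cond["dependency_type"]
                        and tokens[dep].entity_type in cond["entity_types"]):
                    event[role] = tokens[dep].text
                    break
        return event
    return None
```

With tokens for "Rebels attacked Aleppo" and the triples `("nsubj", 1, 0)` and `("dobj", 1, 2)`, the function fills the agent and patient roles from the subject and direct object. Roles whose conditions are not met are simply left out of the result.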
>
>
> On Sun, Sep 20, 2015 at 4:14 PM, Cristian Petroaca <
> [email protected]> wrote:
>
>> Hi Dileepa,
>>
>> I've been thinking more about the approach using a Word Sense
>> Disambiguation tool to classify the verb in the sentence and I think it may
>> be a good approach. The verb seems to be the event trigger and once you
>> know its actual meaning (by applying a Wordnet class or some other DB used
>> for WSD) then I think it's quite straightforward to identify the actors in
>> the event (agent, patient, instrument, etc) by applying some user defined
>> rules for that verb class.
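To illustrate the disambiguation step, here is a rough sketch of a simplified Lesk-style overlap measure, which is one common WSD baseline and not necessarily the tool that would be used here. The sense identifiers and glosses in `SENSES` are invented for the example; in practice they would come from WordNet or another lexical database.

```python
# Toy sense inventory: ids and glosses are made up for illustration only.
SENSES = {
    "attack": {
        "attack.military": "launch a military assault on an enemy army city or territory",
        "attack.criticize": "criticize someone harshly in speech or writing",
        "attack.task": "begin to work on a problem or task with energy",
    }
}

STOPWORDS = {"a", "an", "the", "on", "in", "or", "of", "to", "with", "and"}


def disambiguate(verb, sentence):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[verb].items():
        overlap = len(context & (set(gloss.split()) - STOPWORDS))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense
```

The returned sense id would then select which set of user-defined role rules to apply to the sentence.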
>>
>> For example if you have the verb "attack" which can have multiple
>> meanings depending on the context you will disambiguate it using wordnet
>> like this:
>>
>> On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <
>> [email protected]> wrote:
>>
>>> Hi Cristian,
>>>
>>> Interesting ideas. Let me do some background reading on this, so I can
>>> also participate in the discussion better.
>>>
>>> Thanks,
>>> Dileepa
>>>
>>> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <
>>> [email protected]> wrote:
>>>
>>> > Another approach to this would be to use a semantic role labeling tool [1]
>>> > to determine the type of relation between the subject and object.
>>> >
>>> > Or we could use Word Sense Disambiguation to determine the WordNet class
>>> > of the verb (this way we have a standard relation definition) and, based
>>> > on what relation type it is, we can search for the subject and object
>>> > using dependency tree parsing in Stanford NLP.
>>> >
>>> > These 2 options should give us much higher recall, but I'm not sure
>>> > about the precision...
>>> >
>>> > So I think we'll need to first settle on the method of implementing this
>>> > engine before starting anything.
>>> >
>>> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
>>> >
>>> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <
>>> > [email protected]> wrote:
>>> >
>>> > > Hi Dileepa,
>>> > >
>>> > > Unfortunately I did not have the time to work on this at all, so there
>>> > > is no code base. But I'd be happy to start contributing something to
>>> > > this engine, and I think it would also be very helpful if you were able
>>> > > to contribute to this as well.
>>> > > I did get a chance to test the Stanford relation extractor, which works
>>> > > fine but is limited to a handful of relation types (live_in,
>>> > > located_in, org_based_in, work_for). So we would need to train other
>>> > > models if we want to increase the number of relation types.
>>> > > I also think that the Event Extraction Engine should work in
>>> > > conjunction with any coreference and co-mention engines we have, to
>>> > > increase the relation count.
>>> > >
>>> > > Regards,
>>> > > Cristian
>>> > >
>>> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <
>>> > > [email protected]> wrote:
>>> > >
>>> > >> Hi Cristian and all,
>>> > >>
>>> > >> Can I please know the status of this event extraction engine? Event
>>> > >> extraction is a really useful feature for semantic enhancements and I
>>> > >> am interested in collaborating on this work.
>>> > >> Is there any code base you are currently working on for this engine?
>>> > >>
>>> > >> Thanks,
>>> > >> Dileepa
>>> > >>
>>> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <
>>> > >> [email protected]> wrote:
>>> > >>
>>> > >> > Hi Edi,
>>> > >> >
>>> > >> > Thanks for the info. Stanford Relation Extractor sounds very
>>> > >> > interesting. I'll give it a try.
>>> > >> >
>>> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <[email protected]>:
>>> > >> >
>>> > >> > > Hi Cristian,
>>> > >> > > Here are a few more resources on Semantic Role/Relationship Labeling:
>>> > >> > > 1. FrameNet, VerbNet and WordNet on the data side
>>> > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
>>> > >> > > The last one links to a great paper which I believe holds great
>>> > >> > > potential for Stanbol:
>>> > >> > > A Linear Programming Formulation for Global Inference in Natural
>>> > >> > > Language Tasks
>>> > >> > >
>>> > >> > >
>>> > >> > > Edi
>>> > >> > >       From: Cristian Petroaca <[email protected]>
>>> > >> > >  To: [email protected]
>>> > >> > >  Sent: Sunday, February 15, 2015 6:34 AM
>>> > >> > >  Subject: Event Extraction Engine
>>> > >> > >
>>> > >> > > Hi All,
>>> > >> > >
>>> > >> > > Quite a while ago I started a discussion on this list about Event
>>> > >> > > Extraction from text. See
>>> > >> > > https://issues.apache.org/jira/browse/STANBOL-1121
>>> > >> > > .
>>> > >> > >
>>> > >> > > I'd like to get started on the actual work. I have been thinking
>>> > >> > > about how best to approach this, and there are some things that I
>>> > >> > > would do differently from what the JIRA describes. I'd like to get
>>> > >> > > your feedback on it.
>>> > >> > >
>>> > >> > > Basically the main approach would be:
>>> > >> > >
>>> > >> > > 1. Detect all NERs and their co-references.
>>> > >> > >
>>> > >> > > 2. Apply semantic role labeling on the sentences where the
>>> > >> > > above-mentioned NERs reside.
>>> > >> > > I found some interesting semantic role labeling libraries such as
>>> > >> > > https://code.google.com/p/mate-tools/ or
>>> > >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
>>> > >> > > With this I'll be able to detect the Agent, the Verb (action), the
>>> > >> > > Patient and Instruments.
>>> > >> > >
>>> > >> > > This could be a minimal implementation of the engine. After that I
>>> > >> > > can simply create the event data model as described in the JIRA and
>>> > >> > > annotate the text.
>>> > >> > > But this does not actually detect what kind of event it is or what
>>> > >> > > event-specific roles the entities have in the relation.
>>> > >> > >
>>> > >> > > For example we can have the sentence "Google buys Yahoo for $100
>>> > >> > > million". There is a lot more to be said about this sentence than
>>> > >> > > simply that "Google" is the agent and "Yahoo" is the patient. This
>>> > >> > > is actually an acquisition event in which "Google" is the buyer and
>>> > >> > > "Yahoo" the bought entity. We would also need to align synonymous
>>> > >> > > phrases such as "buy" or "acquire" to a common ontology so that we
>>> > >> > > know that both refer to the same Acquisition event.
>>> > >> > >
>>> > >> > > Having said that, we would add a new step :
>>> > >> > > 3. Try to detect event type and event details.
>>> > >> > >
>>> > >> > > This can be done by either:
>>> > >> > >
>>> > >> > > 3.1 Rule based: hand-written rules which would map a certain
>>> > >> > > sentence structure, such as the name of the verb and the type of
>>> > >> > > the entities acting as agent and patient, to a certain event type.
>>> > >> > > This has the benefit of being easy to build, but it is quite
>>> > >> > > inflexible.
>>> > >> > >
>>> > >> > > 3.2 Statistical based: train a model which would be able to classify
>>> > >> > > an event type based on the features of the sentence such as verb
>>> > >> > > type, entity type, role type, etc. This is the approach described
>>> > >> > > here:
>>> > >> > > http://web.stanford.edu/~jurafsky/mintz.pdf.
>>> > >> > > This would be quite hard to build but quite flexible.
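The rule-based option (3.1) could look roughly like the sketch below, using the "Google buys Yahoo" example. It assumes an earlier semantic role labeling step has already produced the verb lemma and the generic agent/patient fillers with their entity types. The tables `VERB_TO_EVENT` and `EVENT_RULES`, the role names `buyer` and `bought`, and the `classify_event` function are hypothetical and only show the shape of such a mapping.

```python
# Hypothetical lookup that aligns synonymous verbs to one event type.
VERB_TO_EVENT = {"buy": "Acquisition", "acquire": "Acquisition"}

# For each event type: generic role -> (required entity type, specific role).
EVENT_RULES = {
    "Acquisition": {
        "agent": ("Organization", "buyer"),
        "patient": ("Organization", "bought"),
    }
}


def classify_event(verb_lemma, roles):
    """Map a verb and its generic roles to a typed event.

    roles maps a generic role name to a (text, entity_type) pair.
    Returns None when the verb is unknown or a required role is
    missing or has the wrong entity type.
    """
    event_type = VERB_TO_EVENT.get(verb_lemma)
    if event_type is None:
        return None
    result = {"type": event_type}
    for generic_role, (required_type, specific_role) in EVENT_RULES[event_type].items():
        filler = roles.get(generic_role)
        if filler is None or filler[1] != required_type:
            return None
        result[specific_role] = filler[0]
    return result
```

Because "buy" and "acquire" share one table entry target, both verbs yield the same Acquisition event, which is the alignment described earlier in the mail. The inflexibility is also visible: any verb or entity type not listed in the tables produces no event at all.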
>>> > >> > >
>>> > >> > > I think this 3rd step of detecting event types & details would be
>>> > >> > > most efficient for domain-specific events. We would have configs with
>>> > >> > > several models for several domains available, and the user could
>>> > >> > > either use one of the pre-existing models or create a new one.
>>> > >> > >
>>> > >> > > I don't have any practical experience with training models or with
>>> > >> > > feature-based text classification (though I've been doing a lot of
>>> > >> > > reading on it), so I'm not sure exactly how feasible what I
>>> > >> > > described in step 3 actually is.
>>> > >> > >
>>> > >> > > Regards,
>>> > >> > > Cristian
>>> > >> > >
>>> > >> > >
>>> > >> > >
>>> > >> > >
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>
