Re: Event Extraction Engine

Cristian Petroaca Wed, 09 Sep 2015 02:48:07 -0700

Another approach to this would be to use a semantic role labeling tool [1]
to determine the type of relation between the subject and object.


Or we could use Word Sense Disambiguation to determine the WordNet class of
the verb (which gives us a standard relation definition) and then, based on
the relation type, search for the subject and object using
dependency tree parsing in Stanford NLP.
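
To make this second option a bit more concrete, here is a rough sketch in
Python. It is only an illustration under assumptions: the VERB_CLASS table is
a hypothetical stand-in for real Word Sense Disambiguation, its class labels
are illustrative, and the dependency edges are hand-written
(head, relation, dependent) triples using nsubj/dobj labels rather than
actual parser output.

```python
# Hypothetical stand-in for Word Sense Disambiguation: maps a verb lemma to an
# illustrative verb class. A real engine would disambiguate the sense in context.
VERB_CLASS = {
    "buy": "possession",
    "acquire": "possession",
}


def find_arguments(verb, edges):
    """Return (subject, object) of `verb` from (head, relation, dependent) edges."""
    subject = None
    obj = None
    for head, relation, dependent in edges:
        if head != verb:
            continue
        if relation == "nsubj":
            subject = dependent
        elif relation == "dobj":
            obj = dependent
    return subject, obj


def extract_relation(verb, lemma, edges):
    """Build a relation record, or return None when the verb class is unknown."""
    verb_class = VERB_CLASS.get(lemma)
    if verb_class is None:
        return None
    subject, obj = find_arguments(verb, edges)
    return {"type": verb_class, "subject": subject, "object": obj}


# Hand-written edges for "Google buys Yahoo" (assumed input format).
edges = [("buys", "nsubj", "Google"), ("buys", "dobj", "Yahoo")]
relation = extract_relation("buys", "buy", edges)
```

Because both "buy" and "acquire" map to the same class here, the two verbs
would yield the same relation type, which is the point of using a standard
verb classification.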

These two options should give us much higher recall, but I'm not
sure about the precision...

So I think we'll need to first settle on the method of implementing this
engine before starting anything.

[1] http://cogcomp.cs.illinois.edu/page/demo_view/srl

On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <
[email protected]> wrote:

> Hi Dileepa,
>
> Unfortunately I did not have time to work on this at all, so there is
> no code base. But I'd be happy to start contributing something to
> this engine, and I think it would also be very helpful if you were able
> to contribute as well.
> I did get a chance to test the Stanford relation extractor, which works
> fine but is limited to a handful of relation types (live_in,
> located_in, org_based_in, work_for). So we would need to train other models
> if we want to increase the number of relation types.
> I also think that the Event Extraction Engine should work in conjunction
> with any coreference and co-mention engines we have, to increase the relation
> count.
>
> Regards,
> Cristian
>
> On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <
> [email protected]> wrote:
>
>> Hi Cristian and all,
>>
>> Can I please know the status of this event extraction engine? Event
>> extraction is a really useful feature for semantic enhancements and I am
>> interested in collaborating on this work.
>> Is there any code base you are currently working on for this engine?
>>
>> Thanks,
>> Dileepa
>>
>> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <
>> [email protected]> wrote:
>>
>> > Hi Edi,
>> >
>> > Thanks for the info. Stanford Relation Extractor sounds very interesting.
>> > I'll give it a try.
>> >
>> > 2015-02-17 17:00 GMT+02:00 Edi Bice <[email protected]>:
>> >
>> > > Hi Cristian,
>> > > Here are a few more resources on Semantic Role/Relationship Labeling:
>> > > 1. FrameNet, VerbNet and WordNet on the data side
>> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
>> > > The last one links to a great paper which I believe holds great potential
>> > > for Stanbol:
>> > > A Linear Programming Formulation for Global Inference in Natural Language
>> > > Tasks
>> > >
>> > > (Link preview: view on www.cnts.ua.ac.be)
>> > >
>> > >
>> > > Edi
>> > >       From: Cristian Petroaca <[email protected]>
>> > >  To: [email protected]
>> > >  Sent: Sunday, February 15, 2015 6:34 AM
>> > >  Subject: Event Extraction Engine
>> > >
>> > > Hi All,
>> > >
>> > > Quite a while ago I started a discussion on this list about Event
>> > > Extraction from text. See
>> > > https://issues.apache.org/jira/browse/STANBOL-1121
>> > > .
>> > >
>> > > I'd like to get started on the actual work. I have been thinking about how
>> > > to best approach this, and there are some things that I would do differently
>> > > from what the JIRA describes. I'd like to get your feedback on it.
>> > >
>> > > Basically the main approach would be:
>> > >
>> > > 1. Detect all NERs and their co-references.
>> > >
>> > > 2. Apply semantic role labeling on the sentences where the above-mentioned
>> > > NERs reside.
>> > > I found some interesting Semantic Role labeling libraries such as
>> > > https://code.google.com/p/mate-tools/ or
>> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
>> > > With this I'll be able to detect the Agent, the Verb (action) and the
>> > > Patient and Instruments.
>> > >
>> > > This could be a minimal implementation of the engine. After that I can
>> > > simply create the event data model as described in the JIRA and annotate
>> > > the text.
>> > > But this does not actually detect what kind of event it is or what
>> > > event-specific roles the entities have in the relation.
>> > >
>> > > For example, take the sentence "Google buys Yahoo for $100 million".
>> > > There is a lot more to be said about this sentence than simply that
>> > > "Google" is the Agent and "Yahoo" is the Patient. This is actually an
>> > > acquisition event in which "Google" is the buyer and "Yahoo" the bought
>> > > entity.
>> > > We would also need to align synonymous phrases such as "buy" and "acquire"
>> > > to a common ontology, so that we know both refer to the same Acquisition
>> > > event.
>> > >
>> > > Having said that, we would add a new step:
>> > > 3. Try to detect event type and event details.
>> > >
>> > > This can be done in either of two ways:
>> > >
>> > > 3.1 Rule-based: hand-written rules that map a certain sentence
>> > > structure, such as the verb and the entity types of the agent and
>> > > patient, to a certain event type.
>> > > This has the benefit of being easy to build, but it is quite inflexible.
>> > >
>> > > 3.2 Statistical: train a model that classifies the event type based on
>> > > features of the sentence such as verb type, entity type, role type, etc.
>> > > This is the approach described here:
>> > > http://web.stanford.edu/~jurafsky/mintz.pdf.
>> > > This would be quite hard to build but quite flexible.
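
For the rule-based option (3.1), a minimal sketch of what such hand-written
rules could look like, in Python. Everything here is an assumption made for
illustration: the RULES table, the entity type name "ORGANIZATION", and the
role names "buyer" and "bought" are hypothetical and are not taken from the
JIRA data model.

```python
# Hypothetical rule table: (verb lemma, agent entity type, patient entity type)
# maps to an event type plus event-specific role names for agent and patient.
RULES = {
    ("buy", "ORGANIZATION", "ORGANIZATION"): ("Acquisition", "buyer", "bought"),
    ("acquire", "ORGANIZATION", "ORGANIZATION"): ("Acquisition", "buyer", "bought"),
}


def classify_event(verb_lemma, agent, patient):
    """Map SRL output to an event.

    `agent` and `patient` are (text, entity_type) pairs. Returns an event
    dict, or None when no rule matches.
    """
    rule = RULES.get((verb_lemma, agent[1], patient[1]))
    if rule is None:
        return None
    event_type, agent_role, patient_role = rule
    return {"event": event_type, agent_role: agent[0], patient_role: patient[0]}


# "Google buys Yahoo": the agent and patient would come from step 2.
event = classify_event(
    "buy", ("Google", "ORGANIZATION"), ("Yahoo", "ORGANIZATION")
)
```

Listing "buy" and "acquire" under the same event type is a crude way of
aligning synonyms. The inflexibility shows up right away: any verb or entity
type combination missing from the table yields no event at all.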
>> > >
>> > > I think this third step of detecting event types and details would work
>> > > best for domain-specific events. We would have configs with several
>> > > models for several domains available, and the user could either use one of
>> > > the pre-existing models or create a new one.
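
As a rough idea of what training such a per-domain model could involve, here
is a toy Naive Bayes classifier over categorical sentence features, in
Python. To be clear, this is not the distant supervision method from the
paper linked in 3.2, just a small stand-in for "classify the event type from
features". The feature names, entity type labels and training examples are
all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (features, label); features is a dict of strings."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # label -> counts of (name, value)
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        for item in features.items():
            feature_counts[label][item] += 1
            vocabulary.add(item)
    return label_counts, feature_counts, vocabulary


def predict(model, features):
    """Return the label with the highest smoothed log-probability."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for item in features.items():
            # Add-one smoothing keeps unseen feature values from zeroing the score.
            score += math.log((feature_counts[label][item] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data for a hypothetical "business" domain.
training = [
    ({"verb": "buy", "agent": "ORG", "patient": "ORG"}, "Acquisition"),
    ({"verb": "acquire", "agent": "ORG", "patient": "ORG"}, "Acquisition"),
    ({"verb": "hire", "agent": "ORG", "patient": "PERSON"}, "Employment"),
    ({"verb": "employ", "agent": "ORG", "patient": "PERSON"}, "Employment"),
]
model = train(training)

# "purchase" never appears in training; the entity types still carry the decision.
label = predict(model, {"verb": "purchase", "agent": "ORG", "patient": "ORG"})
```

Unlike a rule table, the model can still make a guess for a verb it has never
seen, which is where the extra flexibility would come from. A model per domain
would simply be one such trained object per training set.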
>> > >
>> > > I don't have any practical experience with training models or
>> > > feature-based text classification (but I've been doing a lot of reading
>> > > on it), so I'm not sure exactly how feasible what I described in step 3
>> > > actually is.
>> > >
>> > > Regards,
>> > > Cristian
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
