Can we get a separate branch where we can start developing the Event Extraction engine?
Thanks

On Sun, Sep 20, 2015 at 4:26 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Sorry, hit send before finishing the mail :).
>
> So, you will disambiguate it using WordNet like this:
>
> http://wordnetweb.princeton.edu/perl/webwn?s=attack&sub=Search+WordNet&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&h=000000
>
> And then you would have a rule file which would contain something like:
> event name= "attack"
> event trigger= wordnet class of type = wordnet id && pos=verb
> agent=dependency_type:nsubj&&entity_type=Person||Location
> patient=dependency_type:dobj&&entity_type=Person||Location
>
> The dependency type points to the Stanford NLP dependency tree relation types described here:
> http://nlp.stanford.edu/software/stanford-dependencies.shtml
> The entity_type points to either the NER class or the WordNet class for the noun in the noun phrase.
>
> This approach was inspired by this paper: http://www.surdeanu.info/mihai/papers/acl2015.pdf with the difference that I'm using WSD to disambiguate the event trigger.
>
> I'll start doing some experiments with this approach.
>
> On Sun, Sep 20, 2015 at 4:14 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
>> Hi Dileepa,
>>
>> I've been thinking more about the approach using a Word Sense Disambiguation tool to classify the verb in the sentence and I think it may be a good approach. The verb seems to be the event trigger and once you know its actual meaning (by applying a WordNet class or some other DB used for WSD) then I think it's quite straightforward to identify the actors in the event (agent, patient, instrument, etc.) by applying some user-defined rules for that verb class.
>>
>> For example if you have the verb "attack" which can have multiple meanings depending on the context you will disambiguate it using wordnet like this:
>>
>> On Wed, Sep 9, 2015 at 8:33 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>>
>>> Hi Cristian,
>>>
>>> Interesting ideas. Let me do some background reading on this, so I can also participate in the discussion better.
>>>
>>> Thanks,
>>> Dileepa
>>>
>>> On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>>
>>> > Another approach to this would be to use a semantic role labeling tool [1] to determine the type of relation between the subject and object.
>>> >
>>> > Or we could use Word Sense Disambiguation to determine the wordnet class of the verb (this way we have a standard relation definition) and based on what relation type it is we can search for the subject and object using dependency tree parsing in Stanford NLP.
>>> >
>>> > These 2 options ensure that we can have a much bigger recall but I'm not sure about the precision...
>>> >
>>> > So I think we'll need to first settle on the method of implementing this engine before starting anything.
>>> >
>>> > [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
>>> >
>>> > On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>> >
>>> > > Hi Dileepa,
>>> > >
>>> > > Unfortunately I did not have the time to work on this at all so there is no code base. But I'd be happy to start contributing with something to this engine and I think it would also be very helpful if you will be able to contribute to this as well.
>>> > > I did get a chance to test the Stanford relation extractor which works fine but it's quite limited to a handful of relation types (live_in, located_in, org_based_in, work_for).
>>> > > So we would need to train other models if we want to increase the number of relation types.
>>> > > I also think that the Event Extraction Engine should work in conjunction with any coreference and co-mention engines we have to increase the relation count.
>>> > >
>>> > > Regards,
>>> > > Cristian
>>> > >
>>> > > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>>> > >
>>> > >> Hi Cristian and all,
>>> > >>
>>> > >> Can I please know the status of this event extraction engine? Event extraction is a really useful feature for semantic enhancements and I am interested in collaborating on this work.
>>> > >> Is there any code base you are currently working on for this engine?
>>> > >>
>>> > >> Thanks,
>>> > >> Dileepa
>>> > >>
>>> > >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>>> > >>
>>> > >> > Hi Edi,
>>> > >> >
>>> > >> > Thanks for the info. Stanford Relation Extractor sounds very interesting.
>>> > >> > I'll give it a try.
>>> > >> >
>>> > >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
>>> > >> >
>>> > >> > > Hi Cristian,
>>> > >> > > Here are a few more resources on Semantic Role/Relationship Labeling:
>>> > >> > > 1. FrameNet, VerbNet and WordNet on the data side
>>> > >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
>>> > >> > > The last one links to a great paper which I believe holds great potential for Stanbol:
>>> > >> > > A Linear Programming Formulation for Global Inference in Natural Language Tasks (www.cnts.ua.ac.be)
>>> > >> > >
>>> > >> > > Edi
>>> > >> > > From: Cristian Petroaca <cristian.petro...@gmail.com>
>>> > >> > > To: dev@stanbol.apache.org
>>> > >> > > Sent: Sunday, February 15, 2015 6:34 AM
>>> > >> > > Subject: Event Extraction Engine
>>> > >> > >
>>> > >> > > Hi All,
>>> > >> > >
>>> > >> > > Quite a while ago I started a discussion on this list about Event Extraction from text. See https://issues.apache.org/jira/browse/STANBOL-1121 .
>>> > >> > >
>>> > >> > > I'd like to get started on the actual work. I have been thinking about how best to approach this, and there are some things that I would do differently from what the JIRA describes. I'd like to get your feedback on it.
>>> > >> > >
>>> > >> > > Basically the main approach would be:
>>> > >> > >
>>> > >> > > 1. Detect all NERs and their co-references.
>>> > >> > >
>>> > >> > > 2. Apply semantic role labeling on the sentences where the above mentioned NERs reside.
>>> > >> > > I found some interesting semantic role labeling libraries such as https://code.google.com/p/mate-tools/ or http://cogcomp.cs.illinois.edu/page/software_view/SRL.
>>> > >> > > With this I'll be able to detect the Agent, the Verb (action), the Patient and the Instruments.
>>> > >> > >
>>> > >> > > This could be a minimal implementation of the engine. After that I can simply create the event data model as described in the JIRA and annotate the text.
>>> > >> > > But this does not actually detect what kind of event it is or what event-specific roles the entities have in the relation.
>>> > >> > >
>>> > >> > > For example we can have the sentence "Google buys Yahoo for $100 million".
>>> > >> > > There is a lot more to be said about this sentence than simply that "Google" is the agent and "Yahoo" is the Patient. This is actually an acquisition event and "Google" is the buyer and "Yahoo" the bought entity.
>>> > >> > > We would also need to align synonymous phrases such as "buy" or "acquire" to a common ontology so that we know that both refer to the same Acquisition event.
>>> > >> > >
>>> > >> > > Having said that, we would add a new step:
>>> > >> > > 3. Try to detect event type and event details.
>>> > >> > >
>>> > >> > > This can be done by either:
>>> > >> > >
>>> > >> > > 3.1 Rule based: hand-written rules which would map a certain sentence structure, such as the name of the verb and the type of entities as agent and patient, to a certain event type.
>>> > >> > > This has the benefit of being easy to build but quite inflexible.
>>> > >> > >
>>> > >> > > 3.2 Statistical based: train a model which would be able to classify an event type based on the features of the sentence such as verb type, entity type, role type, etc. This is the approach described here: http://web.stanford.edu/~jurafsky/mintz.pdf.
>>> > >> > > This would be quite hard to build but quite flexible.
>>> > >> > >
>>> > >> > > This 3rd step of detecting event types & details would, I think, be most efficient for domain-specific events. We would have configs with several models for several domains available and the user could use one of the pre-existing models or create a new one.
>>> > >> > >
>>> > >> > > I don't have any practical experience with training models or text classification based on features (but I've been doing a lot of reading on it) so I'm not sure exactly how feasible what I said at point no. 3 actually is.
>>> > >> > >
>>> > >> > > Regards,
>>> > >> > > Cristian
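
To make the rule-based idea from this thread a bit more concrete, here is a minimal sketch of how a rule like the "attack" one could be matched against a parsed sentence. It is written in Python purely for illustration and is not Stanbol or Stanford NLP code. The Token, RoleRule and EventRule classes, the sense id "attack.v.01" and the (relation, head index, dependent index) triple format are all assumptions made up for this example. It also assumes that WSD, NER and dependency parsing have already run and attached their results to the tokens.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: str                           # coarse part of speech, e.g. "verb"
    sense: Optional[str] = None        # sense id from a WSD step (assumed format)
    entity_type: Optional[str] = None  # NER class or WordNet class (assumed)


@dataclass
class RoleRule:
    dependency_type: str               # e.g. "nsubj" or "dobj"
    entity_types: FrozenSet[str]       # allowed entity types for the filler


@dataclass
class EventRule:
    name: str
    trigger_sense: str                 # sense the trigger word must have
    trigger_pos: str                   # part of speech the trigger must have
    roles: Dict[str, RoleRule]         # role name -> constraint on the filler


# Hypothetical parse format: (relation, head index, dependent index).
Dependency = Tuple[str, int, int]


def extract_events(tokens: List[Token],
                   dependencies: List[Dependency],
                   rules: List[EventRule]) -> List[dict]:
    """Return one dict per token that matches a rule's trigger."""
    events = []
    for head_idx, head in enumerate(tokens):
        for rule in rules:
            if head.pos != rule.trigger_pos or head.sense != rule.trigger_sense:
                continue
            event = {"event": rule.name, "trigger": head.text}
            for role_name, role in rule.roles.items():
                for relation, gov, dep in dependencies:
                    if (gov == head_idx
                            and relation == role.dependency_type
                            and tokens[dep].entity_type in role.entity_types):
                        event[role_name] = tokens[dep].text
                        break  # keep only the first matching filler
            events.append(event)
    return events


# The "attack" rule from the thread, with a placeholder sense id.
ALLOWED = frozenset({"Person", "Location"})
ATTACK_RULE = EventRule(
    name="attack",
    trigger_sense="attack.v.01",
    trigger_pos="verb",
    roles={"agent": RoleRule("nsubj", ALLOWED),
           "patient": RoleRule("dobj", ALLOWED)},
)

# Hand-built annotations for "Rebels attacked Aleppo".
tokens = [
    Token("Rebels", "noun", entity_type="Person"),
    Token("attacked", "verb", sense="attack.v.01"),
    Token("Aleppo", "noun", entity_type="Location"),
]
dependencies = [("nsubj", 1, 0), ("dobj", 1, 2)]

events = extract_events(tokens, dependencies, [ATTACK_RULE])
print(events)
```

A role whose constraint is not satisfied is simply left out of the resulting dict, so a sentence with a matching trigger but no suitable object still yields a partial event. Whether the real engine should keep or drop such partial events is an open design choice that the thread does not settle.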