Hi Cristian, Interesting ideas. Let me do some background reading on this, so I can also participate in the discussion better.
Thanks,
Dileepa

On Wed, Sep 9, 2015 at 3:17 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:

> Another approach to this would be to use a semantic role labeling tool [1]
> to determine the type of relation between the subject and object.
>
> Or we could use Word Sense Disambiguation to determine the WordNet class of
> the verb (this way we have a standard relation definition) and based on
> what relation type it is we can search for the subject and object using
> dependency tree parsing in Stanford NLP.
>
> These 2 options ensure that we can have a much bigger recall but I'm not
> sure about the precision...
>
> So I think we'll need to first settle on the method of implementing this
> engine before starting anything.
>
> [1] http://cogcomp.cs.illinois.edu/page/demo_view/srl
>
> On Tue, Sep 8, 2015 at 11:45 AM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
>
> > Hi Dileepa,
> >
> > Unfortunately I did not have the time to work on this at all so there is
> > no code base. But I'd be happy to start contributing with something to
> > this engine and I think it would also be very helpful if you will be able
> > to contribute to this as well.
> > I did get a chance to test the Stanford relation extractor which works
> > fine but it's quite limited to a handful of relation types (live_in,
> > located_in, org_based_in, work_for). So we would need to train other models
> > if we want to increase the relation type number.
> > I also think that the Event Extraction Engine should work in conjunction
> > with any coreference and comention engines we have to increase the relation
> > count.
> >
> > Regards,
> > Cristian
> >
> > On Tue, Sep 8, 2015 at 11:19 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> >
> >> Hi Cristian and all,
> >>
> >> Can I please know the status of this event extraction engine? Event
> >> extraction is a really useful feature for semantic enhancements and I am
> >> interested in collaborating with this work.
> >> Is there any code base you are currently working on for this engine work?
> >>
> >> Thanks,
> >> Dileepa
> >>
> >> On Tue, Feb 17, 2015 at 9:10 PM, Cristian Petroaca <cristian.petro...@gmail.com> wrote:
> >>
> >> > Hi Edi,
> >> >
> >> > Thanks for the info. Stanford Relation Extractor sounds very interesting.
> >> > I'll give it a try.
> >> >
> >> > 2015-02-17 17:00 GMT+02:00 Edi Bice <edi_b...@yahoo.com.invalid>:
> >> >
> >> > > Hi Cristian,
> >> > > Here are a few more resources on Semantic Role/Relationship Labeling:
> >> > > 1. FrameNet, VerbNet and WordNet on the data side
> >> > > 2. Shalmaneser, SEMAFOR and Stanford Relation Extractor on the code side
> >> > > The last one links to a great paper which I believe holds great potential
> >> > > for Stanbol:
> >> > > A Linear Programming Formulation for Global Inference in Natural Language
> >> > > Tasks (www.cnts.ua.ac.be)
> >> > >
> >> > > Edi
> >> > > From: Cristian Petroaca <cristian.petro...@gmail.com>
> >> > > To: dev@stanbol.apache.org
> >> > > Sent: Sunday, February 15, 2015 6:34 AM
> >> > > Subject: Event Extraction Engine
> >> > >
> >> > > Hi All,
> >> > >
> >> > > Quite a while ago I started a discussion on this list about Event
> >> > > Extraction from text. See
> >> > > https://issues.apache.org/jira/browse/STANBOL-1121
> >> > > .
> >> > >
> >> > > I'd like to get started on the actual work and I have been thinking how to
> >> > > best approach this and there are some things that I would do differently
> >> > > than what the JIRA describes. I'd like to get your feedback on it.
> >> > >
> >> > > Basically the main approach would be:
> >> > >
> >> > > 1. Detect all NERs and their co-references.
> >> > >
> >> > > 2. Apply semantic role labeling on the sentences where the above mentioned
> >> > > NERs reside.
> >> > > I found some interesting Semantic Role labeling libraries such as
> >> > > https://code.google.com/p/mate-tools/ or
> >> > > http://cogcomp.cs.illinois.edu/page/software_view/SRL.
> >> > > With this I'll be able to detect the Agent, the Verb (action) and the
> >> > > Patient and Instruments.
> >> > >
> >> > > This could be a minimal implementation of the engine. After that I can
> >> > > simply create the event data model as described in the JIRA and annotate
> >> > > the text.
> >> > > But this does not actually detect what kind of event it is or what the
> >> > > event specific roles are that the entities have in the relation.
> >> > >
> >> > > For example we can have the sentence "Google buys Yahoo for $100 million".
> >> > > There is a lot more to be said about this sentence than simply that
> >> > > "Google" is the agent and "Yahoo" is the Patient. This is actually an
> >> > > acquisition event and "Google" is the buyer and "Yahoo" the bought entity.
> >> > > We would also need to align synonymous phrases such as "buy" or "acquire"
> >> > > to a common ontology so that we know that both refer to the same
> >> > > Acquisition event.
> >> > >
> >> > > Having said that, we would add a new step:
> >> > > 3. Try to detect event type and event details.
> >> > >
> >> > > This can be done by either:
> >> > >
> >> > > 3.1 Rule based: hand written rules which would map a certain sentence
> >> > > structure, such as the name of the verb and the type of entities as agent,
> >> > > patient to a certain event type.
> >> > > This has the benefit of being easy to build but quite inflexible.
> >> > >
> >> > > 3.2 Statistical based: train a model which would be able to classify an
> >> > > event type based on the features of the sentence such as verb type, entity
> >> > > type, role type, etc. This is the approach described here:
> >> > > http://web.stanford.edu/~jurafsky/mintz.pdf.
> >> > > This would be quite hard to build but quite flexible.
> >> > >
> >> > > This 3rd step of detecting event types & details I think would be most
> >> > > efficient for domain specific events. We would have configs with several
> >> > > models for several domains available and the user could use one of the
> >> > > pre-existing models or create a new one.
> >> > >
> >> > > I don't have any practical experience with training models or text
> >> > > classification based on features (but I've been doing a lot of reading on
> >> > > it) so I'm not sure exactly how feasible what I said at point no 3 actually
> >> > > is.
> >> > >
> >> > > Regards,
> >> > > Cristian
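
For reference, the rule-based option (3.1) in the quoted proposal could look roughly like the sketch below. It is only an illustration under assumptions: the `Frame` and `Event` classes, the `VERB_TO_EVENT` and `RULES` tables, the role names and the NER type strings are all hypothetical and do not come from Stanbol, the JIRA issue, or any SRL library. It assumes an earlier step has already produced the verb lemma, agent and patient for a sentence.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Frame:
    """Hypothetical SRL output for one predicate (not a real library type)."""
    verb: str          # lemma of the predicate, e.g. "buy"
    agent: str         # surface form of the agent entity
    agent_type: str    # NER type of the agent (assumed label set)
    patient: str       # surface form of the patient entity
    patient_type: str  # NER type of the patient (assumed label set)


@dataclass
class Event:
    """Hypothetical event record: a type plus event-specific roles."""
    event_type: str
    roles: Dict[str, str]


# Align synonymous verbs to one event type (assumed, hand-written table).
VERB_TO_EVENT: Dict[str, str] = {
    "buy": "Acquisition",
    "acquire": "Acquisition",
    "purchase": "Acquisition",
}

# (event type, agent NER type, patient NER type) -> (agent role, patient role)
RULES: Dict[Tuple[str, str, str], Tuple[str, str]] = {
    ("Acquisition", "ORGANIZATION", "ORGANIZATION"): ("buyer", "acquired"),
}


def classify(frame: Frame) -> Optional[Event]:
    """Return an Event if a hand-written rule matches the frame, else None."""
    event_type = VERB_TO_EVENT.get(frame.verb.lower())
    if event_type is None:
        return None  # verb is not covered by any rule
    role_names = RULES.get((event_type, frame.agent_type, frame.patient_type))
    if role_names is None:
        return None  # entity types do not fit the event type
    agent_role, patient_role = role_names
    return Event(event_type, {agent_role: frame.agent, patient_role: frame.patient})


if __name__ == "__main__":
    # "Google buys Yahoo for $100 million", after NER and SRL.
    frame = Frame("buy", "Google", "ORGANIZATION", "Yahoo", "ORGANIZATION")
    print(classify(frame))
```

A table like this is quick to write, but every new verb or entity-type combination needs its own entry, which is the inflexibility mentioned for 3.1.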