Re: How to train and create Coreference models
On 06/14/2012 09:06 PM, Carlos Scheidecker wrote:
> Also, is the format for NERs on 1.5x different than the one on 1.4x and
> under? Seems to me that the coreference models are different. Is it because
> the NER code has been updated and the coreference code has not yet?

First part of my answer.

Yes, we updated all the components except the coreferencer to use a model
package. This package includes everything that is needed to run a component
from a model, so no further configuration is needed. Evaluation support is
also on our wish list.

You can use the 1.5 NER models, but the coreferencer models are the same for
the 1.5 release as they have been for 1.4.

The coref component was always a bit special because we do not really have
data to train it. I got a copy of the MUC 6 / 7 data and added format support
for it, but there are still some minor issues. It would be nice to get that
fixed!

An answer about how the training is supposed to work will follow.

Jörn
Re: Syntactic roles with OPENNLP
On 06/14/2012 10:28 PM, Carlos Scheidecker wrote:
> What if you need to parse/divide a clause/phrase into syntactic roles? For
> instance, subject, object, preposition, direct object, indirect object.
>
> Is there any library or system that would do that with OpenNLP?
>
> Has anyone performed syntactic role classification/extraction using OpenNLP
> before?

No, that is not possible with OpenNLP, but we are open to any contributions.

Jörn
Syntactic roles with OPENNLP
Hello all,

What if you need to parse/divide a clause/phrase into syntactic roles? For
instance, subject, object, preposition, direct object, indirect object.

Is there any library or system that would do that with OpenNLP?

Has anyone performed syntactic role classification/extraction using OpenNLP
before?

Thanks,

Carlos.
How to train and create Coreference models
Jörn et al.,

Allow me to impose a little further.

How do I go about training coreference models? Any hints on the code for
achieving that and creating the files?

Also, is the format for NERs on 1.5x different than the one on 1.4x and under?
Seems to me that the coreference models are different. Is it because the NER
code has been updated and the coreference code has not yet?

Thanks again,

Carlos.
Re: How to work with Coreference resolutions
Jörn,

Great, that helps quite a lot! It is similar to the main method of that class,
but your explanations and the trick of integrating the NERs are the real
silver bullet of your post.

Thanks a bunch. I will play with that tonight. What I want is to define a few
IE algorithms from that.

Are there any references on Information Extraction using OpenNLP that you
would recommend as well?

Thanks,

Carlos.

On Thu, Jun 14, 2012 at 9:14 AM, Jörn Kottmann wrote:
> Hello,
>
> the input for the coreference component needs to be preprocessed
> with the sentence detector, tokenizer, parser and name finders.
>
> You can do this via the API, and our documentation provides sample code for
> each of these steps.
>
> The only tricky part is getting the named entities into the parse tree.
> Here is a sample:
>
>     Parse parse; // returned from parser
>     Span personEntites[]; // returned from person name model
>
>     Parse.addNames("person", personEntites, parse.getTagNodes());
>
> After this the person names are inserted into the parse tree. You need
> to repeat this step for every entity type you would like to reference. The
> "person" tags are currently hard coded. You can find a list in
> TreebankNameFinder.NAME_TYPES (I believe that's a trunk-only class).
>
> Before you start with the rest you should download all the coreferencer
> models for 1.4 into one directory, similar to the structure on the server.
>
> Now we are coming to the coreference resolution code:
>
>     Linker treebankLinker = new TreebankLinker("/home/joern/corefmodel/",
>         LinkerMode.TEST);
>
> This will create the linker for you.
>
> First all the mentions need to be recognized, and afterward they are linked
> together.
>
> For every sentence you do this:
>
>     Parse p = ...; // contains a parse of a sentence with names
>     Mention[] extents = treebankLinker.getMentionFinder().getMentions(
>         new DefaultParse(p, sentenceNumber));
>     for (int ei = 0, en = extents.length; ei < en; ei++) {
>       if (extents[ei].getParse() == null) {
>         Parse snp = new Parse(p.getText(), extents[ei].getSpan(), "NML", 1.0, 0);
>         p.insert(snp);
>         extents[ei].setParse(new DefaultParse(snp, sentenceNumber));
>       }
>     }
>     sentenceNumber++;
>
> The result is the mentions per sentence. All these mention objects should
> be copied into a single list, e.g. via
> document.addAll(Arrays.asList(extents)) (document is of type List<Mention>).
>
> Now the mentions of one document can be linked together:
>
>     DiscourseEntity[] entities = treebankLinker.getEntities(
>         document.toArray(new Mention[document.size()]));
>
> The entities array now contains the various detected and linked entities;
> usually you want to filter out entities which have just a single mention.
> The DiscourseEntity groups mentions together. A mention does not have to be
> a named entity; other noun phrases are valid mentions as well.
>
> Hope that helps,
> Jörn
>
> On 06/13/2012 07:41 PM, Carlos Scheidecker wrote:
>
>> Jörn,
>>
>> I just want to know how it works for now. I've been following the one from
>> StanfordNLP as well.
>>
>> Basically, I want to first know if I just pass raw text to it or if I have
>> to tag that first. Looks like I need to do POS tagging first.
>>
>> I want to be able to pass a text and get the references as object lists
>> from the API.
>>
>> So I can fetch the relations.
>>
>> I still need to take some time here and read more of the source code unless
>> you have some pointers.
>>
>> Thanks,
>>
>> Carlos.
>>
>> On Wed, Jun 13, 2012 at 11:23 AM, Jörn Kottmann wrote:
>>
>>> On 06/13/2012 07:07 PM, Carlos Scheidecker wrote:
>>>
>>>> Thanks. So for now we can only use the models from 1.4.
>>>> I saw that a training class was added recently. How do you use that?
>>>
>>> That's still work in progress. On which data do you want to train?
>>>
>>> You need to produce data in a certain format; there should be a sample in
>>> the test folder. It's basically Penn Treebank style plus some nodes to
>>> label the mentions in the tree.
>>>
>>> The parse trees of a document are grouped and sent document-wise
>>> to the trainer via a stream. After this is done a new model will be
>>> trained.
>>>
>>> The OpenNLP coreferencer currently works only on noun phrases; other
>>> mentions like verbs will not be resolved (in case you want to train on
>>> OntoNotes).
>>>
>>> Jörn
Re: How to work with Coreference resolutions
Hello,

the input for the coreference component needs to be preprocessed
with the sentence detector, tokenizer, parser and name finders.

You can do this via the API, and our documentation provides sample code for
each of these steps.

The only tricky part is getting the named entities into the parse tree.
Here is a sample:

    Parse parse; // returned from parser
    Span personEntites[]; // returned from person name model

    Parse.addNames("person", personEntites, parse.getTagNodes());

After this the person names are inserted into the parse tree. You need
to repeat this step for every entity type you would like to reference. The
"person" tags are currently hard coded. You can find a list in
TreebankNameFinder.NAME_TYPES (I believe that's a trunk-only class).

Before you start with the rest you should download all the coreferencer
models for 1.4 into one directory, similar to the structure on the server.

Now we are coming to the coreference resolution code:

    Linker treebankLinker = new TreebankLinker("/home/joern/corefmodel/",
        LinkerMode.TEST);

This will create the linker for you.

First all the mentions need to be recognized, and afterward they are linked
together.

For every sentence you do this:

    Parse p = ...; // contains a parse of a sentence with names
    Mention[] extents = treebankLinker.getMentionFinder().getMentions(
        new DefaultParse(p, sentenceNumber));
    for (int ei = 0, en = extents.length; ei < en; ei++) {
      if (extents[ei].getParse() == null) {
        Parse snp = new Parse(p.getText(), extents[ei].getSpan(), "NML", 1.0, 0);
        p.insert(snp);
        extents[ei].setParse(new DefaultParse(snp, sentenceNumber));
      }
    }
    sentenceNumber++;

The result is the mentions per sentence. All these mention objects should
be copied into a single list, e.g. via
document.addAll(Arrays.asList(extents)) (document is of type List<Mention>).

Now the mentions of one document can be linked together:

    DiscourseEntity[] entities = treebankLinker.getEntities(
        document.toArray(new Mention[document.size()]));

The entities array now contains the various detected and linked entities;
usually you want to filter out entities which have just a single mention.
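A minimal sketch of that last filtering step, using plain Java collections as
a stand-in so that it runs without OpenNLP or the model files. The class name
CorefFilterSketch, the method multiMentionEntities and the sample data are
made up for illustration; each inner list stands for the mentions grouped
under one DiscourseEntity. With the real API the size check would presumably
go through the entity's own mention count rather than List.size(), which is
an assumption to verify against the coref Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: each inner list plays the role of one DiscourseEntity,
// and its strings play the role of the mentions linked to that entity.
class CorefFilterSketch {

  // Keeps only the entities that have more than one mention, i.e. the ones
  // where at least two mentions were linked together.
  static List<List<String>> multiMentionEntities(List<List<String>> entities) {
    List<List<String>> result = new ArrayList<>();
    for (List<String> mentions : entities) {
      if (mentions.size() > 1) {
        result.add(mentions);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical linker output for a short document.
    List<List<String>> entities = Arrays.asList(
        Arrays.asList("Carlos Scheidecker", "he", "his"),
        Arrays.asList("the parser"),
        Arrays.asList("OpenNLP", "it"));

    for (List<String> mentions : multiMentionEntities(entities)) {
      System.out.println(mentions);
    }
  }
}
```

The singleton "the parser" is dropped because nothing else in the document
was linked to it; the two remaining groups are the resolved coreference
chains.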
The DiscourseEntity groups mentions together. A mention does not have to be
a named entity; other noun phrases are valid mentions as well.

Hope that helps,
Jörn

On 06/13/2012 07:41 PM, Carlos Scheidecker wrote:
> Jörn,
>
> I just want to know how it works for now. I've been following the one from
> StanfordNLP as well.
>
> Basically, I want to first know if I just pass raw text to it or if I have
> to tag that first. Looks like I need to do POS tagging first.
>
> I want to be able to pass a text and get the references as object lists
> from the API.
>
> So I can fetch the relations.
>
> I still need to take some time here and read more of the source code unless
> you have some pointers.
>
> Thanks,
>
> Carlos.
>
> On Wed, Jun 13, 2012 at 11:23 AM, Jörn Kottmann wrote:
>
>> On 06/13/2012 07:07 PM, Carlos Scheidecker wrote:
>>
>>> Thanks. So for now we can only use the models from 1.4.
>>> I saw that a training class was added recently. How do you use that?
>>
>> That's still work in progress. On which data do you want to train?
>>
>> You need to produce data in a certain format; there should be a sample in
>> the test folder. It's basically Penn Treebank style plus some nodes to
>> label the mentions in the tree.
>>
>> The parse trees of a document are grouped and sent document-wise
>> to the trainer via a stream. After this is done a new model will be
>> trained.
>>
>> The OpenNLP coreferencer currently works only on noun phrases; other
>> mentions like verbs will not be resolved (in case you want to train on
>> OntoNotes).
>>
>> Jörn