This is fixed now in the master branch. Would you mind trying it again?

Jörn

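Two of the failures quoted below (the missing it-pos-maxent.bin serializer and "No dictionary resource for key") raise the question Jörn asks further down: do the model/dict attribute values in the feature generator descriptor match the keys of the resources that are actually supplied? A quick pre-flight check can rule that out before retrying with the master build. The sketch below is a hypothetical helper (ResourceNameCheck is not part of OpenNLP) that uses only the JDK's DOM parser. It assumes the relevant names live in model and dict attributes, as in the descriptors in this thread, and it does not model how the command-line tool derives resource names from the files in the -resources directory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical pre-flight check for a feature generator descriptor; not part of OpenNLP. */
public class ResourceNameCheck {

    /** Attribute names assumed to hold resource keys (as in the descriptors in this thread). */
    private static final String[] RESOURCE_ATTRIBUTES = {"model", "dict"};

    /** Collects every model="..." and dict="..." value found anywhere in the descriptor. */
    public static Set<String> referencedResources(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)));
            Set<String> names = new TreeSet<>();
            NodeList elements = doc.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                for (String attribute : RESOURCE_ATTRIBUTES) {
                    if (element.hasAttribute(attribute)) {
                        names.add(element.getAttribute(attribute));
                    }
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feature generator descriptor", e);
        }
    }

    /** Names the descriptor references that are not keys of the supplied resources. */
    public static Set<String> missing(String descriptorXml, Set<String> resourceKeys) {
        Set<String> result = referencedResources(descriptorXml);
        result.removeAll(resourceKeys);
        return result;
    }

    public static void main(String[] args) {
        // Descriptor fragment modeled on the generators.xml files quoted in this thread.
        String descriptor = "<generators><cache><generators>"
                + "<window prevLength=\"4\" nextLength=\"2\"><tokenpos model=\"it-pos-maxent.bin\" /></window>"
                + "<dictionary dict=\"nations.dictionary\" prefix=\"nation\" />"
                + "</generators></cache></generators>";
        // Keys as they might be put into the resources map passed to TokenNameFinderFactory.
        Set<String> resourceKeys = new HashSet<>(Arrays.asList("postagger.bin", "nations.dictionary"));
        System.out.println(missing(descriptor, resourceKeys)); // prints [it-pos-maxent.bin]
    }
}
```

Running main prints [it-pos-maxent.bin]: the descriptor asks for it-pos-maxent.bin while the map only holds postagger.bin, which is the kind of mismatch Jörn asked about.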
On Wed, Jun 14, 2017 at 4:31 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> We have to fix this; William wrote a unit test to reproduce it.
>
> Jörn
>
> On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta <damianopo...@gmail.com> wrote:
>> Jorn,
>> the last snapshot, 1.8.1-snapshot, has fixed the problem with dictionaries
>> (PR #220), but the problem with the POS tagger serialization is still there.
>> I can confirm that the last snapshot cannot serialize the POS tagger using
>> the command-line tool:
>>
>> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources /home/damiano/lavoro/java/Parser/src/main/resources/
>>
>> Writing name finder model ... Compressed 885605 parameters to 94030
>> 3451 outcome patterns
>> Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>>     at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>>
>> I used this generators.xml file:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <generators>
>>   <cache>
>>     <generators>
>>       <window prevLength="4" nextLength="2">
>>         <tokenclass />
>>       </window>
>>       <window prevLength="4" nextLength="2">
>>         <token />
>>       </window>
>>       <!-- Pos Tagger -->
>>       <window prevLength="4" nextLength="2">
>>         <tokenpos model="it-pos-maxent.bin" />
>>       </window>
>>       <definition />
>>       <prevmap />
>>       <bigram />
>>       <sentence begin="true" end="false" />
>>     </generators>
>>   </cache>
>> </generators>
>>
>> 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>>> Jorn,
>>> At the moment I am using the command-line tool to train my NER model, but I
>>> am getting this error:
>>>
>>> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it -model /home/damiano/it-person-perceptron.bin -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources /home/damiano/lavoro/java/Parser/src/main/resources/
>>>
>>> Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
>>>     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
>>>     at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
>>>     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
>>>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
>>>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>>> Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
>>>     at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>>>     at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
>>>     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
>>>     ... 4 more
>>>
>>> As you can see, the problem is "No dictionary resource for key:
>>> nations.dictionary", because I also need to add a dictionary inside my model.
>>>
>>> I ran these tests:
>>>
>>> 1. Used nations.dictionary as the resource name in my generators.xml:
>>>    <dictionary dict="nations.dictionary" prefix="nation" />
>>> 2. Used nations.xml as the resource name in my generators.xml:
>>>    <dictionary dict="nations.xml" prefix="nation" />
>>> 3. Used nations.dict as the resource name in my generators.xml:
>>>    <dictionary dict="nations.dict" prefix="nation" />
>>>
>>> For each test I also renamed the dictionary file inside my -resources
>>> directory.
>>>
>>> I had no luck.
>>>
>>> What should I name a dictionary resource?
>>>
>>> Thanks.
>>>
>>> 2017-06-07 16:20 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>>>> Hello Jorn,
>>>> I can confirm the error. Please take a look at the code below. It is a
>>>> working example; you only need to edit the constants GENERATORS, POSTAGGER
>>>> and SERIALIZED.
>>>>
>>>> TEST FILE:
>>>>
>>>> package com.damiano.trainer;
>>>>
>>>> import java.io.BufferedOutputStream;
>>>> import java.io.FileInputStream;
>>>> import java.io.FileOutputStream;
>>>> import java.io.IOException;
>>>> import java.io.InputStream;
>>>> import java.util.ArrayList;
>>>> import java.util.HashMap;
>>>> import java.util.List;
>>>> import java.util.Map;
>>>> import opennlp.tools.ml.perceptron.PerceptronTrainer;
>>>> import opennlp.tools.namefind.BioCodec;
>>>> import opennlp.tools.namefind.NameFinderME;
>>>> import opennlp.tools.namefind.NameSample;
>>>> import opennlp.tools.namefind.TokenNameFinderFactory;
>>>> import opennlp.tools.namefind.TokenNameFinderModel;
>>>> import opennlp.tools.postag.POSModel;
>>>> import opennlp.tools.util.ObjectStream;
>>>> import opennlp.tools.util.ObjectStreamUtils;
>>>> import opennlp.tools.util.TrainingParameters;
>>>> import org.apache.commons.io.IOUtils;
>>>>
>>>> public class Test {
>>>>
>>>>     private final String GENERATORS = "/home/damiano/test.xml";
>>>>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>>>>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>>>>
>>>>     public static void main(String[] args) throws IOException {
>>>>         Test test = new Test();
>>>>     }
>>>>
>>>>     public Test() throws IOException {
>>>>
>>>>         List<NameSample> labelled = new ArrayList<>();
>>>>
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>>>>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>>>>
>>>>         TokenNameFinderFactory factory;
>>>>
>>>>         try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
>>>>             //HashMap<String, Object> map = new HashMap<>();
>>>>
>>>>             try (InputStream in = new FileInputStream(GENERATORS)) {
>>>>
>>>>                 // Resources
>>>>                 Map<String, Object> map = new HashMap<>();
>>>>
>>>>                 // Pos Tagger
>>>>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
>>>>
>>>>                 // Factory
>>>>                 factory = new TokenNameFinderFactory(
>>>>                         IOUtils.toByteArray(in),
>>>>                         map,
>>>>                         new BioCodec()
>>>>                 );
>>>>
>>>>                 try {
>>>>
>>>>                     TrainingParameters mlParams = new TrainingParameters();
>>>>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
>>>>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
>>>>                     mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
>>>>
>>>>                     TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);
>>>>
>>>>                     try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>>>>                         model.serialize(modelOut);
>>>>                     }
>>>>
>>>>                 } catch (Exception ex) {
>>>>                     ex.printStackTrace();
>>>>                 }
>>>>
>>>>             }
>>>>         }
>>>>     }
>>>>
>>>>     public static POSModel loadPosTagger(String modelName) {
>>>>
>>>>         try (InputStream modelIn = new FileInputStream(modelName)) {
>>>>             POSModel model = new POSModel(modelIn);
>>>>             return model;
>>>>         }
>>>>         catch (Exception ex) { ex.printStackTrace(); }
>>>>
>>>>         return null;
>>>>     }
>>>> }
>>>>
>>>> GENERATORS:
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <generators>
>>>>   <cache>
>>>>     <generators>
>>>>       <window prevLength="4" nextLength="2">
>>>>         <tokenclass />
>>>>       </window>
>>>>       <window prevLength="4" nextLength="2">
>>>>         <token />
>>>>       </window>
>>>>       <!-- Pos Tagger -->
>>>>       <window prevLength="4" nextLength="2">
>>>>         <tokenpos model="postagger.bin" />
>>>>       </window>
>>>>       <definition />
>>>>       <prevmap />
>>>>       <bigram />
>>>>       <sentence begin="true" end="false" />
>>>>     </generators>
>>>>   </cache>
>>>> </generators>
>>>>
>>>> OUTPUT (with error):
>>>>
>>>> Indexing events using cutoff of 0
>>>> Computing event counts... done. 30 events
>>>> Indexing... done.
>>>> Collecting events... Done indexing.
>>>> Incorporating indexed data for training... done.
>>>> Number of Event Tokens: 30
>>>> Number of Outcomes: 2
>>>> Number of Predicates: 144
>>>> Computing model parameters...
>>>> Performing 300 iterations.
>>>> 1: . (27/30) 0.9
>>>> 2: . (30/30) 1.0
>>>> 3: . (30/30) 1.0
>>>> 4: . (30/30) 1.0
>>>> 5: . (30/30) 1.0
>>>> Stopping: change in training set accuracy less than 1.0E-5
>>>> Stats: (30/30) 1.0
>>>> ...done.
>>>> Compressed 144 parameters to 62
>>>> 1 outcome patterns
>>>> java.lang.IllegalStateException: Missing serializer for postagger.bin
>>>>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>>>     at com.damiano.trainer.Test.<init>(Test.java:75)
>>>>     at com.damiano.trainer.Test.main(Test.java:31)
>>>>
>>>> 2017-06-07 15:48 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>>>>> Hmm, let me try again. Yes, I copied it badly; I think the names are
>>>>> correct. I will give you a working example.
>>>>>
>>>>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
>>>>>> OK, but are you sure you used matching names? The exception states
>>>>>> it-pos-maxent.bin; which object did you map to it?
>>>>>>
>>>>>> Jörn
>>>>>>
>>>>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <damianopo...@gmail.com> wrote:
>>>>>>> Hi Jorn!
>>>>>>> Yes:
>>>>>>>
>>>>>>> <dependency>
>>>>>>>   <groupId>org.apache.opennlp</groupId>
>>>>>>>   <artifactId>opennlp-tools</artifactId>
>>>>>>>   <version>1.8.0</version>
>>>>>>> </dependency>
>>>>>>>
>>>>>>> Do I need other dependencies too?
>>>>>>>
>>>>>>> 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
>>>>>>>> This should be working. Did you test with 1.8.0?
>>>>>>>>
>>>>>>>> Jörn
>>>>>>>>
>>>>>>>> On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianopo...@gmail.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>> I am using the POSTaggerFeatureGenerator via generators.xml:
>>>>>>>>>
>>>>>>>>> <tokenpos model="postagger.bin" />
>>>>>>>>>
>>>>>>>>> During training I add this model to the resources like this:
>>>>>>>>>
>>>>>>>>> HashMap<String, Object> map = new HashMap<>();
>>>>>>>>> map.put("postagger.bin", myPostaggerModel);
>>>>>>>>>
>>>>>>>>> factory = new TokenNameFinderFactory(
>>>>>>>>>         IOUtils.toByteArray(in),
>>>>>>>>>         map,
>>>>>>>>>         new BioCodec()
>>>>>>>>> );
>>>>>>>>>
>>>>>>>>> I get this error:
>>>>>>>>>
>>>>>>>>> java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>>>>>>>>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>>>>>>>>>     at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>>>>>>>>>     at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>>>>>>>>> 2017-06-05 15:37:35 INFO Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>>>>>>>>>
>>>>>>>>> Do I have to change the extension of the file?
>>>>>>>>>
>>>>>>>>> Thanks
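On the original question of whether the file extension matters: the exception text suggests that, when the model is serialized, each resource name needs a matching serializer, and one common way to arrange that is to key serializers by the name's extension. The sketch below assumes exactly that scheme. ExtensionSerializerLookup, Serializer, register and serializerFor are hypothetical names, not OpenNLP's API, and the real lookup inside BaseModel may differ. It reproduces the same exception type and "Missing serializer for ..." message for a .bin resource when nothing is registered for "bin".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of extension-keyed serializer lookup. This is NOT OpenNLP's
 * BaseModel code; it only illustrates the assumed scheme behind "Missing serializer for ...".
 */
public class ExtensionSerializerLookup {

    /** Stand-in for a real artifact serializer; only its name matters here. */
    public interface Serializer {
        String name();
    }

    private final Map<String, Serializer> byExtension = new HashMap<>();

    public void register(String extension, Serializer serializer) {
        byExtension.put(extension, serializer);
    }

    /** Returns the text after the last '.' of a resource name, e.g. "bin" for "postagger.bin". */
    public static String extension(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        if (dot < 0 || dot == resourceName.length() - 1) {
            throw new IllegalArgumentException("Resource name has no extension: " + resourceName);
        }
        return resourceName.substring(dot + 1);
    }

    /** Looks up the serializer by extension; throws if none is registered for it. */
    public Serializer serializerFor(String resourceName) {
        Serializer serializer = byExtension.get(extension(resourceName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + resourceName);
        }
        return serializer;
    }

    public static void main(String[] args) {
        ExtensionSerializerLookup lookup = new ExtensionSerializerLookup();
        lookup.register("dictionary", () -> "dictionary-serializer"); // only "dictionary" is known
        System.out.println(lookup.serializerFor("nations.dictionary").name()); // prints dictionary-serializer
        try {
            lookup.serializerFor("postagger.bin"); // nothing registered for "bin"
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints Missing serializer for postagger.bin
        }
    }
}
```

In this sketch, renaming postagger.bin to another extension only helps if a serializer is registered for the new extension; for the real POS tagger case, Jörn's latest reply says the problem was a bug that is now fixed in master.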