We have to fix this. William wrote a unit test to reproduce it.

Jörn
On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta <damianopo...@gmail.com> wrote:

> Jorn,
> the last snapshot 1.8.1-snapshot has fixed the problem with dictionaries
> (PR #220) but the problem with the postagger serialization still here. i
> can confirm that the last snapshot cannot serialize the postagger using the
> cmd tool,
>
> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
> -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen
> /home/damiano/test.xml -sequenceCodec BIO -resources
> /home/damiano/lavoro/java/Parser/src/main/resources/
>
> Writing name finder model ... Compressed 885605 parameters to 94030
> 3451 outcome patterns
> Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>     at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>
> I have used this generators.xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <generators>
>   <cache>
>     <generators>
>       <window prevLength="4" nextLength="2">
>         <tokenclass />
>       </window>
>       <window prevLength="4" nextLength="2">
>         <token />
>       </window>
>       <!-- Pos Tagger -->
>       <window prevLength="4" nextLength="2">
>         <tokenpos model="it-pos-maxent.bin" />
>       </window>
>       <definition />
>       <prevmap />
>       <bigram />
>       <sentence begin="true" end="false" />
>     </generators>
>   </cache>
> </generators>
>
>
> 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Jorn,
> > At the moment i am using the command tool to train my ner model, but i am
> > getting this error:
> >
> > *opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
> > -model /home/damiano/it-person-perceptron.bin
> > -featuregen /home/damiano/test.xml -sequenceCodec BIO -resources
> > /home/damiano/lavoro/java/Parser/src/main/resources/*
> >
> > Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
> >     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
> >     at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
> >     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
> >     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
> >     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
> > Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
> >     at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
> >     at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
> >     at opennlp.tools.namefind.TokenNameFinderFactory.
> >         createFeatureGenerators(TokenNameFinderFactory.java:189)
> >     ... 4 more
> >
> > As you can see, the problem is "No dictionary resource for key:
> > nations.dictionary", because I also need to add a dictionary inside my
> > model.
> >
> > I ran these tests:
> >
> > *1. used the name nations.dictionary as resource name in my generators.xml
> > and <dictionary dict="nations.dictionary" prefix="nation" />*
> >
> > *2. used the name nations.xml as resource name in my generators.xml and
> > <dictionary dict="nations.xml" prefix="nation" />*
> >
> > *3. used the name nations.dict as resource name in my generators.xml and
> > <dictionary dict="nations.dict" prefix="nation" />*
> >
> > For each test I also renamed the dictionary file inside my -resources
> > directory.
> >
> > I had no luck.
> >
> > How should I name a dictionary resource?
> >
> > Thanks.
> >
> >
> > 2017-06-07 16:20 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
> >
> >> Hello Jorn,
> >> I confirm the error. Please take a look at the code below. It is a
> >> working example; you only need to edit the constants GENERATORS,
> >> POSTAGGER and SERIALIZED.
> >>
> >>
> >> *TEST FILE:*
> >>
> >> package com.damiano.trainer;
> >>
> >> import java.io.BufferedOutputStream;
> >> import java.io.FileInputStream;
> >> import java.io.FileOutputStream;
> >> import java.io.IOException;
> >> import java.io.InputStream;
> >> import java.util.ArrayList;
> >> import java.util.HashMap;
> >> import java.util.List;
> >> import java.util.Map;
> >> import opennlp.tools.ml.perceptron.PerceptronTrainer;
> >> import opennlp.tools.namefind.BioCodec;
> >> import opennlp.tools.namefind.NameFinderME;
> >> import opennlp.tools.namefind.NameSample;
> >> import opennlp.tools.namefind.TokenNameFinderFactory;
> >> import opennlp.tools.namefind.TokenNameFinderModel;
> >> import opennlp.tools.postag.POSModel;
> >> import opennlp.tools.util.ObjectStream;
> >> import opennlp.tools.util.ObjectStreamUtils;
> >> import opennlp.tools.util.TrainingParameters;
> >> import org.apache.commons.io.IOUtils;
> >>
> >> public class Test {
> >>
> >>     private final String GENERATORS = "/home/damiano/test.xml";
> >>     private final String POSTAGGER = "/home/damiano/postagger.bin";
> >>     private final String SERIALIZED = "/home/damiano/serialized.bin";
> >>
> >>     public static void main(String[] args) throws IOException {
> >>         Test test = new Test();
> >>     }
> >>
> >>     public Test() throws IOException {
> >>
> >>         List<NameSample> labelled = new ArrayList<>();
> >>
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
> >>
> >>         TokenNameFinderFactory factory;
> >>
> >>         try (ObjectStream<NameSample> samples = ObjectStreamUtils.createObjectStream(labelled)) {
> >>             //HashMap<String, Object> map = new HashMap<>();
> >>
> >>             try (InputStream in = new FileInputStream(GENERATORS)) {
> >>
> >>                 // Resources
> >>                 Map<String, Object> map = new HashMap<>();
> >>
> >>                 // Pos Tagger
> >>                 map.put("postagger.bin", Test.loadPosTagger(POSTAGGER));
> >>
> >>
> >>                 // Factory
> >>                 factory = new TokenNameFinderFactory(
> >>                     IOUtils.toByteArray(in),
> >>                     map,
> >>                     new BioCodec()
> >>                 );
> >>
> >>                 try {
> >>
> >>                     TrainingParameters mlParams = new TrainingParameters();
> >>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM, PerceptronTrainer.PERCEPTRON_VALUE);
> >>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(300));
> >>                     mlParams.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(0));
> >>
> >>                     TokenNameFinderModel model = NameFinderME.train("it", "person", samples, mlParams, factory);
> >>
> >>                     try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
> >>                         model.serialize(modelOut);
> >>                     }
> >>
> >>                 } catch (Exception ex) {
> >>                     ex.printStackTrace();
> >>                 }
> >>
> >>             }
> >>         }
> >>     }
> >>
> >>     public static POSModel loadPosTagger (String modelName) {
> >>
> >>         try (InputStream modelIn = new FileInputStream(modelName)) {
> >>             POSModel model = new POSModel(modelIn);
> >>             return model;
> >>         }
> >>         catch (Exception ex) { ex.printStackTrace(); }
> >>
> >>         return null;
> >>     }
> >> }
> >>
> >> *GENERATORS:*
> >>
> >> <?xml version="1.0" encoding="UTF-8"?>
> >> <generators>
> >>   <cache>
> >>     <generators>
> >>       <window prevLength="4" nextLength="2">
> >>         <tokenclass />
> >>       </window>
> >>       <window prevLength="4" nextLength="2">
> >>         <token />
> >>       </window>
> >>       <!-- Pos Tagger -->
> >>       <window prevLength="4" nextLength="2">
> >>         <tokenpos model="postagger.bin" />
> >>       </window>
> >>       <definition />
> >>       <prevmap />
> >>       <bigram />
> >>       <sentence begin="true" end="false" />
> >>     </generators>
> >>   </cache>
> >> </generators>
> >>
> >>
> >> *OUTPUT (with error):*
> >>
> >> Indexing events using cutoff of 0
> >> Computing event counts... done. 30 events
> >> Indexing... done.
> >> Collecting events... Done indexing.
> >> Incorporating indexed data for training... done.
> >> Number of Event Tokens: 30
> >> Number of Outcomes: 2
> >> Number of Predicates: 144
> >> Computing model parameters...
> >> Performing 300 iterations.
> >> 1: . (27/30) 0.9
> >> 2: . (30/30) 1.0
> >> 3: . (30/30) 1.0
> >> 4: . (30/30) 1.0
> >> 5: . (30/30) 1.0
> >> Stopping: change in training set accuracy less than 1.0E-5
> >> Stats: (30/30) 1.0
> >> ...done.
> >> Compressed 144 parameters to 621 outcome patterns
> >> java.lang.IllegalStateException: Missing serializer for postagger.bin
> >>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>     at com.damiano.trainer.Test.<init>(Test.java:75)
> >>     at com.damiano.trainer.Test.main(Test.java:31)
> >>
> >> 2017-06-07 15:48 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
> >>
> >>> Hmm let me try again, yes i copied it badly, i think the names are
> >>> correct, i will give you a working example.
> >>>
> >>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
> >>>
> >>>> Ok, but are you sure you used matching names? The exception states
> >>>> it-pos-maxent.bin,
> >>>> which object did you map to it?
> >>>>
> >>>> Jörn
> >>>>
> >>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta <damianopo...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Hi Jorn! Yes
> >>>> >
> >>>> > <dependency>
> >>>> >   <groupId>org.apache.opennlp</groupId>
> >>>> >   <artifactId>opennlp-tools</artifactId>
> >>>> >   <version>1.8.0</version>
> >>>> > </dependency>
> >>>> >
> >>>> > Do i need others dependencies too?
> >>>> >
> >>>> >
> >>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
> >>>> >
> >>>> > > This should be working. Did you test with 1.8.0?
> >>>> > >
> >>>> > > Jörn
> >>>> > >
> >>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianopo...@gmail.com>
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Hello,
> >>>> > > > i am using the POSTaggerFeatureGenerator via generators.xml
> >>>> > > >
> >>>> > > > <tokenpos model="postagger.bin" />
> >>>> > > >
> >>>> > > > during the training i add this model in the resources doing:
> >>>> > > >
> >>>> > > > HashMap<String, Object> map = new HashMap<>();
> >>>> > > > map.put("postagger.bin", myPostaggerModel);
> >>>> > > >
> >>>> > > >
> >>>> > > > factory = new TokenNameFinderFactory(
> >>>> > > >     IOUtils.toByteArray(in),
> >>>> > > >     map,
> >>>> > > >     new BioCodec()
> >>>> > > > );
> >>>> > > >
> >>>> > > > I get this error:
> >>>> > > >
> >>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
> >>>> > > >     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
> >>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
> >>>> > > >     at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
> >>>> > > > 2017-06-05 15:37:35 INFO Trainer:192 - java.lang.IllegalStateException:
> >>>> > > > Missing serializer for postagger.bin
> >>>> > > >
> >>>> > > > Do i have to change the extension of the file?
> >>>> > > >
> >>>> > > > Thanks
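
A note on the recurring "Missing serializer for ..." message: the question about the file extension is a reasonable one if the serializer is picked from the resource name's extension, which the wording of the error suggests but nothing in this thread confirms. The following is only a stdlib-only Java sketch of that kind of extension-keyed lookup, to show how a resource that trains fine could still fail at serialization time. ExtensionKeyedSerializers, register, extensionOf and lookup are hypothetical names made up for illustration; they are not OpenNLP's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from OpenNLP.
class ExtensionKeyedSerializers {

    // Registry keyed by file extension, e.g. "dictionary" -> a serializer label.
    private final Map<String, String> byExtension = new HashMap<>();

    void register(String extension, String serializerName) {
        byExtension.put(extension, serializerName);
    }

    // Returns the text after the last '.', or "" when the name has no dot.
    static String extensionOf(String resourceName) {
        int dot = resourceName.lastIndexOf('.');
        return dot < 0 ? "" : resourceName.substring(dot + 1);
    }

    // An extension with no registered serializer is reported as an error,
    // using the same message shape as the exception quoted in the thread.
    String lookup(String resourceName) {
        String serializer = byExtension.get(extensionOf(resourceName));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + resourceName);
        }
        return serializer;
    }

    public static void main(String[] args) {
        ExtensionKeyedSerializers registry = new ExtensionKeyedSerializers();
        // Assumption for the sketch: only "dictionary" has been registered.
        registry.register("dictionary", "dictionary-serializer");

        System.out.println(registry.lookup("nations.dictionary"));
        try {
            registry.lookup("postagger.bin");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, renaming the file alone would not help unless some serializer is registered for the new extension, which would be consistent with Jörn's conclusion that the fix has to be made in the library.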