This is now fixed in the master branch. Would you mind trying it again?

Jörn

On Wed, Jun 14, 2017 at 4:31 PM, Joern Kottmann <kottm...@gmail.com> wrote:
> We have to fix this, William wrote a unit test to reproduce it.
>
> Jörn
>
> On Fri, Jun 9, 2017 at 4:31 PM, Damiano Porta <damianopo...@gmail.com>
> wrote:
>>
>> Jorn,
>> the latest snapshot (1.8.1-SNAPSHOT) has fixed the problem with dictionaries
>> (PR #220), but the problem with the POS tagger serialization is still there.
>> I can confirm that the latest snapshot cannot serialize the POS tagger using
>> the command-line tool:
>>
>> opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
>> -model /home/damiano/it-tuoagente-perceptron-custom.bin -featuregen
>> /home/damiano/test.xml -sequenceCodec BIO -resources
>> /home/damiano/lavoro/java/Parser/src/main/resources/
>>
>> Writing name finder model ... Compressed 885605 parameters to 94030
>> 3451 outcome patterns
>> Exception in thread "main" java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:592)
>>     at opennlp.tools.cmdline.CmdLineUtil.writeModel(CmdLineUtil.java:182)
>>     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:188)
>>     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>>
>> I have used this generators.xml file:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <generators>
>>     <cache>
>>         <generators>
>>             <window prevLength="4" nextLength="2">
>>                 <tokenclass />
>>             </window>
>>             <window prevLength="4" nextLength="2">
>>                 <token />
>>             </window>
>>             <!-- Pos Tagger -->
>>             <window prevLength="4" nextLength="2">
>>                 <tokenpos model="it-pos-maxent.bin" />
>>             </window>
>>             <definition />
>>             <prevmap />
>>             <bigram />
>>             <sentence begin="true" end="false" />
>>         </generators>
>>     </cache>
>> </generators>
>>
>>
>>
>>
>> 2017-06-09 15:17 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>>
>> > Jorn,
>> > At the moment I am using the command-line tool to train my NER model, but I
>> > am getting this error:
>> >
>> > opennlp TokenNameFinderTrainer -data /home/damiano/corpus.train -lang it
>> > -model /home/damiano/it-person-perceptron.bin -featuregen
>> > /home/damiano/test.xml -sequenceCodec BIO -resources
>> > /home/damiano/lavoro/java/Parser/src/main/resources/
>> >
>> > Exception in thread "main" opennlp.tools.namefind.TokenNameFinderModel$FeatureGeneratorCreationError: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dictionary
>> >     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:209)
>> >     at opennlp.tools.namefind.TokenNameFinderFactory.createContextGenerator(TokenNameFinderFactory.java:150)
>> >     at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:241)
>> >     at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:169)
>> >     at opennlp.tools.cmdline.CLI.main(CLI.java:244)
>> > Caused by: opennlp.tools.util.InvalidFormatException: No dictionary resource for key: nations.dict
>> >     at opennlp.tools.util.featuregen.GeneratorFactory$DictionaryFeatureGeneratorFactory.create(GeneratorFactory.java:251)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory$CachedFeatureGeneratorFactory.create(GeneratorFactory.java:172)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory$AggregatedFeatureGeneratorFactory.create(GeneratorFactory.java:130)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory.createGenerator(GeneratorFactory.java:732)
>> >     at opennlp.tools.util.featuregen.GeneratorFactory.create(GeneratorFactory.java:782)
>> >     at opennlp.tools.namefind.TokenNameFinderFactory.createFeatureGenerators(TokenNameFinderFactory.java:189)
>> >     ... 4 more
>> >
>> > As you can see, the problem is "No dictionary resource for key:
>> > nations.dictionary", because I also need to add a dictionary to my model.
>> >
>> > I ran these tests:
>> >
>> > 1. Used nations.dictionary as the resource name in my generators.xml:
>> > <dictionary dict="nations.dictionary" prefix="nation" />
>> >
>> > 2. Used nations.xml as the resource name in my generators.xml:
>> > <dictionary dict="nations.xml" prefix="nation" />
>> >
>> > 3. Used nations.dict as the resource name in my generators.xml:
>> > <dictionary dict="nations.dict" prefix="nation" />
>> >
>> > For each test I also renamed the dictionary file inside my -resources
>> > directory.
>> >
>> > I had no luck.
>> >
>> > How should I name a dictionary resource?
>> >
>> > Thanks.
>> >
>> >
>> >
>> > 2017-06-07 16:20 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>> >
>> >> Hello Jorn,
>> >> I confirm the error. Please take a look at the code below. It is a
>> >> working example; you only need to edit the constants GENERATORS,
>> >> POSTAGGER and SERIALIZED.
>> >>
>> >>
>> >> *TEST FILE:*
>> >>
>> >> package com.damiano.trainer;
>> >>
>> >> import java.io.BufferedOutputStream;
>> >> import java.io.FileInputStream;
>> >> import java.io.FileOutputStream;
>> >> import java.io.IOException;
>> >> import java.io.InputStream;
>> >> import java.util.ArrayList;
>> >> import java.util.HashMap;
>> >> import java.util.List;
>> >> import java.util.Map;
>> >> import opennlp.tools.ml.perceptron.PerceptronTrainer;
>> >> import opennlp.tools.namefind.BioCodec;
>> >> import opennlp.tools.namefind.NameFinderME;
>> >> import opennlp.tools.namefind.NameSample;
>> >> import opennlp.tools.namefind.TokenNameFinderFactory;
>> >> import opennlp.tools.namefind.TokenNameFinderModel;
>> >> import opennlp.tools.postag.POSModel;
>> >> import opennlp.tools.util.ObjectStream;
>> >> import opennlp.tools.util.ObjectStreamUtils;
>> >> import opennlp.tools.util.TrainingParameters;
>> >> import org.apache.commons.io.IOUtils;
>> >>
>> >> public class Test {
>> >>
>> >>     private final String GENERATORS = "/home/damiano/test.xml";
>> >>     private final String POSTAGGER = "/home/damiano/postagger.bin";
>> >>     private final String SERIALIZED = "/home/damiano/serialized.bin";
>> >>
>> >>     public static void main(String[] args) throws IOException {
>> >>         Test test = new Test();
>> >>     }
>> >>
>> >>     public Test() throws IOException {
>> >>
>> >>         List<NameSample> labelled = new ArrayList<>();
>> >>
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACOB <END>", false));
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JACK <END>", false));
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> THOMAS <END>", false));
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> GEORGE <END>", false));
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> WILLIAM <END>", false));
>> >>         labelled.add(NameSample.parse("This is a sentence <START:person> JAMES <END>", false));
>> >>
>> >>         TokenNameFinderFactory factory;
>> >>
>> >>         try (ObjectStream<NameSample> samples =
>> >> ObjectStreamUtils.createObjectStream(labelled)) {
>> >>             //HashMap<String, Object> map = new HashMap<>();
>> >>
>> >>             try (InputStream in = new FileInputStream(GENERATORS)) {
>> >>
>> >>                 // Resources
>> >>                 Map<String, Object> map = new HashMap<>();
>> >>
>> >>                 // Pos Tagger
>> >>                 map.put("postagger.bin",
>> >> Test.loadPosTagger(POSTAGGER));
>> >>
>> >>
>> >>                 // Factory
>> >>                 factory = new TokenNameFinderFactory(
>> >>                     IOUtils.toByteArray(in),
>> >>                     map,
>> >>                     new BioCodec()
>> >>                 );
>> >>
>> >>                 try {
>> >>
>> >>                     TrainingParameters mlParams = new
>> >> TrainingParameters();
>> >>                     mlParams.put(TrainingParameters.ALGORITHM_PARAM,
>> >> PerceptronTrainer.PERCEPTRON_VALUE);
>> >>                     mlParams.put(TrainingParameters.ITERATIONS_PARAM,
>> >> Integer.toString(300));
>> >>                     mlParams.put(TrainingParameters.CUTOFF_PARAM,
>> >> Integer.toString(0));
>> >>
>> >>                     TokenNameFinderModel model =
>> >> NameFinderME.train("it",
>> >> "person", samples, mlParams, factory);
>> >>
>> >>                     try (BufferedOutputStream modelOut = new
>> >> BufferedOutputStream(new FileOutputStream(SERIALIZED))) {
>> >>                         model.serialize(modelOut);
>> >>                     }
>> >>
>> >>                 } catch (Exception ex) {
>> >>                     ex.printStackTrace();
>> >>                 }
>> >>
>> >>             }
>> >>         }
>> >>     }
>> >>
>> >>     public static POSModel loadPosTagger (String modelName) {
>> >>
>> >>         try (InputStream modelIn = new FileInputStream(modelName)) {
>> >>             POSModel model = new POSModel(modelIn);
>> >>             return model;
>> >>         }
>> >>         catch (Exception ex) { ex.printStackTrace();  }
>> >>
>> >>         return null;
>> >>     }
>> >> }
>> >>
>> >> *GENERATORS:*
>> >>
>> >> <?xml version="1.0" encoding="UTF-8"?>
>> >> <generators>
>> >>     <cache>
>> >>         <generators>
>> >>             <window prevLength="4" nextLength="2">
>> >>                 <tokenclass />
>> >>             </window>
>> >>             <window prevLength="4" nextLength="2">
>> >>                 <token />
>> >>             </window>
>> >>             <!-- Pos Tagger -->
>> >>             <window prevLength="4" nextLength="2">
>> >>                 <tokenpos model="postagger.bin" />
>> >>             </window>
>> >>             <definition />
>> >>             <prevmap />
>> >>             <bigram />
>> >>             <sentence begin="true" end="false" />
>> >>         </generators>
>> >>     </cache>
>> >> </generators>
>> >>
>> >>
>> >> *OUTPUT (with error):*
>> >>
>> >>
>> >> Indexing events using cutoff of 0
>> >> Computing event counts...  done. 30 events
>> >> Indexing...  done.
>> >> Collecting events... Done indexing.
>> >> Incorporating indexed data for training...  done.
>> >> Number of Event Tokens: 30
>> >> Number of Outcomes: 2
>> >> Number of Predicates: 144
>> >> Computing model parameters...
>> >> Performing 300 iterations.
>> >>   1:  . (27/30) 0.9
>> >>   2:  . (30/30) 1.0
>> >>   3:  . (30/30) 1.0
>> >>   4:  . (30/30) 1.0
>> >>   5:  . (30/30) 1.0
>> >> Stopping: change in training set accuracy less than 1.0E-5
>> >> Stats: (30/30) 1.0
>> >> ...done.
>> >> Compressed 144 parameters to 62
>> >> 1 outcome patterns
>> >> java.lang.IllegalStateException: Missing serializer for postagger.bin
>> >>     at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>> >>     at com.damiano.trainer.Test.<init>(Test.java:75)
>> >>     at com.damiano.trainer.Test.main(Test.java:31)
>> >>
>> >> 2017-06-07 15:48 GMT+02:00 Damiano Porta <damianopo...@gmail.com>:
>> >>
>> >>> Hmm, let me try again. Yes, I copied it badly. I think the names are
>> >>> correct; I will give you a working example.
>> >>>
>> >>> 2017-06-07 15:46 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
>> >>>
>> >>>> Ok, but are you sure you used matching names? The exception states
>> >>>> it-pos-maxent.bin,
>> >>>> which object did you map to it?
>> >>>>
>> >>>> Jörn
>> >>>>
>> >>>> On Wed, Jun 7, 2017 at 3:22 PM, Damiano Porta
>> >>>> <damianopo...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> > Hi Jorn! Yes
>> >>>> >
>> >>>> >         <dependency>
>> >>>> >             <groupId>org.apache.opennlp</groupId>
>> >>>> >             <artifactId>opennlp-tools</artifactId>
>> >>>> >             <version>1.8.0</version>
>> >>>> >         </dependency>
>> >>>> >
>> >>>> > Do I need other dependencies too?
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > 2017-06-07 14:53 GMT+02:00 Joern Kottmann <kottm...@gmail.com>:
>> >>>> >
>> >>>> > > This should be working. Did you test with 1.8.0?
>> >>>> > >
>> >>>> > > Jörn
>> >>>> > >
>> >>>> > > On Mon, Jun 5, 2017 at 3:43 PM, Damiano Porta <damianopo...@gmail.com>
>> >>>> > > wrote:
>> >>>> > >
>> >>>> > > > Hello,
>> >>>> > > > I am using the POSTaggerFeatureGenerator via generators.xml:
>> >>>> > > >
>> >>>> > > > <tokenpos model="postagger.bin" />
>> >>>> > > >
>> >>>> > > > During training I add this model to the resources like this:
>> >>>> > > >
>> >>>> > > >         HashMap<String, Object> map = new HashMap<>();
>> >>>> > > >         map.put("postagger.bin", myPostaggerModel);
>> >>>> > > >
>> >>>> > > >
>> >>>> > > >          factory = new TokenNameFinderFactory(
>> >>>> > > >                IOUtils.toByteArray(in),
>> >>>> > > >                map,
>> >>>> > > >                new BioCodec()
>> >>>> > > >          );
>> >>>> > > >
>> >>>> > > > I get this error:
>> >>>> > > >
>> >>>> > > > java.lang.IllegalStateException: Missing serializer for it-pos-maxent.bin
>> >>>> > > > at opennlp.tools.util.model.BaseModel.serialize(BaseModel.java:589)
>> >>>> > > > at com.damiano.nlp.ner.trainer.Trainer.<init>(Trainer.java:187)
>> >>>> > > > at com.damiano.nlp.ner.trainer.Trainer.main(Trainer.java:44)
>> >>>> > > > 2017-06-05 15:37:35 INFO  Trainer:192 - java.lang.IllegalStateException: Missing serializer for postagger.bin
>> >>>> > > >
>> >>>> > > > Do I have to change the extension of the file?
>> >>>> > > >
>> >>>> > > > Thanks
>> >>>> > > >
>> >>>> > >
>> >>>> >
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>
>

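For readers following the thread: according to the stack traces quoted above, the "Missing serializer for postagger.bin" message is thrown from BaseModel.serialize. One way an error of this shape can arise is when each resource is written by a serializer looked up from the extension of its resource key, and nothing is registered for that extension. The sketch below is a simplified, standard-library-only illustration of that failure mode. The class SerializerLookupSketch and its methods are hypothetical names invented for this example; they are not OpenNLP's implementation or API, and the sketch does not describe the actual fix in master.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, simplified model of extension-keyed serializer lookup (not OpenNLP code). */
class SerializerLookupSketch {

    // Assumed registry: file extension -> function that turns an artifact into bytes.
    private final Map<String, Function<Object, byte[]>> serializers = new LinkedHashMap<>();

    void register(String extension, Function<Object, byte[]> serializer) {
        serializers.put(extension, serializer);
    }

    // The extension is whatever follows the last dot of the resource key.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    // Fails in the same style as the error in the thread when no serializer matches.
    byte[] serialize(String name, Object artifact) {
        Function<Object, byte[]> serializer = serializers.get(extensionOf(name));
        if (serializer == null) {
            throw new IllegalStateException("Missing serializer for " + name);
        }
        return serializer.apply(artifact);
    }

    public static void main(String[] args) {
        SerializerLookupSketch model = new SerializerLookupSketch();
        // Only the "dictionary" extension is registered in this sketch.
        model.register("dictionary", a -> a.toString().getBytes(StandardCharsets.UTF_8));

        System.out.println(model.serialize("nations.dictionary", "Italy").length); // prints 5

        try {
            model.serialize("postagger.bin", "placeholder for a POS model");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Missing serializer for postagger.bin"
        }
    }
}
```

In this toy model, a resource key such as "postagger.bin" can be used during training and still fail at serialization time, because the lookup only happens when the model is written out. That matches the order of events in the reports above, where training completes and the exception appears only while writing the model.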