Hi Nicolas,

Can you please open a Jira? I will investigate the issue.

Thanks,
William

On Thu, Oct 6, 2011 at 9:46 AM, Nicolas Hernandez <[email protected]> wrote:

> On Thu, Oct 6, 2011 at 2:34 PM, Jörn Kottmann <[email protected]> wrote:
> > Looks like the Cross Validator is failing because you do
> > not have enough data? On how many sample sentences do you
> > run it?
>
> I tested with 1 000 and 1 000 000... same results, except I had to
> extend the java heap size for one of them before getting the error...
>
> Just to let you know, below you will find what I got for the
> Tokenizer (here with a 1000-sentence training corpus)
>
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>     at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>     at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>     at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-token.train
> Indexing events using cutoff of 5
> Computing event counts... done. 100333 events
> Indexing... done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
> Number of Event Tokens: 30168
> Number of Outcomes: 2
> Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
> 1: ... loglikelihood=-69545.53606709359 0.9337805108987073
> 2: ... loglikelihood=-18987.123809719425 0.9497872085953774
> ...
> 98: ... loglikelihood=-607.4216932752298 0.9989534848952987
> 99: ... loglikelihood=-603.2346954947699 0.9989734185163406
> 100: ... loglikelihood=-599.1235213848983 0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>     at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>     at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>     at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>     at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:191)
>
> > We will investigate this further.
> >
> > Jörn
> >
> > On 10/6/11 2:26 PM, Nicolas Hernandez wrote:
> >>
> >> Please find below the output of two runs which lead to an error:
> >> SentenceDetectorEvaluator without the "-misclassified true" parameter and
> >> SentenceDetectorCrossValidator (which gives the same error with or
> >> without "-misclassified true").
> >>
> >> I tested on the examples from the documentation and also with my data.
> >> Tell me if you want more details or anything else.
> >>
> >> $ opennlp SentenceDetectorEvaluator -encoding UTF-8 -model data/model/fr-sent.bin -data data/test/fr-sent.test
> >> Loading Sentence Detector model ... done (0,013s)
> >> Evaluating ... Exception in thread "main" java.lang.NullPointerException
> >>     at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
> >>     at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
> >>     at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
> >>     at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> >>
> >> $ opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data data/train/fr-sent.train -misclassified true
> >> Indexing events using cutoff of 5
> >>
> >> Computing event counts... done. 0 events
> >> Indexing... done.
> >> Sorting and merging events... Done indexing.
> >> Incorporating indexed data for training...
> >> Exception in thread "main" java.lang.NullPointerException
> >>     at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> >>     at opennlp.maxent.GIS.trainModel(GIS.java:256)
> >>     at opennlp.model.TrainUtil.train(TrainUtil.java:182)
> >>     at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
> >>     at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
> >>     at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
> >>     at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> >>
> >> On Thu, Oct 6, 2011 at 1:02 PM, Jörn Kottmann <[email protected]> wrote:
> >>>
> >>> On 10/6/11 12:42 PM, Nicolas Hernandez wrote:
> >>>>
> >>>> I am trying to run the Evaluator and CrossValidator programs of the 1.5.3
> >>>> from the command line.
> >>>>
> >>>> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> >>>> chunker (at least) throw a java.lang.NullPointerException if the
> >>>> misclassified parameter is set to false or not present for the
> >>>> Evaluator programs. The CrossValidator programs do not work at all.
> >>>>
> >>>> Before I look into it, is something (e.g. a global refactoring) planned
> >>>> about it?
> >>>
> >>> 1.5.3 is mostly the same version as the 1.5.2 RC 2.
> >>>
> >>> The bugs you describe here should of course not be present, and must be
> >>> fixed for the 1.5.2 release. We just did a major refactoring of a lot of
> >>> cmd line code. Looks like a regression.
> >>>
> >>> Can you please give us more details? The stack trace would be helpful, and
> >>> the command line arguments you passed in. To find a bug I believe it should
> >>> be enough to get this for one of the mentioned evaluators.
> >>>
> >>> Jörn

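The symptom reported in the thread (an NPE in Evaluator.evaluateSample only when "-misclassified" is false or absent) would be consistent with an optional listener being called without a null check. That is an assumption, not a diagnosis confirmed anywhere in the thread. The sketch below illustrates the pattern and its guard; every name in it (SketchEvaluator, MisclassifiedListener, report) is hypothetical and is not taken from the OpenNLP source.

```java
// Hypothetical illustration only: not the OpenNLP Evaluator.
class SketchEvaluator {

    /** Hypothetical callback that receives wrongly predicted samples. */
    interface MisclassifiedListener {
        void report(String reference, String prediction);
    }

    // Assumed to be null when misclassified reporting is switched off.
    private final MisclassifiedListener listener;
    private int total;
    private int correct;

    SketchEvaluator(MisclassifiedListener listener) {
        this.listener = listener;
    }

    /** Compares one prediction against its reference and updates the counts. */
    void evaluateSample(String reference, String prediction) {
        total++;
        if (reference.equals(prediction)) {
            correct++;
        } else if (listener != null) {
            // Without this null guard, a run with reporting disabled
            // would throw a NullPointerException on the first mismatch.
            listener.report(reference, prediction);
        }
    }

    /** Fraction of samples predicted correctly; 0.0 if nothing was evaluated. */
    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Reporting disabled: the listener is null, and evaluation still completes.
        SketchEvaluator quiet = new SketchEvaluator(null);
        quiet.evaluateSample("Bonjour .", "Bonjour .");
        quiet.evaluateSample("Bonjour .", "Bonjour.");
        System.out.println("accuracy = " + quiet.accuracy());
    }
}
```

If the real cause is of this kind, a stack trace ending in evaluateSample for every tool (tokenizer, sentence detector, and so on) is what one would expect, since they all share the same base evaluator.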