[ https://issues.apache.org/jira/browse/OPENNLP-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner updated OPENNLP-1546:
------------------------------------
Description:
The NER training code example needs to be updated.
[https://opennlp.apache.org/docs/2.3.2/manual/opennlp.html#tools.namefind.training.api]
* The `TokenNameFinderFactory nameFinderFactory` part won't compile.
* This code might be outdated in general.
{code:java}
ObjectStream<String> lineStream = new PlainTextByLineStream(
    new MarkableFileInputStreamFactory(new File("en-ner-person.train")), StandardCharsets.UTF_8);
TokenNameFinderModel model;
try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
  model = NameFinderME.train("eng", "person", sampleStream, TrainingParameters.defaultParams(), nameFinderFactory);
}
try (ObjectStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)){
  model.serialize(modelOut);
}
{code}
For reference (but not tested):
{code:java}
final InputStreamFactory in = new MarkableFileInputStreamFactory(convertedTrainingFile);
final ObjectStream<NameSample> sampleStream =
    new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
final TokenNameFinderModel nameFinderModel = NameFinderME.train("en", null, sampleStream,
    TrainingParameters.defaultParams(),
    TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
{code}

  was:
The NER training code example needs to be updated.
[https://opennlp.apache.org/docs/1.8.2/manual/opennlp.html#tools.namefind.training.api]
* The `TokenNameFinderFactory nameFinderFactory` part won't compile.
* This code might be outdated in general.
{code:java}
ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream("en-ner-person.train"), StandardCharsets.UTF8);
TokenNameFinderModel model;
try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
  model = NameFinderME.train("en", "person", sampleStream, TrainingParameters.defaultParams(), TokenNameFinderFactory nameFinderFactory);
}
try (modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)){
  model.serialize(modelOut);
}
{code}
For reference (but not tested):
{code:java}
final InputStreamFactory in = new MarkableFileInputStreamFactory(convertedTrainingFile);
final ObjectStream<NameSample> sampleStream =
    new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
final TokenNameFinderModel nameFinderModel = NameFinderME.train("en", null, sampleStream,
    TrainingParameters.defaultParams(),
    TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
{code}

> NER training code example in documentation needs updated
> --------------------------------------------------------
>
>                 Key: OPENNLP-1546
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1546
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Documentation
>            Reporter: Jeff Zemerick
>            Assignee: Cody Fearer
>            Priority: Major
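The following is a minimal sketch of how the documented training snippet could be made to compile. It assumes OpenNLP 2.x (`opennlp-tools`) is on the classpath. The sketch addresses both reported problems. First, it declares the missing `nameFinderFactory` using the `TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec())` call from the untested reference snippet. Second, it serializes into a `java.io.OutputStream` with balanced parentheses, rather than into an `ObjectStream`. The class name `NameFinderTrainingSketch`, the helper `writeToyCorpus`, and the toy sentences are hypothetical and included only for illustration. This is not the project's official replacement text.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import opennlp.tools.namefind.BioCodec;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
import opennlp.tools.util.TrainingParameters;

// Hypothetical class name; a sketch of a compilable version of the manual's example.
public class NameFinderTrainingSketch {

  // Trains a "person" model from a file in OpenNLP name-finder format and writes it to modelFile.
  static TokenNameFinderModel trainAndSave(File trainingFile, File modelFile) throws IOException {
    // File-backed InputStreamFactory that PlainTextByLineStream opens to read the lines.
    InputStreamFactory in = new MarkableFileInputStreamFactory(trainingFile);

    // The documented snippet never declares nameFinderFactory; this declaration reuses the
    // create(...) call from the "for reference" snippet: default features, BIO encoding.
    TokenNameFinderFactory nameFinderFactory =
        TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec());

    TokenNameFinderModel model;
    // Closing sampleStream also closes the wrapped line stream.
    try (ObjectStream<NameSample> sampleStream =
             new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8))) {
      model = NameFinderME.train("eng", "person", sampleStream,
          TrainingParameters.defaultParams(), nameFinderFactory);
    }

    // serialize(...) takes a java.io.OutputStream, so modelOut is declared with that type.
    try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
      model.serialize(modelOut);
    }
    return model;
  }

  // Hypothetical toy corpus, repeated so that features clear the default frequency cutoff.
  static void writeToyCorpus(File target) throws IOException {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      lines.add("<START:person> Pierre Vinken <END> , 61 years old , will join the board .");
      lines.add("Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. .");
      lines.add("The board met on Monday .");
    }
    Files.write(target.toPath(), lines, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    File trainingFile = Files.createTempFile("en-ner-person", ".train").toFile();
    File modelFile = Files.createTempFile("en-ner-person", ".bin").toFile();
    writeToyCorpus(trainingFile);

    TokenNameFinderModel model = trainAndSave(trainingFile, modelFile);

    // Tag an already tokenized sentence with the freshly trained model.
    NameFinderME finder = new NameFinderME(model);
    Span[] names = finder.find(new String[] {"Pierre", "Vinken", "will", "join", "the", "board", "."});
    System.out.println(Arrays.toString(names) + " / model file: " + modelFile.length() + " bytes");
  }
}
```

The factory is declared as a local variable instead of being built inline in the `train(...)` call. This keeps the call site close to the manual's `nameFinderFactory` wording. The toy corpus only exercises the API. A usable model needs a large, varied training file.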