[ https://issues.apache.org/jira/browse/OPENNLP-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836897#comment-17836897 ]
ASF GitHub Bot commented on OPENNLP-1546:
-----------------------------------------

mawiesne opened a new pull request, #595:
URL: https://github.com/apache/opennlp/pull/595

   Changes
   -------
   - adjusts NER training code example to be complete and consistent with 2.x

   Tasks
   -----
   Thank you for contributing to Apache OpenNLP.

   In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
   - [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   - [x] Has your PR been rebased against the latest commit within the target branch (typically main)?
   - [x] Is your initial contribution a single, squashed commit?

   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via mvn clean install at the root opennlp folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file in opennlp folder?
   - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found in opennlp folder?

   ### For documentation related changes:
   - [x] Have you ensured that format looks appropriate for the output in which it is rendered?

   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.
> NER training code example in documentation needs updated
> --------------------------------------------------------
>
>                 Key: OPENNLP-1546
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1546
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Documentation
>            Reporter: Jeff Zemerick
>            Assignee: Martin Wiesner
>            Priority: Major
>
> The NER training code example needs to be updated.
> [https://opennlp.apache.org/docs/2.3.2/manual/opennlp.html#tools.namefind.training.api]
> * The `TokenNameFinderFactory nameFinderFactory` part won't compile.
> * This code might be outdated in general.
> {code:java}
> ObjectStream<String> lineStream =
>     new PlainTextByLineStream(new MarkableFileInputStreamFactory(new File("en-ner-person.train")),
>         StandardCharsets.UTF_8);
>
> TokenNameFinderModel model;
> try (ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream)) {
>   model = NameFinderME.train("eng", "person", sampleStream,
>       TrainingParameters.defaultParams(), nameFinderFactory);
> }
>
> try (ObjectStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile)){
>   model.serialize(modelOut);
> }
> {code}
> For reference (but not tested):
> {code:java}
> final InputStreamFactory in = new MarkableFileInputStreamFactory(convertedTrainingFile);
> final ObjectStream<NameSample> sampleStream =
>     new NameSampleDataStream(new PlainTextByLineStream(in, StandardCharsets.UTF_8));
> final TokenNameFinderModel nameFinderModel = NameFinderME.train("en", null, sampleStream,
>     TrainingParameters.defaultParams(),
>     TokenNameFinderFactory.create(null, null, Collections.emptyMap(), new BioCodec()));
> {code}
>
> --
> This message was sent by Atlassian Jira
> (v8.20.10#820010)
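The last `try` block of the quoted documentation snippet has problems beyond the undefined `nameFinderFactory`. It declares the output stream as `ObjectStream` rather than `OutputStream`, its `try (` header is never closed (only two of the three opening parentheses are matched), and `modelFile` is not defined anywhere in the snippet. The following is a minimal, JDK-only sketch of just the save step, under stated assumptions. `ModelWriter`, `StreamSerializer` and `writeTo` are hypothetical names, not OpenNLP API. The stand-in interface mirrors the shape of the `serialize(OutputStream)` call used in the snippet, so that, assuming a trained `TokenNameFinderModel model` whose `serialize(OutputStream)` throws only `IOException`, one could write `ModelWriter.writeTo(modelFile, model::serialize)`.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of OpenNLP: it only illustrates the
// corrected shape of the documentation's model-saving step.
class ModelWriter {

  // Stand-in for anything with a serialize(OutputStream) method, such as a
  // trained model (assumption: that method throws only IOException).
  @FunctionalInterface
  interface StreamSerializer {
    void serialize(OutputStream out) throws IOException;
  }

  // The resource is declared as an OutputStream (not ObjectStream) and the
  // try-with-resources header is properly closed. Closing the
  // BufferedOutputStream flushes it and closes the underlying
  // FileOutputStream; try-with-resources does this even if serialize throws.
  static void writeTo(File modelFile, StreamSerializer serializer) throws IOException {
    try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream(modelFile))) {
      serializer.serialize(modelOut);
    }
  }

  public static void main(String[] args) throws IOException {
    File modelFile = File.createTempFile("ner-model", ".bin");
    modelFile.deleteOnExit();
    // With OpenNLP on the classpath and a trained model, this would be:
    // writeTo(modelFile, model::serialize);
    writeTo(modelFile, out -> out.write(new byte[] {1, 2, 3}));
    System.out.println(modelFile.length()); // prints 3
  }
}
```

Taking the serializer as a functional interface keeps the sketch compilable without OpenNLP while leaving the stream handling identical to what the documentation example needs.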