The Apache OpenNLP team is pleased to announce the release of version 3.0.0-M3 of Apache OpenNLP. The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing.

The OpenNLP 3.0.0-M3 binary and source distributions are available for download from our download page:
https://opennlp.apache.org/download.html

The OpenNLP library is distributed by Maven Central as well. See the Maven Dependency page for more details:
https://opennlp.apache.org/maven-dependency.html

Changes in this version:

This release introduces new NLP capabilities, addresses three security issues (also backported to 2.5.9), and refreshes several dependencies.

New features and improvements:

• OPENNLP-1518: Add support for Roberta-based models via ONNX
• OPENNLP-1220: Add support for Byte Pair Encoding (BPE)
• OPENNLP-53: Add Parse.createFromTokens() for convenient tokenized input
• OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable instance state

Security fixes:

• OPENNLP-1819: Fix XXE vulnerability in DictionaryEntryPersistor by aligning XML parsing with XmlUtil (secure processing enabled, DOCTYPE disallowed)
• OPENNLP-1820: Restrict ExtensionLoader to an allowlisted set of package prefixes, preventing arbitrary class initialization from crafted model archives
• OPENNLP-1821: Prevent OutOfMemoryError in AbstractModelReader by bounding count fields read from binary models before array allocation

Dependency updates:

• OPENNLP-1817: Update log4j2 to 2.25.4
• OPENNLP-1818: Update zlibsvm-core to 3.0.0
• OPENNLP-1822: Update ONNX runtime to 1.25.0

For a complete list of fixed bugs and improvements please see the RELEASE_NOTES file included in the distribution.

The Apache OpenNLP Team
