The Apache OpenNLP team is pleased to announce the release of version 3.0.0-M3
of Apache OpenNLP.

The Apache OpenNLP library is a machine learning based toolkit for the
processing of natural language text. It supports the most common NLP tasks,
such as tokenization, sentence segmentation, part-of-speech tagging, named
entity extraction, chunking, and parsing.

The OpenNLP 3.0.0-M3 binary and source distributions are available for download
from our download page: https://opennlp.apache.org/download.html

The OpenNLP library is distributed by Maven Central as well. See the Maven
Dependency page for more details:
https://opennlp.apache.org/maven-dependency.html

Changes in this version:

This release introduces new NLP capabilities, addresses three security issues 
(also backported to 2.5.9), and refreshes several dependencies.

New features and improvements:

    • OPENNLP-1518: Add support for RoBERTa-based models via ONNX
    • OPENNLP-1220: Add support for Byte Pair Encoding (BPE)
    • OPENNLP-53: Add Parse.createFromTokens() for convenient tokenized input
    • OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable
      instance state

Security fixes:

    • OPENNLP-1819: Fix XXE vulnerability in DictionaryEntryPersistor by
      aligning XML parsing with XmlUtil (secure processing enabled, DOCTYPE
      disallowed)
    • OPENNLP-1820: Restrict ExtensionLoader to an allowlisted set of package
      prefixes, preventing arbitrary class initialization from crafted model
      archives
    • OPENNLP-1821: Prevent OutOfMemoryError in AbstractModelReader by bounding
      count fields read from binary models before array allocation
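
For readers unfamiliar with the hardening named in OPENNLP-1819 (secure
processing enabled, DOCTYPE disallowed), the following is a minimal sketch
using only standard JAXP calls. It is not the OpenNLP implementation: the class
name SecureXmlSketch and its methods are hypothetical and exist only for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of OpenNLP.
class SecureXmlSketch {

    // Builds a DOM parser with secure processing on and DOCTYPE rejected.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Standard JAXP switch that applies implementation-defined limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces feature (also honored by the JDK's built-in parser):
        // any DOCTYPE declaration becomes a fatal error, which rules out
        // external entity declarations altogether.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean accepts(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            newHardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "<dictionary><entry/></dictionary>";
        String withDoctype =
            "<!DOCTYPE d [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><d>&x;</d>";
        System.out.println("plain accepted: " + accepts(plain));
        System.out.println("doctype accepted: " + accepts(withDoctype));
    }
}
```

With this configuration a document that carries a DOCTYPE declaration is
rejected before any entity is resolved, while ordinary XML parses as usual.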

Dependency updates:

    • OPENNLP-1817: Update log4j2 to 2.25.4
    • OPENNLP-1818: Update zlibsvm-core to 3.0.0
    • OPENNLP-1822: Update ONNX runtime to 1.25.0

For a complete list of fixed bugs and improvements, please see the
RELEASE_NOTES file included in the distribution.

The Apache OpenNLP Team