Severity: moderate 

Affected versions:

- Apache OpenNLP (org.apache.opennlp:opennlp-tools) before 2.5.9
- Apache OpenNLP (org.apache.opennlp:opennlp-tools) 3.0 before 3.0.0-M3

Description:

XML External Entity (XXE) via Unsanitized Dictionary Parsing in Apache OpenNLP 
DictionaryEntryPersistor


Versions Affected: all versions before 2.5.9, and 3.0 before 3.0.0-M3


Description: The DictionaryEntryPersistor class initializes a static 
SAXParserFactory at class-load time without enabling FEATURE_SECURE_PROCESSING 
or disabling DTD processing. When create(InputStream, EntryInserter) is 
invoked, the only feature set on the XMLReader is namespace support; external 
entity resolution and DOCTYPE declarations remain fully enabled. An attacker 
who can supply a crafted dictionary file (e.g., a stop-word list or domain 
dictionary) containing a malicious DOCTYPE declaration can trigger local file 
disclosure via file:// entity references or server-side request forgery via 
http:// entity references during SAX parsing, before the application processes 
a single dictionary entry. This is inconsistent with the project's own 
XmlUtil.createSaxParser() helper, which correctly sets 
FEATURE_SECURE_PROCESSING and disallow-doctype-decl and is used by all other 
XML parsing paths in the codebase. The public Dictionary(InputStream) 
constructor delegates directly to this method and is the documented API for 
loading user-supplied dictionaries, making untrusted input a realistic scenario.
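
For illustration, the hardening described above can be sketched with the
standard JAXP API. This is a minimal sketch and not the actual OpenNLP patch:
the class name HardenedSaxSketch, its methods, and the sample documents are
hypothetical, and the two external-entity features are an extra
defense-in-depth assumption beyond the two settings named in this advisory.
The crafted sample only shows the general shape of such a payload.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of OpenNLP.
final class HardenedSaxSketch {

    // Illustrative shape of a crafted dictionary with an external entity.
    static final String CRAFTED = "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE dictionary [<!ENTITY xxe SYSTEM \"file:///etc/hostname\">]>"
        + "<dictionary><entry><token>&xxe;</token></entry></dictionary>";

    // Illustrative benign dictionary without any DOCTYPE.
    static final String BENIGN = "<?xml version=\"1.0\"?>"
        + "<dictionary><entry><token>the</token></entry></dictionary>";

    /** Builds a namespace-aware XMLReader that refuses DOCTYPE declarations. */
    static XMLReader newHardenedReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // The two settings named in the advisory.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature(
            "http://apache.org/xml/features/disallow-doctype-decl", true);
        // Assumption: additional defense in depth, not stated in the advisory.
        factory.setFeature(
            "http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature(
            "http://xml.org/sax/features/external-parameter-entities", false);
        return factory.newSAXParser().getXMLReader();
    }

    /** Returns true if the hardened reader accepts the document. */
    static boolean parses(String xml) {
        try {
            XMLReader reader = newHardenedReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // fatal errors are rethrown
            reader.parse(new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
            return true;
        } catch (SAXException e) {
            return false; // includes the rejection of a DOCTYPE declaration
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("crafted accepted: " + parses(CRAFTED));
        System.out.println("benign accepted:  " + parses(BENIGN));
    }
}
```

With disallow-doctype-decl enabled, the parser raises a fatal error at the
DOCTYPE declaration itself, so the entity is never resolved.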


Mitigation: 2.x users should upgrade to 2.5.9. 3.x users should upgrade to 
3.0.0-M3. Users who cannot upgrade immediately should ensure that all 
dictionary files are sourced from trusted origins and should consider wrapping 
the Dictionary(InputStream) constructor with input validation that rejects any 
XML containing a DOCTYPE declaration before it reaches the parser.
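
One possible shape for such a validation wrapper is sketched below. It is an
assumption-laden example, not an official workaround: the class DoctypeGuard
and its method requireNoDoctype are hypothetical names, the input is buffered
fully in memory (which assumes dictionaries of modest size), and the check
also rejects XML that is not well-formed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of OpenNLP.
final class DoctypeGuard {

    /**
     * Buffers the untrusted stream, pre-parses it with DOCTYPE declarations
     * disallowed, and returns a fresh stream over the same bytes.
     * Throws IOException if the XML contains a DOCTYPE or is malformed.
     */
    static InputStream requireNoDoctype(InputStream untrusted)
            throws IOException {
        byte[] xml = untrusted.readAllBytes(); // assumes a small input
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.setFeature(
                "http://apache.org/xml/features/disallow-doctype-decl", true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // fatal errors are rethrown
            reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        } catch (SAXException | ParserConfigurationException e) {
            throw new IOException(
                "Dictionary XML rejected: " + e.getMessage(), e);
        }
        // Only validated bytes are handed on to the caller.
        return new ByteArrayInputStream(xml);
    }
}
```

The returned stream could then be passed to the Dictionary(InputStream)
constructor in place of the original input, so that the vulnerable parser only
ever sees content that has already passed the check.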

Credit:

Subramanian S (finder)

References:

https://opennlp.apache.org/
https://www.cve.org/CVERecord?id=CVE-2026-40682
