Severity: moderate 

Affected versions:

- Apache OpenNLP (org.apache.opennlp:opennlp-tools) before 2.5.9
- Apache OpenNLP (org.apache.opennlp:opennlp-tools) 3.0 before 3.0.0-M3

Description:

OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader

The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and 
getPredicates() each read a 32-bit signed integer count field from a binary 
model stream and pass that value directly to an array allocation (new 
String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without 
validating that the value is non-negative or within a reasonable bound. The 
count is therefore fully attacker-controlled when the model file originates 
from an untrusted source.
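The unchecked pattern described above can be illustrated with a short Java
sketch. It assumes, for illustration only, that the count and the labels are
read with java.io.DataInputStream.readInt() and readUTF(). The class
UncheckedOutcomeReader is hypothetical and is not the OpenNLP source. A count
of Integer.MAX_VALUE fails with an OutOfMemoryError at the allocation line, and
a negative count fails with a NegativeArraySizeException. In both cases the
failure happens before any label bytes are read.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical illustration of the vulnerable pattern; not OpenNLP's code.
final class UncheckedOutcomeReader {
    // Reads a 32-bit count and sizes an array with it, with no validation.
    // A huge count throws OutOfMemoryError and a negative count throws
    // NegativeArraySizeException, both at the allocation below.
    static String[] readOutcomes(DataInputStream in) throws IOException {
        int numOutcomes = in.readInt();              // attacker-controlled value
        String[] outcomes = new String[numOutcomes]; // unbounded allocation
        for (int i = 0; i < numOutcomes; i++) {
            outcomes[i] = in.readUTF();              // labels are read only afterwards
        }
        return outcomes;
    }
}
```

With a legitimate count the method behaves normally, which is why the flaw
only surfaces when the file itself is hostile.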


A crafted .bin model file in which any of these count fields is set to 
Integer.MAX_VALUE (or any value large enough to exhaust the available heap) 
triggers an OutOfMemoryError at the array allocation itself, before the 
corresponding label or pattern data is consumed from the stream. The error 
occurs very early in deserialization: for a GIS model, getOutcomes() is reached
after only the model-type string, the correction constant, and the correction
parameter have been read. A malicious payload therefore needs only a few bytes,
and a single small file can crash a JVM that loads it. Any code path that
deserializes a .bin model is affected, including direct use of
GenericModelReader and any higher-level component that delegates to it during
model load.


The practical impact is denial of service against processes that load model 
files from untrusted or semi-trusted origins.  


Mitigation:

  *  2.x users should upgrade to 2.5.9.

  *  3.x users should upgrade to 3.0.0-M3.

Note: The fix introduces an upper bound on each of the three count fields, 
checked before array allocation; counts that are negative or exceed the bound 
cause an IllegalArgumentException to be thrown and the read to fail fast with 
no large allocation. The default bound is 10,000,000, which is well above the 
entry counts of legitimate OpenNLP models but far below any value that would 
threaten heap exhaustion. Deployments that legitimately need to load models 
with more entries than the default can raise the limit at JVM startup by 
setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer 
(e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back 
to the default.
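The bounded check described in the note above can be sketched in Java as
follows. This is an illustration under stated assumptions, not the released
fix: the class BoundedCounts and its methods are hypothetical. Only the
OPENNLP_MAX_ENTRIES property name, the 10,000,000 default, the fallback for
invalid or non-positive values, and the IllegalArgumentException come from this
advisory. For simplicity the sketch re-reads the property on every call, whereas
the actual fix may resolve it once at startup.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch of a bounded count check; not OpenNLP's implementation.
final class BoundedCounts {
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    // Resolves the limit from the OPENNLP_MAX_ENTRIES system property,
    // falling back to the default when unset, unparsable, or non-positive.
    static int maxEntries() {
        String raw = System.getProperty("OPENNLP_MAX_ENTRIES");
        if (raw == null) {
            return DEFAULT_MAX_ENTRIES;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : DEFAULT_MAX_ENTRIES;
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_ENTRIES;
        }
    }

    // Rejects negative or oversized counts before they are used to size an array.
    static int checkCount(int count, String field) {
        int max = maxEntries();
        if (count < 0 || count > max) {
            throw new IllegalArgumentException(
                field + " count " + count + " is outside [0, " + max + "]");
        }
        return count;
    }

    // Same read loop as the unchecked version, but allocation follows validation.
    static String[] readOutcomes(DataInputStream in) throws IOException {
        int numOutcomes = checkCount(in.readInt(), "outcome");
        String[] outcomes = new String[numOutcomes];
        for (int i = 0; i < numOutcomes; i++) {
            outcomes[i] = in.readUTF();
        }
        return outcomes;
    }
}
```

With this guard, a stream whose count field is Integer.MAX_VALUE fails fast with
an IllegalArgumentException and no large array is ever allocated.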


Users who cannot upgrade immediately should treat all .bin model files as 
untrusted input unless their provenance is verified, and should avoid loading 
models supplied by end users or fetched from third-party repositories without 
integrity checks.
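One way to apply the integrity-check advice is to compare a model file's
SHA-256 digest against a value obtained through a trusted channel before the
file reaches any OpenNLP loading code. The sketch below uses only
java.security.MessageDigest and java.nio.file. The class ModelIntegrity and its
methods are hypothetical helpers, not part of OpenNLP. A matching digest only
shows that the file is the one you expected; it does not make a file from an
untrusted publisher safe to parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical pre-load integrity check; not part of OpenNLP.
final class ModelIntegrity {
    // Computes the SHA-256 digest of a file as lowercase hex, streaming in
    // chunks so a large model file is never read into memory all at once.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b)); // %x on a Byte prints it unsigned
        }
        return hex.toString();
    }

    // Throws unless the file's digest matches the expected hex value
    // (case-insensitive), so only verified files are handed to the loader.
    static void requireDigest(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        String actual = sha256Hex(file);
        if (!actual.equalsIgnoreCase(expectedHex.trim())) {
            throw new SecurityException("SHA-256 mismatch for " + file + ": " + actual);
        }
    }
}
```

An application would call requireDigest first and pass the file to its existing
model-loading code only if no exception is thrown.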

This issue is being tracked as OPENNLP-1821.

Credit:

Subramanian S (finder)

References:

https://opennlp.apache.org/
https://www.cve.org/CVERecord?id=CVE-2026-42440
https://issues.apache.org/jira/browse/OPENNLP-1821
