krickert opened a new pull request, #1074: URL: https://github.com/apache/opennlp/pull/1074
## What

- `categorize()` leaked native memory on every call: the `OnnxTensor` inputs and the `OrtSession.Result` were never closed. Tensors are now released in a `finally` block and the result via try-with-resources (`getValue()` copies into Java arrays first, so this is safe).
- A token missing from the vocabulary caused `vocab.get(...)` to auto-unbox `null` into an `int`, throwing an opaque `NullPointerException` that the broad catch in `categorize()` swallowed into an empty score array. The mapping loop is now a testable `tokenIds()` helper that throws `IllegalArgumentException` naming the missing token, which indicates the vocabulary file does not match the model.

## Why

See [OPENNLP-1839](https://issues.apache.org/jira/browse/OPENNLP-1839). Long-running services calling `categorize()` repeatedly accumulate off-heap allocations until the process is killed. This applies the same resource-management pattern as the `SentenceVectorsDL` fix (OPENNLP-1836, #1072).

## Validation

New `DocumentCategorizerDLTest` covers the token-id mapping and the vocabulary-miss error. All existing `opennlp-dl` tests pass.
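
For reviewers who want the shape of the two changes without opening the diff, here is a minimal, self-contained sketch. It is not the code in this PR. `NativeHandle` is a hypothetical stand-in for a native-backed resource such as `OnnxTensor` or `OrtSession.Result`, the `tokenIds()` signature (a `Map<String, Integer>` in, an `int[]` out) is an assumption, and the score computation is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a native-backed resource (e.g. a tensor or a session result).
final class NativeHandle implements AutoCloseable {
  static int open = 0; // number of handles created but not yet closed

  NativeHandle() {
    open++;
  }

  @Override
  public void close() {
    open--;
  }
}

final class CategorizeSketch {

  // Maps tokens to vocabulary ids. A miss is reported by name instead of
  // letting a null Integer auto-unbox into an int and throw a bare NPE.
  static int[] tokenIds(Map<String, Integer> vocab, String[] tokens) {
    int[] ids = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      Integer id = vocab.get(tokens[i]);
      if (id == null) {
        throw new IllegalArgumentException("Token '" + tokens[i]
            + "' is not in the vocabulary; the vocabulary file may not match the model.");
      }
      ids[i] = id;
    }
    return ids;
  }

  // Illustrative categorize(): inputs are closed in finally, the result via try-with-resources.
  static double[] categorize(Map<String, Integer> vocab, String[] tokens) {
    int[] ids = tokenIds(vocab, tokens);
    Map<String, NativeHandle> inputs = new LinkedHashMap<>();
    try {
      inputs.put("input_ids", new NativeHandle());
      inputs.put("attention_mask", new NativeHandle());
      try (NativeHandle result = new NativeHandle()) {
        // Copy everything needed into a plain Java array before the result is closed.
        double[] scores = new double[ids.length];
        for (int i = 0; i < ids.length; i++) {
          scores[i] = ids[i]; // placeholder, not a real model score
        }
        return scores;
      }
    } finally {
      // Runs on both the normal and the exceptional path.
      for (NativeHandle input : inputs.values()) {
        input.close();
      }
    }
  }
}
```

In this sketch the token lookup runs before any handle is created, so a vocabulary miss cannot leave a handle open; whether the PR orders the steps the same way is not stated above.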
