Re: [PR] OPENNLP-1844 - Make opennlp-dl components thread-safe (opennlp)

via GitHub Sat, 13 Jun 2026 13:36:04 -0700


Copilot commented on code in PR #1084:
URL: https://github.com/apache/opennlp/pull/1084#discussion_r3408572610



##########
opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/AbstractDL.java:
##########
@@ -46,14 +46,57 @@ public abstract class AbstractDL implements AutoCloseable {
   public static final String ATTENTION_MASK = "attention_mask";
   public static final String TOKEN_TYPE_IDS = "token_type_ids";
 
-  protected OrtEnvironment env;
-  protected OrtSession session;
-  protected Tokenizer tokenizer;
-  protected Map<String, Integer> vocab;
+  protected final OrtEnvironment env;
+  protected final OrtSession session;
+  protected final Tokenizer tokenizer;
+  protected final Map<String, Integer> vocab;
 
   private static final Pattern JSON_ENTRY_PATTERN =
       Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"\\s*:\\s*(\\d+)");
 
+  /**
+   * Initializes the shared, immutable inference state: the ONNX environment 
and session,
+   * the loaded vocabulary and the configured tokenizer. These fields are 
{@code final}
+   * and assigned exactly once here, so a fully constructed instance is safely 
published
+   * and can be shared across threads.
+   *
+   * @param model The ONNX model file.
+   * @param vocabulary The vocabulary file matching the model.
+   * @param sessionOptions The session options (e.g. CUDA execution provider); 
build with
+   *     {@link #sessionOptions(InferenceOptions)} when honoring {@link 
InferenceOptions}.
+   * @param lowerCase {@code true} for uncased models (lower casing and accent 
stripping
+   *     during tokenization), {@code false} for cased models.
+   *
+   * @throws OrtException Thrown if the {@code model} cannot be loaded.
+   * @throws IOException Thrown if the {@code model} or {@code vocabulary} 
cannot be read.
+   */
+  protected AbstractDL(final File model, final File vocabulary,
+                       final OrtSession.SessionOptions sessionOptions, final 
boolean lowerCase)
+      throws IOException, OrtException {
+    this.env = OrtEnvironment.getEnvironment();
+    this.session = env.createSession(model.getPath(), sessionOptions);

Review Comment:
   This PR makes binary-incompatible API changes in a public base class: 
loadVocab(...) and createTokenizer(...) changed from instance to static, and 
the implicit no-arg AbstractDL() constructor is removed. Already-compiled 
client code (or external subclasses) can fail at runtime with 
NoSuchMethodError. If this is intentional for 3.0.0, please call out the 
breaking change explicitly; otherwise consider preserving compatibility (e.g., 
keep instance entry points and move statics to a new utility class).



##########
opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/AbstractDL.java:
##########
@@ -46,14 +46,57 @@ public abstract class AbstractDL implements AutoCloseable {
   public static final String ATTENTION_MASK = "attention_mask";
   public static final String TOKEN_TYPE_IDS = "token_type_ids";
 
-  protected OrtEnvironment env;
-  protected OrtSession session;
-  protected Tokenizer tokenizer;
-  protected Map<String, Integer> vocab;
+  protected final OrtEnvironment env;
+  protected final OrtSession session;
+  protected final Tokenizer tokenizer;
+  protected final Map<String, Integer> vocab;
 
   private static final Pattern JSON_ENTRY_PATTERN =
       Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"\\s*:\\s*(\\d+)");
 
+  /**
+   * Initializes the shared, immutable inference state: the ONNX environment 
and session,
+   * the loaded vocabulary and the configured tokenizer. These fields are 
{@code final}
+   * and assigned exactly once here, so a fully constructed instance is safely 
published
+   * and can be shared across threads.
+   *
+   * @param model The ONNX model file.
+   * @param vocabulary The vocabulary file matching the model.
+   * @param sessionOptions The session options (e.g. CUDA execution provider); 
build with
+   *     {@link #sessionOptions(InferenceOptions)} when honoring {@link 
InferenceOptions}.
+   * @param lowerCase {@code true} for uncased models (lower casing and accent 
stripping
+   *     during tokenization), {@code false} for cased models.
+   *
+   * @throws OrtException Thrown if the {@code model} cannot be loaded.
+   * @throws IOException Thrown if the {@code model} or {@code vocabulary} 
cannot be read.
+   */
+  protected AbstractDL(final File model, final File vocabulary,
+                       final OrtSession.SessionOptions sessionOptions, final 
boolean lowerCase)
+      throws IOException, OrtException {
+    this.env = OrtEnvironment.getEnvironment();
+    this.session = env.createSession(model.getPath(), sessionOptions);
+    this.vocab = Map.copyOf(loadVocab(vocabulary));
+    this.tokenizer = createTokenizer(vocab, lowerCase);
+  }

Review Comment:
   If loadVocab(...) or createTokenizer(...) throws after the ONNX session is 
created, the session leaks because the constructor exits exceptionally without 
closing it. This can accumulate native resources across repeated failed 
initializations. Create the session into a local variable and close it on any 
exception before rethrowing.



##########
opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/namefinder/NameFinderDL.java:
##########
@@ -51,9 +51,15 @@
  * boundaries. For uncased models, set
  * {@link InferenceOptions#setLowerCase(boolean)} to {@code true}.</p>
  *
+ * <p>This class is thread-safe and may be shared across threads, provided the 
supplied
+ * {@link SentenceDetector} is itself thread-safe (e.g. {@link 
opennlp.tools.sentdetect.SentenceDetectorME},
+ * which is {@code @ThreadSafe}). Inference holds no per-call instance state 
and the
+ * underlying {@link OrtSession} supports concurrent execution.</p>
+ *

Review Comment:
   The new `@ThreadSafe` claim is unconditional, but this class retains a 
reference to the caller-provided InferenceOptions (which is mutable via 
setters). If that object is mutated while the instance is shared across 
threads, reads of include*Mask / split sizes become a data race and the class 
is no longer thread-safe. Either snapshot the needed option values into final 
fields, or document that the passed InferenceOptions must not be mutated after 
construction.



##########
opennlp-eval-tests/src/test/java/opennlp/dl/namefinder/NameFinderDLEval.java:
##########
@@ -72,6 +78,60 @@ public void tokenNameFinder1Test() throws Exception {
 
   }
 
+  /**
+   * Verifies that a single {@link NameFinderDL} instance is safe to share 
across threads:
+   * many threads call {@link NameFinderDL#find(String[])} concurrently on one 
instance and
+   * every call must return the same correct result as the single-threaded 
case. A data
+   * race on the shared inference state would surface here as a wrong span, an 
exception or
+   * a non-deterministic result.
+   */
+  @Test
+  public void tokenNameFinderConcurrentTest() throws Exception {
+
+    final File model = new File(getOpennlpDataDir(), 
"onnx/namefinder/model.onnx");
+    final File vocab = new File(getOpennlpDataDir(), 
"onnx/namefinder/vocab.txt");
+
+    final String[] tokens = new String[]
+        {"George", "Washington", "was", "president", "of", "the", "United", 
"States", "."};
+    final String text = String.join(" ", tokens);
+
+    final int threads = 8;
+    final int iterationsPerThread = 25;
+
+    try (final NameFinderDL nameFinderDL = new NameFinderDL(model, vocab, 
getIds2Labels(),
+        sentenceDetector)) {
+
+      final ExecutorService executor = Executors.newFixedThreadPool(threads);
+      final CountDownLatch startGate = new CountDownLatch(1);
+      final List<Future<Boolean>> futures = new ArrayList<>();
+
+      for (int t = 0; t < threads; t++) {
+        futures.add(executor.submit(() -> {
+          // Release all threads at once to maximize contention on the shared 
instance.
+          startGate.await();
+          for (int i = 0; i < iterationsPerThread; i++) {
+            final Span[] spans = nameFinderDL.find(tokens);
+            if (spans.length != 1
+                || spans[0].getStart() != 0
+                || spans[0].getEnd() != 17
+                || !"George Washington".equals(spans[0].getCoveredText(text))) 
{
+              return false;
+            }
+          }
+          return true;
+        }));
+      }
+
+      startGate.countDown();
+      for (Future<Boolean> future : futures) {
+        Assertions.assertTrue(future.get(),
+            "a concurrent find() returned a result inconsistent with the 
single-threaded case");
+      }
+      executor.shutdown();

Review Comment:
   In this concurrent test, the ExecutorService is only shut down on the happy 
path. If an assertion fails or future.get() throws, the thread pool can remain 
running and keep the JVM alive / make the test suite hang. Wrap the Future 
assertion loop in a try/finally and shut down the executor in the finally block.



##########
opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/doccat/DocumentCategorizerDL.java:
##########
@@ -57,10 +57,15 @@
  * models commonly used for classification. For cased models, set
  * {@link InferenceOptions#setLowerCase(boolean)} to {@code false}.</p>
  *
+ * <p>This class is thread-safe and may be shared across threads: inference 
holds no
+ * per-call instance state, the {@link ClassificationScoringStrategy} is 
stateless and the
+ * underlying {@link OrtSession} supports concurrent execution.</p>
+ *

Review Comment:
   The new `@ThreadSafe` claim is unconditional, but this class keeps a 
reference to the caller-provided InferenceOptions (mutable via setters). If 
callers mutate InferenceOptions while the categorizer is shared across threads, 
reads of include*Mask / split sizes become a data race and the instance is no 
longer thread-safe. Either snapshot needed option values into final fields, or 
document that the passed InferenceOptions must not be mutated after 
construction.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] OPENNLP-1844 - Make opennlp-dl components thread-safe (opennlp)

Reply via email to