Re: [PR] OPENNLP-1816: Make ME classes thread-safe by eliminating shared mutable instance state (opennlp)

via GitHub Thu, 09 Apr 2026 07:56:56 -0700


krickert commented on code in PR #1003:
URL: https://github.com/apache/opennlp/pull/1003#discussion_r3058717974



##########
opennlp-core/opennlp-ml/opennlp-ml-commons/src/main/java/opennlp/tools/ml/BeamSearch.java:
##########
@@ -63,92 +84,82 @@ public BeamSearch(int size, MaxentModel model) {
   }
 
   /**
-   * Initializes a {@link BeamSearch} instance.
+   * Initializes a {@link BeamSearch} instance with an optional per-thread contexts cache.
    *
    * @param size The size of the beam (k).
    * @param model The {@link MaxentModel} for assigning probabilities to the sequence outcomes.
-   * @param cacheSize The capacity of the {@link Cache} to use.
+   * @param cacheSize The capacity of the per-thread contexts cache. Use {@code 0} to disable caching.
    */
   public BeamSearch(int size, MaxentModel model, int cacheSize) {
 
     this.size = size;
     this.model = model;
-
-    if (cacheSize > 0) {
-      contextsCache = new Cache<>(cacheSize);
-    }
-
-    this.probs = new double[model.getNumOutcomes()];
+    this.cacheSize = cacheSize;
+    this.threadState = ThreadLocal.withInitial(
+        () -> new CacheState(model.getNumOutcomes(), cacheSize));
   }
 
-  /**
-   * Computes the best sequence of outcomes based on the {@link MaxentModel}.
-   *
-   * @param numSequences The number of sequences.
-   * @param sequence The input {@link T} sequence.
-   * @param additionalContext An {@link Object[]} of additional context.
-   *     This is passed to the context generator blindly with the
-   *     assumption that the context are appropriate.
-   * @param minSequenceScore The minimum sequence score to use.
-   * @param cg The {@link BeamSearchContextGenerator context generator} to use.
-   * @param validator The {@link SequenceValidator} to validate sequences.
-   *
-   * @return The top ranked {@link Sequence} of outcomes or {@code null}
-   *         if no sequence could be found.
-   */
   @Override
-  public <T> Sequence[] bestSequences(int numSequences, T[] sequence,
-      Object[] additionalContext, double minSequenceScore,
-      BeamSearchContextGenerator<T> cg, SequenceValidator<T> validator) {
+  public <T> Sequence[] bestSequences(final int numSequences, final T[] sequence,
+      final Object[] additionalContext, final double minSequenceScore,
+      final BeamSearchContextGenerator<T> cg, final SequenceValidator<T> validator) {
+
+    final CacheState state = threadState.get();
 
     Queue<Sequence> prev = new PriorityQueue<>(size);
     Queue<Sequence> next = new PriorityQueue<>(size);
     Queue<Sequence> tmp;
     prev.add(new Sequence());
 
-    if (additionalContext == null) {
-      additionalContext = EMPTY_ADDITIONAL_CONTEXT;
+    Object[] context = additionalContext;
+    if (context == null) {
+      context = EMPTY_ADDITIONAL_CONTEXT;
     }
 
     for (int i = 0; i < sequence.length; i++) {
-      int sz = StrictMath.min(size, prev.size());
+      final int sz = StrictMath.min(size, prev.size());
 
       for (int sc = 0; prev.size() > 0 && sc < sz; sc++) {
-        Sequence top = prev.remove();
-        List<String> tmpOutcomes = top.getOutcomes();
-        String[] outcomes = tmpOutcomes.toArray(new String[0]);
-        String[] contexts = cg.getContext(i, sequence, outcomes, additionalContext);
-        double[] scores;
-        if (contextsCache != null) {
-          scores = contextsCache.computeIfAbsent(contexts, c -> model.eval(c, probs));
+        final Sequence top = prev.remove();
+        final List<String> tmpOutcomes = top.getOutcomes();
+        final String[] outcomes = tmpOutcomes.toArray(new String[0]);
+        final String[] contexts = cg.getContext(i, sequence, outcomes, context);
+        final double[] scores;
+        if (state.cache != null) {
+          scores = state.cache.computeIfAbsent(contexts, c -> {
+            double[] res = model.eval(c, state.probs);
+            double[] copy = new double[res.length];

Review Comment:
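   Added a short comment in the lambda: `model.eval` writes into the reused
   per-thread `state.probs` buffer, so each cached entry must be an independent
   copy. Otherwise a later `eval` call could overwrite scores that are already
   stored in the cache and reused across beam steps.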
   
   Pushed with the same commit.
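
   For anyone following along, here is a minimal standalone sketch of the
   aliasing problem described above. It is not the PR's code: `eval`, `State`,
   and the two cache maps are hypothetical stand-ins, and the only assumption
   carried over is that the evaluator fills a caller-supplied buffer that is
   reused between calls.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BufferCacheSketch {

  /** Per-thread scratch state: one reusable output buffer plus two caches. */
  static final class State {
    final double[] probs = new double[2];
    final Map<String, double[]> aliasedCache = new HashMap<>();
    final Map<String, double[]> copiedCache = new HashMap<>();
  }

  private static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

  /** Hypothetical stand-in for an evaluator that fills and returns the caller's buffer. */
  static double[] eval(String context, double[] out) {
    out[0] = context.length();
    out[1] = -context.length();
    return out;
  }

  /** Unsafe: caches the buffer itself, so the next eval overwrites the cached entry. */
  static double[] scoreAliased(String context) {
    State s = STATE.get();
    return s.aliasedCache.computeIfAbsent(context, c -> eval(c, s.probs));
  }

  /** Safe: caches an independent copy of the buffer contents. */
  static double[] scoreCopied(String context) {
    State s = STATE.get();
    return s.copiedCache.computeIfAbsent(context, c -> {
      double[] res = eval(c, s.probs);
      return Arrays.copyOf(res, res.length);
    });
  }

  public static void main(String[] args) {
    double[] aliased = scoreAliased("ab");
    scoreAliased("abcd");                           // refills the same buffer
    System.out.println(Arrays.toString(aliased));   // [4.0, -4.0]: entry for "ab" was clobbered

    double[] copied = scoreCopied("ab");
    scoreCopied("abcd");
    System.out.println(Arrays.toString(copied));    // [2.0, -2.0]: entry for "ab" is intact
  }
}
```

   The copy costs one small allocation per cache miss, which seems a fair
   trade for keeping cached scores stable while the buffer is reused.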



