Copilot commented on code in PR #1086:
URL: https://github.com/apache/opennlp/pull/1086#discussion_r3408818711


##########
opennlp-core/opennlp-ml/opennlp-dl/src/main/java/opennlp/dl/namefinder/NameFinderDL.java:
##########
@@ -187,13 +193,16 @@ public Span[] find(String[] input) {
             // Can we do thresholding without it between 0 and 1?
             final double confidence = arr[maxIndex]; // / 10;
 
-            // Is this is the start of a person entity.
-            if (B_PER.equals(label)) {
+            // Is this the start of an entity? Any B-<TYPE> begins a span of 
that type.
+            if (label != null && label.startsWith(BEGIN_PREFIX)) {
+
+              // The entity type is the label without its B- prefix, e.g. 
B-ORG -> ORG.
+              final String entityType = label.substring(BEGIN_PREFIX.length());

Review Comment:
   `entityType` is derived from the BIO label via `substring("B-".length())`, 
but the current check only verifies `startsWith("B-")`. If `ids2Labels` ever 
contains a bare `"B-"` (or another malformed `B-` label), this will create 
spans with an empty type and an `insideLabel` of `"I-"`, which is likely 
unintended. Add a length guard so only valid `B-<TYPE>` labels start spans.



##########
opennlp-eval-tests/src/test/java/opennlp/dl/namefinder/NameFinderDLEval.java:
##########
@@ -62,12 +62,23 @@ public void tokenNameFinder1Test() throws Exception {
         logger.debug(span.toString());
       }
 
-      Assertions.assertEquals(1, spans.length);
+      final String text = String.join(" ", tokens);
+
+      // The model emits a PER and a LOC entity; the person-only decoder 
previously dropped
+      // the location. Span types are the entity labels (PER/LOC), not the 
matched text.
+      Assertions.assertEquals(2, spans.length);
+
+      Assertions.assertEquals("PER", spans[0].getType());
       Assertions.assertEquals(0, spans[0].getStart());
       Assertions.assertEquals(17, spans[0].getEnd());
       Assertions.assertEquals(8.251646041870117, spans[0].getProb(), 0.00001);
-      Assertions.assertEquals("George Washington",
-          spans[0].getCoveredText(String.join(" ", tokens)));
+      Assertions.assertEquals("George Washington", 
spans[0].getCoveredText(text));
+
+      Assertions.assertEquals("LOC", spans[1].getType());
+      Assertions.assertEquals(39, spans[1].getStart());
+      Assertions.assertEquals(52, spans[1].getEnd());
+      Assertions.assertEquals("United States", spans[1].getCoveredText(text));
+      Assertions.assertTrue(spans[1].getProb() > 0);

Review Comment:
   `Span.getProb()` in `NameFinderDL` is currently the raw max logit 
(`arr[maxIndex]`), not a normalized probability (there’s even a TODO about 
[0,1]). Asserting `> 0` makes this test potentially flaky because logits can be 
negative while still being the argmax. Prefer asserting finiteness or dropping 
the assertion entirely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to