krickert commented on PR #1105:
URL: https://github.com/apache/opennlp/pull/1105#issuecomment-4764290323

   **Status since the last review.** The public API is unchanged: the 
`find`/`findInOriginal` signatures and `OffsetMappingNameFinder` are as before, 
and `opennlp-dl` still compiles against `opennlp-api` only.
   
   - Chunk-boundary fix: each chunk is now decoded within its own character 
region, and `mergeOverlappingSpans` then keeps the longer of two overlapping 
spans, breaking length ties by probability. The earlier "duplicate spans" 
diagnosis did not hold up, since the forward cursor already serializes the 
output. The actual bug was that the fuller entity at a chunk boundary was 
dropped; that is now fixed. `whitespaceChunkSpans` carries each chunk's 
character span, and `whitespaceChunks` delegates to it.
   - `normalizeInputAligned` composes the whitespace and dash normalizations via 
`Alignment.andThen`, which drops the assumption that whitespace normalization 
is length-preserving. The `InferenceOptions` javadoc documents the difference 
between the 1:1 whitespace mapping and the runtime collapse-and-trim fold, and 
points to `findInOriginal()` for the offset shift caused by supplementary-plane 
dashes.
   - Removed the unused public `I_PER`/`B_PER` constants. `mergeOverlappingSpans` 
is now documented as length-dominant rather than type-aware.
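A minimal sketch of the chunking relationship from the first bullet, assuming whitespace-delimited tokens grouped into non-overlapping chunks of at most `maxTokens` tokens. `ChunkSketch`, `CharSpan` and `maxTokens` are hypothetical illustrations, not the PR's signatures; only the shape follows the comment: the span-carrying method is primary and the string variant delegates to it, so both always agree on chunk boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkSketch {

    // Hypothetical token definition: maximal runs of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    /** Half-open character region [start, end) of one chunk in the original text. */
    static final class CharSpan {
        final int start;
        final int end;

        CharSpan(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Groups whitespace-delimited tokens into chunks of at most maxTokens and records each chunk's span. */
    static List<CharSpan> whitespaceChunkSpans(String text, int maxTokens) {
        if (maxTokens < 1) {
            throw new IllegalArgumentException("maxTokens must be >= 1");
        }
        List<CharSpan> chunks = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int count = 0;
        int chunkStart = 0;
        int chunkEnd = 0;
        while (m.find()) {
            if (count == 0) {
                chunkStart = m.start(); // first token opens a new chunk
            }
            chunkEnd = m.end();
            if (++count == maxTokens) {
                chunks.add(new CharSpan(chunkStart, chunkEnd));
                count = 0;
            }
        }
        if (count > 0) {
            chunks.add(new CharSpan(chunkStart, chunkEnd)); // trailing partial chunk
        }
        return chunks;
    }

    /** String view that delegates to whitespaceChunkSpans, so both agree on boundaries. */
    static List<String> whitespaceChunks(String text, int maxTokens) {
        List<String> out = new ArrayList<>();
        for (CharSpan c : whitespaceChunkSpans(text, maxTokens)) {
            out.add(text.substring(c.start, c.end));
        }
        return out;
    }
}
```

Under this sketch, a finder run on `text.substring(c.start, c.end)` returns chunk-local offsets, and adding `c.start` turns them into document offsets; keeping the span alongside the chunk text is what makes results from different chunks comparable when they are merged.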
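The length-dominant overlap resolution described for `mergeOverlappingSpans` can be sketched as follows. `MergeSketch` and `ScoredSpan` are hypothetical stand-ins (not OpenNLP's `Span`), and the greedy start-ordered sweep is an assumption about how such a merge might be done: of two overlapping spans the longer wins, equal lengths fall back to the higher probability, and the type label is never consulted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class MergeSketch {

    /** Hypothetical scored span: half-open character offsets, a type label and a probability. */
    static final class ScoredSpan {
        final int start;
        final int end;
        final String type;
        final double prob;

        ScoredSpan(int start, int end, String type, double prob) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }

        int length() {
            return end - start;
        }

        boolean overlaps(ScoredSpan o) {
            return start < o.end && o.start < end;
        }
    }

    /** Greedy, length-dominant, type-blind overlap resolution (sketch). */
    static List<ScoredSpan> mergeOverlappingSpans(List<ScoredSpan> spans) {
        List<ScoredSpan> sorted = new ArrayList<>(spans);
        // By start, longer first; List.sort is stable, so exact ties keep input order.
        sorted.sort(Comparator.comparingInt((ScoredSpan s) -> s.start)
                .thenComparingInt(s -> -s.length()));
        List<ScoredSpan> kept = new ArrayList<>();
        for (ScoredSpan s : sorted) {
            int lastIdx = kept.size() - 1;
            if (lastIdx < 0 || !kept.get(lastIdx).overlaps(s)) {
                kept.add(s); // kept spans stay sorted and pairwise disjoint
                continue;
            }
            ScoredSpan last = kept.get(lastIdx);
            boolean longer = s.length() > last.length();
            boolean tieButLikelier = s.length() == last.length() && s.prob > last.prob;
            if (longer || tieButLikelier) {
                // Safe: earlier kept spans end at or before last.start, which is <= s.start.
                kept.set(lastIdx, s);
            }
        }
        return kept;
    }
}
```

Because the comparison ignores `type`, a `PER` span and an `ORG` span over the same characters are resolved purely by probability, and a short high-probability fragment from one chunk loses to a longer span from its neighbour; a type-aware variant would need an extra rule.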
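The composition in the second bullet can be illustrated with a small sketch, assuming an alignment stores, for each character of the normalized text, the half-open source range it came from. `AlignmentSketch`, `collapseWhitespace`, `normalizeDashes`, `toSource` and the `DASHES` set are hypothetical names; only `andThen` mirrors a method named in the comment, and its behavior here is an assumption. Since each step records its own mapping, neither the collapse-and-trim fold nor a supplementary-plane dash (two UTF-16 units replaced by one `-`) has to preserve length.

```java
import java.util.Arrays;
import java.util.Set;

final class AlignmentSketch {

    // Hypothetical dash set; U+10EAD (YEZIDI HYPHENATION MARK) is a supplementary-plane dash.
    static final Set<Integer> DASHES = Set.of(
            0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015, 0x2212, 0xFE58, 0xFE63, 0xFF0D, 0x10EAD);

    /** Normalized text plus, per char i, the half-open source range [start[i], end[i]). */
    static final class Alignment {
        final String text;
        final int[] start;
        final int[] end;

        Alignment(String text, int[] start, int[] end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        /** Chains a step computed on this.text; the result maps straight to this alignment's source. */
        Alignment andThen(Alignment next) {
            int m = next.text.length();
            int[] s = new int[m];
            int[] e = new int[m];
            for (int i = 0; i < m; i++) {
                s[i] = start[next.start[i]];
                e[i] = end[next.end[i] - 1];
            }
            return new Alignment(next.text, s, e);
        }

        /** Maps a non-empty normalized span [from, to) to {sourceStart, sourceEnd}. */
        int[] toSource(int from, int to) {
            if (from >= to) {
                throw new IllegalArgumentException("empty span");
            }
            return new int[] {start[from], end[to - 1]};
        }
    }

    /** Collapses each interior whitespace run to one space and drops leading/trailing runs. */
    static Alignment collapseWhitespace(String in) {
        int n = in.length();
        int[] st = new int[n];
        int[] en = new int[n];
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < n) {
            int cp = in.codePointAt(i);
            if (Character.isWhitespace(cp)) {
                int runStart = i;
                while (i < n && Character.isWhitespace(in.codePointAt(i))) {
                    i += Character.charCount(in.codePointAt(i));
                }
                if (sb.length() > 0 && i < n) { // interior run: one space covering the whole run
                    st[sb.length()] = runStart;
                    en[sb.length()] = i;
                    sb.append(' ');
                }
            } else {
                i = copyCodePoint(in, i, cp, sb, st, en);
            }
        }
        return new Alignment(sb.toString(), Arrays.copyOf(st, sb.length()), Arrays.copyOf(en, sb.length()));
    }

    /** Replaces every code point in DASHES with '-'; a surrogate-pair dash shrinks by one char. */
    static Alignment normalizeDashes(String in) {
        int n = in.length();
        int[] st = new int[n];
        int[] en = new int[n];
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < n) {
            int cp = in.codePointAt(i);
            if (DASHES.contains(cp)) {
                st[sb.length()] = i;
                en[sb.length()] = i + Character.charCount(cp);
                sb.append('-');
                i += Character.charCount(cp);
            } else {
                i = copyCodePoint(in, i, cp, sb, st, en);
            }
        }
        return new Alignment(sb.toString(), Arrays.copyOf(st, sb.length()), Arrays.copyOf(en, sb.length()));
    }

    /** Copies one code point unchanged; each of its UTF-16 units maps to the whole code point. */
    private static int copyCodePoint(String in, int i, int cp, StringBuilder sb, int[] st, int[] en) {
        int next = i + Character.charCount(cp);
        for (int k = i; k < next; k++) {
            st[sb.length()] = i;
            en[sb.length()] = next;
            sb.append(in.charAt(k));
        }
        return next;
    }
}
```

Composing as `ws.andThen(normalizeDashes(ws.text))` lets a span found in the doubly normalized text be mapped back to original offsets in a single lookup, rather than undoing each step separately.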


-- 
This is an automated message from the Apache Git Service.