krickert commented on PR #1103:
URL: https://github.com/apache/opennlp/pull/1103#issuecomment-4763023282

   **Dimension javadoc forward-references Term/TermAnalyzer.**
   `Dimension` references `Term`/`TermAnalyzer` with `{@code}`, not `{@link}`, 
so standalone javadoc on this branch produces no unresolved-reference warnings.
   
   **Offset mapping isn't reachable through the builder.**
   You found an offset in my impl (pun intended), and the root cause was the 
missing composition primitive: there was no way to combine the per-stage offset 
maps. I got rid of the `OffsetMapping` and added `Alignment.andThen` so an 
offset-carrying pipeline is now possible. Wiring it through 
`TextNormalizer.build()` for arbitrary `CharSequenceNormalizer`s is a 
follow-up (only the `CharClass`-family 
transforms can emit an alignment cheaply; `java.text.Normalizer`-based stages 
would need ICU-style edit callbacks), but the primitive it depends on is in 
place.
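
   To illustrate what composing per-stage offset maps means, here is a minimal sketch. It is not the `Alignment` class from this PR: `AlignmentSketch`, its `int[]` representation (one source index per output character), and the method names other than `andThen` are assumptions made only for this example.

```java
/**
 * Hypothetical stand-in for an alignment: toSource[i] is the index in the
 * stage's input that produced output character i. Not the PR's actual class.
 */
final class AlignmentSketch {
    private final int[] toSource;

    AlignmentSketch(int[] toSource) {
        this.toSource = toSource.clone(); // defensive copy keeps the sketch immutable
    }

    int length() {
        return toSource.length;
    }

    int sourceIndex(int outputIndex) {
        return toSource[outputIndex];
    }

    /**
     * Composes this stage (raw -> mid) with the next stage (mid -> out),
     * yielding a single map from the final output back to the raw input.
     */
    AlignmentSketch andThen(AlignmentSketch next) {
        int[] composed = new int[next.toSource.length];
        for (int i = 0; i < composed.length; i++) {
            // Follow the final output index through the intermediate text
            // back to the original input.
            composed[i] = this.toSource[next.toSource[i]];
        }
        return new AlignmentSketch(composed);
    }
}
```

   For example, if one stage collapses `"a  b"` to `"a b"` (map `{0, 1, 3}`) and the next strips the space to give `"ab"` (map `{0, 2}`), the composed map is `{0, 3}`, so the `b` in the final output points back at index 3 of the raw input.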
   
   **OffsetMap buffer growth overflows past ~2^30.**
   `OffsetMap` is removed. Its replacement, `Alignment.Builder`, grows its 
buffer with overflow-aware arithmetic (`length + (length >> 1)`, clamped to 
`Integer.MAX_VALUE - 8`), so it 
degrades to a clean `OutOfMemoryError` instead of `NegativeArraySizeException`. 
`WordSegmenter.IntList` got the same treatment (see #1104).
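
   The growth rule can be sketched roughly as follows. This is an illustration under assumptions, not the code in `Alignment.Builder`: the class and method names are hypothetical, and widening to `long` is just one way to keep the 1.5x step from wrapping.

```java
/** Hypothetical helper showing overflow-aware 1.5x growth; not the PR's code. */
final class GrowthSketch {
    /** Upper bound from the comment above; leaves headroom for array headers. */
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    private GrowthSketch() {
    }

    /**
     * Returns the next capacity for a buffer of {@code length} elements that
     * must hold at least {@code minCapacity} elements.
     */
    static int grownCapacity(int length, int minCapacity) {
        // A negative minCapacity means the caller's own "length + n" wrapped.
        if (minCapacity < 0 || minCapacity > MAX_ARRAY_LENGTH) {
            throw new OutOfMemoryError("Required array length too large: " + minCapacity);
        }
        // Widen to long so length + (length >> 1) cannot wrap to a negative int.
        long grown = (long) length + (length >> 1);
        long target = Math.max(grown, (long) minCapacity);
        return (int) Math.min(target, (long) MAX_ARRAY_LENGTH);
    }
}
```

   With this shape, a request that cannot be satisfied fails with `OutOfMemoryError` before `new int[...]` ever sees a negative size.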
   
   **Confusables.load() has no per-line guard.**
   Fixed. The per-line parse is wrapped and rethrows an `IllegalStateException` 
naming the offending line number, mirroring `CodePointSet.fromFile`, instead of 
surfacing a raw `ExceptionInInitializerError`. (A bundled-file checksum/version 
assertion is a reasonable follow-up but is left out here.)
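
   The per-line guard looks roughly like the sketch below. The class name, the `hex ; hex` line format, and the returned map type are assumptions for illustration only; the actual `Confusables.load()` and its data file format may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical loader showing a per-line guard; not the PR's Confusables code. */
final class GuardedLoaderSketch {
    private GuardedLoaderSketch() {
    }

    /** Parses "HEX ; HEX" lines (an assumed format), skipping blanks and # comments. */
    static Map<Integer, Integer> load(BufferedReader reader) {
        Map<Integer, Integer> table = new LinkedHashMap<>();
        int lineNumber = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                try {
                    String[] fields = trimmed.split(";");
                    int source = Integer.parseInt(fields[0].trim(), 16);
                    int target = Integer.parseInt(fields[1].trim(), 16);
                    table.put(source, target);
                } catch (RuntimeException e) {
                    // Name the offending line so a bad data file is diagnosable
                    // instead of surfacing as a bare initializer failure.
                    throw new IllegalStateException(
                        "Malformed entry at line " + lineNumber + ": " + line, e);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```

   If such a loader runs from a static initializer, the JVM still wraps the failure in `ExceptionInInitializerError`, but its cause now carries the line number rather than a bare `NumberFormatException`.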
   
   **Nit: serialVersionUID = 1L vs random longs; builder() returns its own 
mutable builder.**
   Although I'm in camp "1L" for various reasons, I don't mind either way. 
Changing that now.

