krickert commented on PR #1103: URL: https://github.com/apache/opennlp/pull/1103#issuecomment-4764287154
**Status since the last review.** Offset-model items addressed; additive commits, so inline threads stay anchored.

- `buildAligned()` + `OffsetAwareNormalizer` give the `*Aligned` API a real consumer: every per-code-point fold (whitespace, line-break-preserving whitespace, dashes, invisible-strip, quotes, digits, ellipsis, bullets, umlaut) composes into one `AlignedText`. Folds that route through `java.text.Normalizer` or JDK case mapping are rejected loudly, naming the rung.
- Capital eszett U+1E9E folds to `SS`. `buildAligned()` reject text states the rule instead of a stale list. `Confusables` javadoc scoped to the skeleton plus equality test (restriction-level, mixed-script, bidi out of scope). Empty aligned pipeline normalizes input to one `String`.
- `Alignment.andThen` leading-insertion is not a bug: `Math.max(start, end)` already yields the zero-width span. Added a test that proves it.
- New tests: CharClass plain-vs-aligned parity battery, leading-insertion compose, capital-eszett offsets, `buildAligned()` rejection at every index and fold type, `toNormalizedSpan` no over-cover across deletions.
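For readers following the offset discussion, here is a minimal, self-contained sketch of what a per-code-point fold with offset tracking can look like. It is not the PR's code: `AlignedFoldSketch`, `foldCodePoint`, and `toOriginalSpan` are hypothetical names made up for illustration, and the fold table is a tiny stand-in (capital eszett, en/em dash, zero-width space) for the rungs listed above. The assumption is that each output char remembers the `[start, end)` range of the source code point it came from.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not the PR's AlignedText or OffsetAwareNormalizer. */
public final class AlignedFoldSketch {

    public final String text;       // folded text
    public final int[] srcStart;    // per output char: start of its source code point
    public final int[] srcEnd;      // per output char: exclusive end of its source code point
    public final int sourceLength;  // length of the original input in chars

    private AlignedFoldSketch(String text, int[] srcStart, int[] srcEnd, int sourceLength) {
        this.text = text;
        this.srcStart = srcStart;
        this.srcEnd = srcEnd;
        this.sourceLength = sourceLength;
    }

    /** Hypothetical fold table: returns the replacement, or null to keep the code point. */
    static String foldCodePoint(int cp) {
        switch (cp) {
            case 0x1E9E: return "SS"; // capital eszett expands to two chars
            case 0x2013:
            case 0x2014: return "-";  // en dash and em dash fold to hyphen-minus
            case 0x200B: return "";   // zero-width space is stripped
            default:     return null;
        }
    }

    /** Folds the input one code point at a time and records where each output char came from. */
    public static AlignedFoldSketch fold(String input) {
        StringBuilder out = new StringBuilder();
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            int next = i + Character.charCount(cp);
            String replacement = foldCodePoint(cp);
            if (replacement == null) {
                replacement = input.substring(i, next);
            }
            // Every output char of this replacement points back at the same source range.
            for (int k = 0; k < replacement.length(); k++) {
                spans.add(new int[] {i, next});
            }
            out.append(replacement);
            i = next;
        }
        int[] starts = new int[spans.size()];
        int[] ends = new int[spans.size()];
        for (int k = 0; k < spans.size(); k++) {
            starts[k] = spans.get(k)[0];
            ends[k] = spans.get(k)[1];
        }
        return new AlignedFoldSketch(out.toString(), starts, ends, input.length());
    }

    /** Maps a [start, end) span in the folded text back to a span in the original text. */
    public int[] toOriginalSpan(int start, int end) {
        if (start < 0 || end < start || end > text.length()) {
            throw new IllegalArgumentException("span out of range: " + start + ".." + end);
        }
        if (start == end) {
            // Zero-width span: collapse to a single position in the source.
            int pos = start < text.length() ? srcStart[start] : sourceLength;
            return new int[] {pos, pos};
        }
        return new int[] {srcStart[start], srcEnd[end - 1]};
    }
}
```

With the input `"STRA\u1E9EE\u200B\u2014x"` the folded text is `STRASSE-x`. The two chars of `SS` (folded span 4..6) map back to the single source char at 4..5, and the `E` at folded 6..7 maps to source 5..6 without swallowing the stripped zero-width space that follows it.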
