alhudz commented on PR #1719:
URL: https://github.com/apache/commons-lang/pull/1719#issuecomment-4759024996
Had a look at `testEmoji()`. With the
`expectedResultsFox`/`expectedResultsFamilyWithCodepoints` assertions
uncommented, it fails both before and after this PR, so those assertions have
never passed on `master`:
`abbreviate("🦊…", 4)`
- before: `<lone high surrogate>...` (the malformed output this PR targets)
- this PR: `...`
- `testEmoji` expects: `🦊...`
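The "before" output above can be reproduced with a hypothetical char-based
helper (`naiveAbbreviate` is illustrative, not the commons-lang method). It
assumes the elided input is a run of foxes (ten here) and a plain
`maxWidth - 3` cut followed by a `"..."` marker:

```java
public class NaiveAbbreviateDemo {

    // Hypothetical stand-in for a char-based abbreviate: keeps maxWidth - 3 chars
    // and appends "...". It ignores surrogate pairs, which is the bug being fixed.
    static String naiveAbbreviate(String str, int maxWidth) {
        if (str.length() <= maxWidth) {
            return str;
        }
        return str.substring(0, maxWidth - 3) + "...";
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A";                 // U+1F98A: one code point, two chars
        String out = naiveAbbreviate(fox.repeat(10), 4); // cut after 1 char splits the pair
        System.out.println(Character.isHighSurrogate(out.charAt(0))); // true: lone high surrogate
        System.out.println(out.length());                              // 4
    }
}
```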
The gap comes down to the contract. `testEmoji` counts `maxWidth` in code
points: the marker takes 3, followed by `maxWidth - 3` whole code points
(width 4 → 1 fox, width 5 → 2 foxes, and the family case counts each
skin-tone modifier and ZWJ as one). So `🦊...` is 5 `char`s but 4 code points.
This PR keeps the documented `char`-based contract (`result.length() <=
maxWidth`) and only nudges the cut back to a code-point boundary, which can
only make the result shorter, so it drops the partial emoji rather than keeping
it whole and exceeding `maxWidth` in `char`s.
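A minimal sketch of the narrower fix, assuming the same simple `maxWidth - 3`
cut: the hypothetical `abbreviateCharWidth` stays `char`-based and only backs
the cut off by one `char` when it would split a surrogate pair, so
`length() <= maxWidth` still holds. It also checks the 5-`char` / 4-code-point
arithmetic for `🦊...`. This is an illustration, not the PR's actual diff:

```java
public class SurrogateSafeCut {

    // Hypothetical sketch: char-based width, but never ends on a lone high surrogate.
    // Assumes maxWidth >= 4 so there is room for the "..." marker.
    static String abbreviateCharWidth(String str, int maxWidth) {
        if (str.length() <= maxWidth) {
            return str;
        }
        int cut = maxWidth - 3;
        // If the cut falls between a high and a low surrogate, step back one char
        // so the partial code point is dropped instead of half-kept.
        if (cut > 0 && Character.isHighSurrogate(str.charAt(cut - 1))
                && Character.isLowSurrogate(str.charAt(cut))) {
            cut--;
        }
        return str.substring(0, cut) + "...";
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A";
        String input = fox.repeat(10);
        System.out.println(abbreviateCharWidth(input, 4).equals("..."));       // true
        System.out.println(abbreviateCharWidth(input, 5).equals(fox + "...")); // true

        String expected = fox + "...";                                     // testEmoji's width-4 value
        System.out.println(expected.length());                             // 5
        System.out.println(expected.codePointCount(0, expected.length())); // 4
    }
}
```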
So this change is strictly the lone-surrogate fix. Making `testEmoji` pass
is the larger LANG-1770 job of re-basing `abbreviate` on code points, which
redefines what `maxWidth` means and, with it, the `length() <= maxWidth`
guarantee. Happy to take that on as the actual LANG-1770 fix if you want it in
this PR, or keep this one scoped to removing the malformed surrogate. Which
would you prefer?
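For contrast, a hypothetical code-point-based variant (the method name and
shape are assumptions, not a proposed API) shows what re-basing `maxWidth` on
code points would mean: `String.offsetByCodePoints` locates the cut, and the
width-4 result is 5 `char`s long, so the `char`-based guarantee no longer holds:

```java
public class CodePointAbbreviate {

    // Hypothetical sketch of the LANG-1770 direction: maxWidth counts code points,
    // the "..." marker takes 3 of them, the rest are whole code points.
    static String abbreviateCodePoints(String str, int maxWidth) {
        int codePoints = str.codePointCount(0, str.length());
        if (codePoints <= maxWidth) {
            return str;
        }
        // char index just past the first (maxWidth - 3) code points
        int end = str.offsetByCodePoints(0, maxWidth - 3);
        return str.substring(0, end) + "...";
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A";
        String input = fox.repeat(10);
        String w4 = abbreviateCodePoints(input, 4);
        String w5 = abbreviateCodePoints(input, 5);
        System.out.println(w4.equals(fox + "..."));       // true
        System.out.println(w5.equals(fox + fox + "...")); // true
        System.out.println(w4.length());                  // 5: exceeds maxWidth in chars
    }
}
```

Because this counts code points rather than grapheme clusters, the cut can
still land between a base emoji and its skin-tone modifier or a ZWJ, which is
consistent with how `testEmoji` counts the family case.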
--
This is an automated message from the Apache Git Service.