alhudz commented on code in PR #1719:
URL: https://github.com/apache/commons-lang/pull/1719#discussion_r3445626409
##########
src/test/java/org/apache/commons/lang3/StringUtilsTest.java:
##########
@@ -3054,6 +3054,18 @@ void testTruncate_StringIntInt() {
assertEquals("", StringUtils.truncate("abcdefghijklmno", Integer.MAX_VALUE, Integer.MAX_VALUE));
}
+ @Test
+ void testTruncate_StringIntInt_surrogatePair() {
+ // U+1F600 GRINNING FACE is a single supplementary code point stored as a surrogate pair
+ final String grin = "😀";
+ // a cut that would land between the two halves keeps the result well formed instead of emitting a lone surrogate
+ assertEquals("a", StringUtils.truncate("a" + grin + "b", 0, 2));
+ assertEquals(grin, StringUtils.truncate("a" + grin + "b", 1, 2));
+ assertEquals("ab", StringUtils.truncate("ab" + grin, 0, 3));
+ // an offset that lands inside a pair skips the orphaned low surrogate
+ assertEquals("a", StringUtils.truncate(grin + "ab", 1, 2));
Review Comment:
Added. `truncate(grin, ...)` is now covered across offset/maxWidth
permutations: the result is the whole pair when it fits (`0,2`, `0,3`), and
empty when maxWidth is zero, the cut or the offset lands inside the pair, or
the offset is at the end of the string (`0,0`, `0,1`, `1,1`, `1,2`, `2,2`).
The result never contains a lone surrogate.
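
For readers following along, here is a minimal sketch of the boundary rule these tests pin down. It is not the code in this PR: the class `SurrogateSafeTruncate` and its `truncate` method are hypothetical names, and the logic is inferred only from the expected values in the assertions above (the end index is computed from the requested offset, then each boundary is nudged off the middle of a pair).

```java
// Hypothetical illustration only; not the StringUtils implementation from the PR.
final class SurrogateSafeTruncate {

    private SurrogateSafeTruncate() {
    }

    /**
     * Returns at most maxWidth chars of str starting at offset, never splitting
     * a surrogate pair at either boundary.
     */
    static String truncate(final String str, final int offset, final int maxWidth) {
        if (offset < 0 || maxWidth < 0) {
            throw new IllegalArgumentException("offset and maxWidth must not be negative");
        }
        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (offset >= len) {
            return "";
        }
        // Computed in long arithmetic so that large arguments cannot overflow.
        int end = (int) Math.min((long) offset + maxWidth, len);
        int start = offset;
        // The offset points at the low half of a pair: skip the orphaned low surrogate.
        if (start > 0 && Character.isLowSurrogate(str.charAt(start))
                && Character.isHighSurrogate(str.charAt(start - 1))) {
            start++;
        }
        // The cut falls between the two halves of a pair: back off before the high surrogate.
        if (end > 0 && end < len && Character.isHighSurrogate(str.charAt(end - 1))
                && Character.isLowSurrogate(str.charAt(end))) {
            end--;
        }
        return start < end ? str.substring(start, end) : "";
    }
}
```

Under this sketch, a width that is too small to hold the pair yields a shorter (possibly empty) result rather than a malformed one, which is the trade-off the tests above encode.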
##########
src/test/java/org/apache/commons/lang3/StringUtilsAbbreviateTest.java:
##########
@@ -170,6 +171,31 @@ void testAbbreviate_StringStringIntInt() {
assertEquals("....fg", StringUtils.abbreviate("abcdefg", "....", 5, 6));
}
+ @Test
+ void testAbbreviateSurrogatePair() {
+ // U+1F600 GRINNING FACE is a single supplementary code point stored as a surrogate pair
+ final String grin = "😀";
+ // the head cut backs off the pair so the marker is never preceded by a lone high surrogate
+ assertEquals("...", StringUtils.abbreviate(grin + "abcdef", 4));
+ assertEquals(grin + "...", StringUtils.abbreviate(grin + "abcdef", 5));
+ // a trailing supplementary code point is kept whole rather than sliced into a lone low surrogate
+ assertEquals("..." + grin, StringUtils.abbreviate("abcdef" + grin, 6, 5));
+ // results stay within maxWidth and never contain an unpaired surrogate
+ for (int width = 4; width <= 8; width++) {
+ final String result = StringUtils.abbreviate("a" + grin + "b" + grin + "cd", width);
+ assertTrue(result.length() <= width, () -> "result longer than maxWidth: " + result);
+ for (int i = 0; i < result.length(); i++) {
+ final char ch = result.charAt(i);
+ if (Character.isHighSurrogate(ch)) {
+ assertTrue(i + 1 < result.length() && Character.isLowSurrogate(result.charAt(i + 1)), "lone high surrogate in: " + result);
+ i++; // skip the paired low surrogate
+ } else {
+ assertFalse(Character.isLowSurrogate(ch), "lone low surrogate in: " + result);
+ }
+ }
+ }
+ }
+
Review Comment:
Done. Added `abbreviate(grin, ...)` cases across the width, offset and marker
permutations. The pair is kept whole in the normal cases, and the empty-marker
case (`grin, "", 0, 1`) exercises the head cut backing off to an empty result
instead of emitting a lone surrogate.
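
As a side note, the "no lone surrogate" invariant checked by the nested loop in the test could also be expressed as a small predicate. The sketch below is only an illustration: `SurrogateChecks.hasUnpairedSurrogate` is a hypothetical helper, not something that exists in Commons Lang or in this PR, and it uses only `java.lang.Character` methods.

```java
// Hypothetical test helper; the name and placement are assumptions, not part of the PR.
final class SurrogateChecks {

    private SurrogateChecks() {
    }

    /**
     * Returns true if s contains a high surrogate that is not immediately followed
     * by a low surrogate, or a low surrogate that is not preceded by a high one.
     */
    static boolean hasUnpairedSurrogate(final CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            final char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return true; // lone high surrogate
                }
                i++; // skip the paired low surrogate
            } else if (Character.isLowSurrogate(ch)) {
                return true; // lone low surrogate
            }
        }
        return false;
    }
}
```

With such a helper, each permutation could assert `assertFalse(SurrogateChecks.hasUnpairedSurrogate(result))` alongside the length check, which keeps the invariant in one place if more surrogate cases are added later.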