alhudz commented on code in PR #1719:
URL: https://github.com/apache/commons-lang/pull/1719#discussion_r3445626409


##########
src/test/java/org/apache/commons/lang3/StringUtilsTest.java:
##########
@@ -3054,6 +3054,18 @@ void testTruncate_StringIntInt() {
         assertEquals("", StringUtils.truncate("abcdefghijklmno", Integer.MAX_VALUE, Integer.MAX_VALUE));
     }
 
+    @Test
+    void testTruncate_StringIntInt_surrogatePair() {
+        // U+1F600 GRINNING FACE is a single supplementary code point stored as a surrogate pair
+        final String grin = "😀";
+        // a cut that would land between the two halves keeps the result well formed instead of emitting a lone surrogate
+        assertEquals("a", StringUtils.truncate("a" + grin + "b", 0, 2));
+        assertEquals(grin, StringUtils.truncate("a" + grin + "b", 1, 2));
+        assertEquals("ab", StringUtils.truncate("ab" + grin, 0, 3));
+        // an offset that lands inside a pair skips the orphaned low surrogate
+        assertEquals("a", StringUtils.truncate(grin + "ab", 1, 2));

Review Comment:
   Added. `truncate(grin, ...)` is now covered across the offset/maxWidth 
permutations: the whole pair is returned when it fits (`0,2`, `0,3`), and the 
result is empty when the cut or the offset lands inside the pair (`0,0`, `0,1`, 
`1,1`, `1,2`, `2,2`). The result never contains a lone surrogate.
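
For readers following along, here is a minimal sketch of one way a surrogate-aware truncate could behave so that it agrees with the assertions in the hunk above. It is not the code from this PR: the class `SurrogateSafeTruncate` and the helper `splitsPair` are hypothetical names, and the approach (compute the plain window first, then pull each end inward if it splits a pair) is an assumption about the intended semantics.

```java
// Hypothetical illustration only; not the implementation in this PR.
final class SurrogateSafeTruncate {

    /** Returns true if {@code index} falls between a high surrogate and its low surrogate. */
    static boolean splitsPair(final String s, final int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    static String truncate(final String str, final int offset, final int maxWidth) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset cannot be negative");
        }
        if (maxWidth < 0) {
            throw new IllegalArgumentException("maxWidth cannot be negative");
        }
        if (str == null) {
            return null;
        }
        if (offset > str.length()) {
            return "";
        }
        // The window is fixed by the requested offset and width; long math avoids int overflow.
        int start = offset;
        int end = (int) Math.min((long) offset + maxWidth, str.length());
        // An offset inside a pair skips the orphaned low surrogate.
        if (splitsPair(str, start)) {
            start++;
        }
        // A cut inside a pair backs off before the high surrogate.
        if (end > start && splitsPair(str, end)) {
            end--;
        }
        return str.substring(start, Math.max(start, end));
    }

    public static void main(final String[] args) {
        final String grin = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(truncate("a" + grin + "b", 0, 2).length());
        System.out.println(truncate(grin + "ab", 1, 2).length());
    }
}
```

Because the end of the window is derived from the original offset rather than the adjusted one, skipping an orphaned low surrogate shortens the result instead of shifting the window to the right.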



##########
src/test/java/org/apache/commons/lang3/StringUtilsAbbreviateTest.java:
##########
@@ -170,6 +171,31 @@ void testAbbreviate_StringStringIntInt() {
         assertEquals("....fg", StringUtils.abbreviate("abcdefg", "....", 5, 6));
     }
 
+    @Test
+    void testAbbreviateSurrogatePair() {
+        // U+1F600 GRINNING FACE is a single supplementary code point stored as a surrogate pair
+        final String grin = "😀";
+        // the head cut backs off the pair so the marker is never preceded by a lone high surrogate
+        assertEquals("...", StringUtils.abbreviate(grin + "abcdef", 4));
+        assertEquals(grin + "...", StringUtils.abbreviate(grin + "abcdef", 5));
+        // a trailing supplementary code point is kept whole rather than sliced into a lone low surrogate
+        assertEquals("..." + grin, StringUtils.abbreviate("abcdef" + grin, 6, 5));
+        // results stay within maxWidth and never contain an unpaired surrogate
+        for (int width = 4; width <= 8; width++) {
+            final String result = StringUtils.abbreviate("a" + grin + "b" + grin + "cd", width);
+            assertTrue(result.length() <= width, () -> "result longer than maxWidth: " + result);
+            for (int i = 0; i < result.length(); i++) {
+                final char ch = result.charAt(i);
+                if (Character.isHighSurrogate(ch)) {
+                    assertTrue(i + 1 < result.length() && Character.isLowSurrogate(result.charAt(i + 1)), "lone high surrogate in: " + result);
+                    i++; // skip the paired low surrogate
+                } else {
+                    assertFalse(Character.isLowSurrogate(ch), "lone low surrogate in: " + result);
+                }
+                }
+            }
+        }
+    }
+

Review Comment:
   Done. Added `abbreviate(grin, ...)` across the width, offset and marker 
permutations. The pair is kept whole in the normal cases, and the empty-marker 
case (`grin, "", 0, 1`) exercises the head cut backing off to empty instead of 
emitting a lone surrogate.
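
The "never a lone surrogate" property that these permutations check can be factored into a small predicate. The sketch below mirrors the loop in the test hunk above; the class `SurrogateCheck` and the method `isWellFormed` are hypothetical names used for illustration and are not part of this PR or of Commons Lang.

```java
// Hypothetical helper for illustration; mirrors the loop in testAbbreviateSurrogatePair.
final class SurrogateCheck {

    /** Returns true if every surrogate in {@code s} is part of a high/low pair. */
    static boolean isWellFormed(final CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            final char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the paired low surrogate
            } else if (Character.isLowSurrogate(ch)) {
                // A low surrogate not consumed by the branch above is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final String grin = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(isWellFormed("a" + grin + "b"));
        System.out.println(isWellFormed(grin.substring(0, 1)));
    }
}
```

A predicate of this shape could be asserted on each result in a permutation loop, which keeps the individual test cases focused on the expected value.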



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.