alhudz commented on PR #1687:
URL: https://github.com/apache/commons-lang/pull/1687#issuecomment-4639508193

   Went back and swept the whole class (plus the `text.translate` package) for 
the same seam rather than eyeballing it.
   
   The only family in `StringUtils` that does a char-by-char surrogate-pair 
guard is `containsAny`/`containsNone`/`indexOfAny`/`indexOfAnyBut`. Three of 
those already agree: `containsAny` and `containsNone` use the canonical guard, 
and `indexOfAnyBut` routes through a `codePointAt`/`charCount` loop. The lone 
outlier is `indexOfAny(CharSequence, int, char...)`, which this PR brings into 
line.
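
   For readers who don't have the guard in their head, here is a minimal sketch 
of the shape being described. It is an illustration, not the code in 
`StringUtils` or in this PR; the class name `SurrogateGuardSketch` and the 
simplified two-argument signature are assumptions made for the example.

```java
// Hypothetical sketch: a char-by-char search that refuses to report a match
// on a high surrogate unless the following low surrogate matches as well.
final class SurrogateGuardSketch {

    static int indexOfAny(final CharSequence cs, final char... searchChars) {
        if (cs == null || cs.length() == 0 || searchChars == null || searchChars.length == 0) {
            return -1;
        }
        final int csLast = cs.length() - 1;
        final int searchLast = searchChars.length - 1;
        for (int i = 0; i <= csLast; i++) {
            final char ch = cs.charAt(i);
            for (int j = 0; j <= searchLast; j++) {
                if (searchChars[j] != ch) {
                    continue;
                }
                if (!Character.isHighSurrogate(ch)) {
                    // Not a high surrogate: a plain char match is enough.
                    return i;
                }
                if (j == searchLast) {
                    // Search array ends on a lone high surrogate: accept the match.
                    return i;
                }
                if (i < csLast && searchChars[j + 1] == cs.charAt(i + 1)) {
                    // Both halves of the surrogate pair match.
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(final String[] args) {
        final String text = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        // U+1F601 shares the high surrogate D83D with U+1F600 but is a different code point.
        System.out.println(indexOfAny(text, "\uD83D\uDE01".toCharArray())); // prints -1
        System.out.println(indexOfAny(text, "\uD83D\uDE00".toCharArray())); // prints 1
    }
}
```

   Without the high-surrogate check, the first call would report index 1, 
because the two code points share the high surrogate `\uD83D`.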
   
   Everything else in the class that touches surrogates already advances by 
`Character.charCount` off a `codePointAt` (`swapCase`/`replaceChars` around 
line 8661, the `firstCodePoint` sites), so it's code-point correct by 
construction. Same story across `text.translate`: `CharSequenceTranslator`, 
`CodePointTranslator`, `JavaUnicodeEscaper` and `NumericEntityUnescaper` all 
advance by `charCount`. The one exception there is `LookupTranslator`, which is 
the subject of #1691.
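
   The "advance by `charCount` off a `codePointAt`" pattern, sketched in 
isolation. Again this is only an illustration under assumed names 
(`CodePointWalkSketch`, `indexOfCodePoint`), not a copy of any of the methods 
listed above; it uses only `Character.codePointAt(CharSequence, int)` and 
`Character.charCount(int)` from the JDK.

```java
// Hypothetical sketch: walk a CharSequence one code point at a time, so a
// surrogate pair is always consumed as a unit and never split.
final class CodePointWalkSketch {

    /** Returns the char index where the given code point starts, or -1. */
    static int indexOfCodePoint(final CharSequence cs, final int codePoint) {
        if (cs == null) {
            return -1;
        }
        for (int i = 0; i < cs.length();) {
            final int cp = Character.codePointAt(cs, i);
            if (cp == codePoint) {
                return i;
            }
            // 1 for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(final String[] args) {
        final String text = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b'
        System.out.println(indexOfCodePoint(text, 0x1F600)); // prints 1
        System.out.println(indexOfCodePoint(text, 'b'));     // prints 3
        // The low surrogate DE00 is consumed as part of the pair, so it is never seen alone.
        System.out.println(indexOfCodePoint(text, 0xDE00));  // prints -1
    }
}
```

   Because the index only ever moves by whole code points, there is no 
position at which half of a pair can be compared on its own, which is why 
these sites need no separate guard.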
   
   So the count is bounded rather than a swarm: the three seams I listed (#1684 
merged, this one, #1691), each with a reproducer. I can't promise some other 
char-array method elsewhere in Lang won't share the shape, but within the 
String and translate code where it'd actually bite, those are the three. Small 
enough to batch into the RC if you'd rather not carry them to a later 
maintenance release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
