alhudz commented on PR #1687: URL: https://github.com/apache/commons-lang/pull/1687#issuecomment-4639508193
Went back and swept the whole class (plus the `text.translate` package) for the same seam rather than eyeballing it. The only family in `StringUtils` that does a char-by-char surrogate-pair guard is `containsAny`/`containsNone`/`indexOfAny`/`indexOfAnyBut`. Three of those already agree: `containsAny` and `containsNone` use the canonical guard, and `indexOfAnyBut` routes through a `codePointAt`/`charCount` loop. The lone outlier is `indexOfAny(CharSequence, int, char...)`, which this PR brings into line.

Everything else in the class that touches surrogates already advances by `Character.charCount` off a `codePointAt` (`swapCase`/`replaceChars` around line 8661, the `firstCodePoint` sites), so it's code-point correct by construction. Same story across `text.translate`: `CharSequenceTranslator`, `CodePointTranslator`, `JavaUnicodeEscaper` and `NumericEntityUnescaper` all advance by `charCount`. The one exception there is `LookupTranslator`, which is #1691.

So the count is bounded rather than a swarm: the three seams I listed (#1684 merged, this one, #1691), each with a reproducer. I can't promise some other char-array method elsewhere in Lang won't share the shape, but within the String and translate code where it'd actually bite, those are the three. Small enough to batch into the RC if you'd rather not carry them to a later maintenance release.
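For anyone following along without the diff open, here is a minimal sketch of the shape being discussed. It is not the `StringUtils` source: the class and method names (`SurrogateGuardSketch`, `naiveIndexOfAny`, `guardedIndexOfAny`) are hypothetical, and the guard is only my reading of the pattern described above. A high-surrogate match counts only if the following char matches as well, and a high surrogate sitting last in the search array still matches on its own.

```java
// Illustrative sketch only; names are hypothetical, not the Commons Lang code.
final class SurrogateGuardSketch {

    /** Plain char-by-char scan with no surrogate awareness. */
    static int naiveIndexOfAny(final CharSequence cs, final char... searchChars) {
        for (int i = 0; i < cs.length(); i++) {
            final char ch = cs.charAt(i);
            for (final char sc : searchChars) {
                if (sc == ch) {
                    return i;
                }
            }
        }
        return -1;
    }

    /** Same scan, but a high-surrogate hit must also match on the next char. */
    static int guardedIndexOfAny(final CharSequence cs, final char... searchChars) {
        final int csLast = cs.length() - 1;
        final int searchLast = searchChars.length - 1;
        for (int i = 0; i < cs.length(); i++) {
            final char ch = cs.charAt(i);
            for (int j = 0; j < searchChars.length; j++) {
                if (searchChars[j] != ch) {
                    continue;
                }
                if (!Character.isHighSurrogate(ch)) {
                    return i; // ordinary BMP char, a plain match is enough
                }
                if (j == searchLast) {
                    return i; // lone high surrogate at the end of the search set
                }
                if (i < csLast && searchChars[j + 1] == cs.charAt(i + 1)) {
                    return i; // the full surrogate pair matches
                }
            }
        }
        return -1;
    }

    public static void main(final String[] args) {
        final String text = "\uD840\uDC01";                 // U+20001
        final char[] search = "\uD840\uDC00".toCharArray(); // U+20000
        // Same high surrogate, different low surrogate:
        System.out.println(naiveIndexOfAny(text, search));   // prints 0
        System.out.println(guardedIndexOfAny(text, search)); // prints -1
    }
}
```

The two supplementary characters share a high surrogate, so the unguarded scan reports a hit at index 0 even though U+20000 never appears in the text. A loop built on `codePointAt`/`charCount` sidesteps the problem by never looking at half a pair in the first place.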
