ppkarwasz commented on code in PR #1327:
URL: https://github.com/apache/commons-lang/pull/1327#discussion_r1873900175


##########
src/main/java/org/apache/commons/lang3/StringUtils.java:
##########
@@ -2888,17 +2875,17 @@ public static int indexOfAnyBut(final CharSequence seq, final CharSequence searc
         if (isEmpty(seq) || isEmpty(searchChars)) {
             return INDEX_NOT_FOUND;
         }
-        final int strLen = seq.length();
-        for (int i = 0; i < strLen; i++) {
-            final char ch = seq.charAt(i);
-            final boolean chFound = CharSequenceUtils.indexOf(searchChars, ch, 0) >= 0;
-            if (i + 1 < strLen && Character.isHighSurrogate(ch)) {
-                final char ch2 = seq.charAt(i + 1);
-                if (chFound && CharSequenceUtils.indexOf(searchChars, ch2, 0) < 0) {
-                    return i;
-                }
-            } else if (!chFound) {
-                return i;
+        final Set<Integer> searchSetCodePoints = searchChars.codePoints().boxed()
+                .collect(Collectors.toSet()); // JDK >=10: Collectors::toUnmodifiableSet
+        for (final ListIterator<Integer> seqListIt = seq.chars().boxed().collect(Collectors.toList()) // JDK >=16: Stream::toList, JDK >=10: Collectors::toUnmodifiableList
+                .listIterator(); seqListIt.hasNext(); seqListIt.next()) {
+            final int curSeqCharIdx = seqListIt.nextIndex();
+            final int curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
+            if (!searchSetCodePoints.contains(curSeqCodePoint)) {
+                return curSeqCharIdx;
+            }
+            if (Character.isSupplementaryCodePoint(curSeqCodePoint)) {
+                seqListIt.next(); // skip subsequent low-surrogate in next loop, since it merged into curSeqCodePoint

Review Comment:
   ```suggestion
           int seqLength = seq.length();
           int curSeqCodePoint;
           // Skips the second character of a surrogate pair
           for (int curSeqCharIdx = 0; curSeqCharIdx < seqLength; curSeqCharIdx += Character.charCount(curSeqCodePoint)) {
               curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
               if (!searchSetCodePoints.contains(curSeqCodePoint)) {
                   return curSeqCharIdx;
               }
   ```
   
   Using a `ListIterator` has IMHO the same disadvantage as creating sets: you first need to create a `List` (i.e. consume all the code points in `seq`), and only then can you start iterating. I don't see why we can't use a simple loop.
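
For reference, here is a minimal, self-contained sketch of the whole method built around that simple loop. The class name `IndexOfAnyButSketch` is hypothetical, the inline null/empty check stands in for `isEmpty`, and the loop is written as a `while` rather than the `for` from the suggestion. It is meant as an illustration under those assumptions, not as the final patch:

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-alone class, used only to make the sketch compilable.
final class IndexOfAnyButSketch {

    private static final int INDEX_NOT_FOUND = -1;

    static int indexOfAnyBut(final CharSequence seq, final CharSequence searchChars) {
        // Stand-in for StringUtils.isEmpty(seq) || StringUtils.isEmpty(searchChars)
        if (seq == null || seq.length() == 0 || searchChars == null || searchChars.length() == 0) {
            return INDEX_NOT_FOUND;
        }
        // Collect the code points to search for, as in the PR.
        final Set<Integer> searchSetCodePoints = searchChars.codePoints().boxed()
                .collect(Collectors.toSet());
        final int seqLength = seq.length();
        int curSeqCharIdx = 0;
        while (curSeqCharIdx < seqLength) {
            // Reads one full code point; a valid surrogate pair yields a supplementary code point.
            final int curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
            if (!searchSetCodePoints.contains(curSeqCodePoint)) {
                return curSeqCharIdx;
            }
            // Advances by 1 or 2 chars, which skips the second character of a surrogate pair.
            curSeqCharIdx += Character.charCount(curSeqCodePoint);
        }
        return INDEX_NOT_FOUND;
    }
}
```

With this sketch, `indexOfAnyBut("zzabyycdxx", "za")` returns `3`, and `indexOfAnyBut("\uD83D\uDE00a", "\uD83D\uDE00")` returns `2`, because the surrogate pair is consumed as a single code point. No intermediate `List` is built, so the method can return as soon as the first non-matching code point is seen.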



