ppkarwasz commented on code in PR #1327:
URL: https://github.com/apache/commons-lang/pull/1327#discussion_r1873900175
##########
src/main/java/org/apache/commons/lang3/StringUtils.java:
##########
@@ -2888,17 +2875,17 @@ public static int indexOfAnyBut(final CharSequence seq, final CharSequence searc
         if (isEmpty(seq) || isEmpty(searchChars)) {
             return INDEX_NOT_FOUND;
         }
-        final int strLen = seq.length();
-        for (int i = 0; i < strLen; i++) {
-            final char ch = seq.charAt(i);
-            final boolean chFound = CharSequenceUtils.indexOf(searchChars, ch, 0) >= 0;
-            if (i + 1 < strLen && Character.isHighSurrogate(ch)) {
-                final char ch2 = seq.charAt(i + 1);
-                if (chFound && CharSequenceUtils.indexOf(searchChars, ch2, 0) < 0) {
-                    return i;
-                }
-            } else if (!chFound) {
-                return i;
+        final Set<Integer> searchSetCodePoints = searchChars.codePoints().boxed()
+                .collect(Collectors.toSet()); // JDK >=10: Collectors::toUnmodifiableSet
+        for (final ListIterator<Integer> seqListIt = seq.chars().boxed().collect(Collectors.toList()) // JDK >=16: Stream::toList, JDK >=10: Collectors::toUnmodifiableList
+                .listIterator(); seqListIt.hasNext(); seqListIt.next()) {
+            final int curSeqCharIdx = seqListIt.nextIndex();
+            final int curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
+            if (!searchSetCodePoints.contains(curSeqCodePoint)) {
+                return curSeqCharIdx;
+            }
+            if (Character.isSupplementaryCodePoint(curSeqCodePoint)) {
+                seqListIt.next(); // skip subsequent low-surrogate in next loop, since it merged into curSeqCodePoint
Review Comment:
```suggestion
        int seqLength = seq.length();
        int curSeqCodePoint;
        // Skips the second character of a surrogate pair
        for (int curSeqCharIdx = 0; curSeqCharIdx < seqLength; curSeqCharIdx += Character.charCount(curSeqCodePoint)) {
            curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
            if (!searchSetCodePoints.contains(curSeqCodePoint)) {
                return curSeqCharIdx;
            }
```
Using a `ListIterator` has, IMHO, the same disadvantage as creating sets: you
first need to build a `List` (i.e. consume every `char` in `seq`, even when the
answer is at index 0) and only then can you start iterating. I don't see why we
can't use a simple loop.
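For comparison, here is a minimal standalone sketch of the single-pass loop from the suggestion. It assumes a hypothetical class `IndexOfAnyButSketch` with its own `INDEX_NOT_FOUND` constant and an inline null/empty check standing in for `StringUtils.isEmpty`; it is not the Commons Lang implementation. Only `searchChars` is collected into a `Set`; `seq` is read one code point at a time, so the method can return as soon as it meets a code point outside the set.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical standalone class; not part of Commons Lang.
public class IndexOfAnyButSketch {

    // Local stand-in for StringUtils.INDEX_NOT_FOUND.
    static final int INDEX_NOT_FOUND = -1;

    static int indexOfAnyBut(final CharSequence seq, final CharSequence searchChars) {
        // Inline stand-in for isEmpty(seq) || isEmpty(searchChars).
        if (seq == null || seq.length() == 0 || searchChars == null || searchChars.length() == 0) {
            return INDEX_NOT_FOUND;
        }
        // Only searchChars is collected up front; seq is never copied.
        final Set<Integer> searchSetCodePoints = searchChars.codePoints().boxed()
                .collect(Collectors.toSet());
        final int seqLength = seq.length();
        int curSeqCodePoint;
        // Step by 1 char for BMP code points and 2 chars for surrogate pairs,
        // so the low surrogate of a pair is never examined on its own.
        for (int curSeqCharIdx = 0; curSeqCharIdx < seqLength;
                curSeqCharIdx += Character.charCount(curSeqCodePoint)) {
            curSeqCodePoint = Character.codePointAt(seq, curSeqCharIdx);
            if (!searchSetCodePoints.contains(curSeqCodePoint)) {
                return curSeqCharIdx; // early exit: the rest of seq is not read
            }
        }
        return INDEX_NOT_FOUND;
    }

    public static void main(final String[] args) {
        System.out.println(indexOfAnyBut("zzabyycdxx", "za")); // 3
        System.out.println(indexOfAnyBut("aba", "ab"));        // -1
        // U+1F600 (a surrogate pair) followed by 'a'
        System.out.println(indexOfAnyBut("\uD83D\uDE00a", "\uD83D\uDE00")); // 2
        // A lone high surrogate in searchChars does not match the whole pair
        System.out.println(indexOfAnyBut("\uD83D\uDE00a", "\uD83D"));       // 0
    }
}
```

Stepping by `Character.charCount(curSeqCodePoint)` advances two `char`s for a supplementary code point and one otherwise, which is what makes the extra `seqListIt.next()` skip in the PR unnecessary.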
--
This is an automated message from the Apache Git Service.