IcoreE opened a new pull request, #1524:
URL: https://github.com/apache/commons-lang/pull/1524
**1. Background**
The CharRange class (package: org.apache.commons.lang3) is a core component
of the CharSet utility, and its hashCode() method is frequently invoked in
scenarios such as HashMap/HashSet storage. The current implementation of
hashCode() (linear combination) has severe hash collision problems and
suboptimal calculation efficiency, which affects the performance of upper-layer
applications.
```
//Current hashCode implementation
@Override
public int hashCode() {
return 83 + start + 7 * end + (negated ? 1 : 0);
}
```
**2. Problem Description**
The current hashCode() implementation has two critical issues:
- **High collision rate**: The linear combination (83 + start + 7 * end +
flag) easily causes hash value offset. For example:
1. CharRange.isNotIn((char) 1, (char) 2) (start=1, end=2, negated=true) →
hash = 83+1+14+1=99
2. CharRange.isIn((char) 2, (char) 2) (start=2, end=2, negated=false) →
hash = 83+2+14=99
These two logically distinct instances (equals() returns false) have the
same hash code, leading to serious bucket conflicts in HashMap.
- **Low calculation efficiency:** Relies on multiple arithmetic operations
(1 multiplication + 3 additions), which are slower than bitwise operations and
fail to utilize the 32-bit space of int efficiently.
```
// hash collision cases
@Test
void testHashCodeCollision() {
// case A:hash=99
CharRange a1 = CharRange.isNotIn((char)1, (char)2); // 1,2,true →
83+1+14+1=99
CharRange a2 = CharRange.isIn((char)2, (char)2); // 2,2,false →
83+2+14+0=99
assertEquals(a1.hashCode(), a2.hashCode()); // Collision
// case B:hash=123
CharRange b1 = CharRange.isIn((char)5, (char)5); //5,5,false
→83+5+35+0=123
CharRange b2 = CharRange.isNotIn((char)4, (char)5); //4,5,true
→83+4+35+1=123
assertEquals(b1.hashCode(),b2.hashCode()); //Collision
}
```
**3. Proposed Solution**
Replace the current linear combination implementation with a bitwise
splicing + XOR flag scheme, which maximizes the use of 32-bit int space,
minimizes collision rate, and optimizes calculation efficiency:
```
@Override
public int hashCode() {
// Splice 16-bit start (high) and 16-bit end (low) into 32-bit int (no
overflow)
final int result = (start << 16) | (end & 0xFFFF);
// Integrate negated flag via XOR to further disperse hash values
return result ^ (negated ? 0x00010000 : 0);
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]