On Wed, 23 Aug 2023 18:51:37 GMT, Daniel Fuchs <[email protected]> wrote:
> The fast path that just returns the given string if ASCII-only and no
> encoding looks simple enough. I don't particularly like the idea of embedding
> the logic of encoding UTF-8 into that class though, that increases the
> complexity significantly, and Charset encoders are there for that. Also I
> don't understand the reason for changing BitSet into a boolean array - that
> seems gratuitous?

Perhaps the key performance difference between the `BitSet` and the
`boolean[]` in this code is that the latter is `static final @Stable` and thus
easy for the JIT to optimize. The `words` array held by a `BitSet` is neither
`final` nor `@Stable`, so the JIT likely needs to keep a few extra checks around
every access.
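
To make the comparison concrete, here is a minimal sketch of the two lookup
styles as they could be written outside the JDK. The class name `LookupTables`
and the set of "safe" characters are made up for illustration and are not taken
from the PR. `@Stable` is left out because `jdk.internal.vm.annotation.Stable`
is only usable inside `java.base`; without it the array reference is still a
constant, but the JIT is not expected to treat the elements as constants.

```java
import java.util.BitSet;

final class LookupTables {
    // Characters passed through unencoded in this sketch (illustrative set only).
    private static final String SAFE =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.*";

    // BitSet variant: get() reads through the instance's private, non-final words array.
    private static final BitSet SAFE_BITS = new BitSet(128);

    // boolean[] variant: the array reference is static final.
    // Inside java.base it could additionally be annotated @Stable.
    private static final boolean[] SAFE_TABLE = new boolean[128];

    static {
        for (int i = 0; i < SAFE.length(); i++) {
            SAFE_BITS.set(SAFE.charAt(i));
            SAFE_TABLE[SAFE.charAt(i)] = true;
        }
    }

    static boolean isSafeBitSet(char c) {
        return SAFE_BITS.get(c); // returns false for indices beyond the set's size
    }

    static boolean isSafeTable(char c) {
        return c < SAFE_TABLE.length && SAFE_TABLE[c];
    }

    public static void main(String[] args) {
        // Both lookups must answer identically for every Latin-1 character.
        for (char c = 0; c < 256; c++) {
            if (isSafeBitSet(c) != isSafeTable(c)) {
                throw new AssertionError("mismatch at " + (int) c);
            }
        }
        System.out.println("both lookups agree");
    }
}
```

The table costs 128 bytes of payload where the `BitSet` needs two `long` words,
which is the compactness trade-off in question.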

An interesting experiment would be to instead model this as a `ConstantBitSet`
with a `final @Stable` internal array. This could get most (or all?) of the
benefit while keeping things at a higher abstraction level and allowing for
some reuse. Retaining the compactness of `BitSet`s is nice too, though that
might not be very important for constant bit sets.

The API would need to be worked out, but adding a public method
`BitSet::asConstant` and hiding away the details might be a good starting point:

    public BitSet asConstant() {
        return new ConstantBitSet(this);
    }

    private static class ConstantBitSet extends BitSet {
        private @Stable final long[] words;

        private ConstantBitSet(BitSet bitSet) {
            // defensive copy, so later changes to the source are not visible here
            words = Arrays.copyOf(bitSet.words, bitSet.words.length);
        }

        // override all BitSet methods, make mutating methods throw (IllegalStateException?)
        // -- for a public API perhaps extract an interface
    }
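
For experimenting outside `java.util`, where neither `BitSet.words` nor
`@Stable` is accessible, the same idea can be approximated with a standalone
immutable class. This is only a sketch under those assumptions: the name
`ImmutableBits` and its methods are hypothetical, it takes its snapshot through
the public `BitSet::toLongArray`, and it offers read operations only, which
sidesteps the question of what the mutators should throw.

```java
import java.util.BitSet;

final class ImmutableBits {
    // Snapshot of the source bits; never written after construction.
    // Inside java.base this field could additionally be annotated @Stable.
    private final long[] words;

    private ImmutableBits(long[] words) {
        this.words = words;
    }

    // toLongArray() returns a fresh array, so no further copy is needed.
    static ImmutableBits copyOf(BitSet source) {
        return new ImmutableBits(source.toLongArray());
    }

    boolean get(int bitIndex) {
        if (bitIndex < 0) {
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        }
        int wordIndex = bitIndex >> 6; // 64 bits per word
        // A long shift uses only the low 6 bits of bitIndex, as in BitSet itself.
        return wordIndex < words.length && (words[wordIndex] & (1L << bitIndex)) != 0;
    }

    int cardinality() {
        int sum = 0;
        for (long w : words) {
            sum += Long.bitCount(w);
        }
        return sum;
    }

    public static void main(String[] args) {
        BitSet mutable = new BitSet();
        mutable.set('a');
        mutable.set('z');
        ImmutableBits constant = ImmutableBits.copyOf(mutable);
        mutable.clear('a'); // later changes to the source do not affect the snapshot
        System.out.println(constant.get('a') + " " + constant.get('b') + " " + constant.cardinality());
    }
}
```

Whether a wrapper like this recovers the performance of the `boolean[]` would
still need to be measured; the sketch only shows the shape of the API.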
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1691364488