On Wed, 23 Aug 2023 18:51:37 GMT, Daniel Fuchs <[email protected]> wrote:

> The fast path that just returns the given string if ASCII-only and no 
> encoding looks simple enough. I don't particularly like the idea of embedding 
> the logic of encoding UTF-8 into that class though, that increases the 
> complexity significantly, and Charset encoders are there for that. Also I 
> don't understand the reason for changing BitSet into a boolean array - that 
> seems gratuitous?

One likely key difference in performance between the `BitSet` and the 
`boolean[]` in this code is that the latter is `static final @Stable` and thus 
easy for the JIT to optimize. The `words` array held by a `BitSet` is neither 
`final` nor `@Stable`, so the JIT likely needs to keep a few extra checks around 
every access.
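
As a rough illustration of the two access patterns being compared (not code 
from the PR: the class, field, and method names below are hypothetical, and 
`@Stable` is left out because the annotation is JDK-internal), a minimal 
sketch might look like this:

```java
import java.util.BitSet;

// Hypothetical sketch: the same ASCII lookup backed by a static final
// boolean[] and by a BitSet. Inside java.base the array could additionally
// be annotated @Stable; that annotation is not usable from application code.
class LookupSketch {
    private static final boolean[] SAFE_TABLE = new boolean[128];
    private static final BitSet SAFE_BITS = new BitSet(128);

    static {
        for (char c = 'a'; c <= 'z'; c++) mark(c);
        for (char c = 'A'; c <= 'Z'; c++) mark(c);
        for (char c = '0'; c <= '9'; c++) mark(c);
    }

    private static void mark(char c) {
        SAFE_TABLE[c] = true;
        SAFE_BITS.set(c);
    }

    // Array variant: one range check plus one load from a static final array.
    static boolean safeViaArray(char c) {
        return c < SAFE_TABLE.length && SAFE_TABLE[c];
    }

    // BitSet variant: get() goes through the mutable, non-final words array.
    static boolean safeViaBitSet(char c) {
        return SAFE_BITS.get(c);
    }
}
```

Both methods return the same answers; any difference would only show up in 
what the JIT can assume about the backing storage, which would need to be 
confirmed with a benchmark.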

An interesting experiment would be to instead model this as a `ConstantBitSet` 
with a `final @Stable` internal array. This could get most (or all?) of the 
benefit while keeping things at a higher abstraction level and allowing for 
some reuse. Retaining the compactness of `BitSet` is nice too, though that 
might not be very important for constant bit sets.

The API would need to be worked out, but something like adding a public method 
`BitSet::asConstant` and hiding away the details might be a good starting point:

public BitSet asConstant() {
  return new ConstantBitSet(this);
}

private static class ConstantBitSet extends BitSet {
  private @Stable final long[] words;
  private ConstantBitSet(BitSet bitSet) {
    // copy only the words in use; later changes to the source are not seen
    words = Arrays.copyOf(bitSet.words, bitSet.wordsInUse);
  }
  // override all BitSet methods, make mutating methods throw
  // (IllegalStateException?)
  // -- for a public API perhaps extract an interface
}
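
For experimenting outside `java.base`, a standalone approximation of the same 
idea could look like the sketch below. It is an assumption-laden stand-in, not 
the proposed API: `ConstantBitSetSketch` and its methods are hypothetical 
names, it does not extend `BitSet`, and it omits `@Stable` since that 
annotation is only available inside the JDK.

```java
import java.util.BitSet;

// Hypothetical read-only snapshot of a BitSet backed by a final long[].
final class ConstantBitSetSketch {
    // Inside java.base this field could also be marked @Stable.
    private final long[] words;

    private ConstantBitSetSketch(long[] words) {
        this.words = words;
    }

    // BitSet.toLongArray() returns a fresh copy of the words in use,
    // so later mutation of the source does not affect the snapshot.
    static ConstantBitSetSketch of(BitSet source) {
        return new ConstantBitSetSketch(source.toLongArray());
    }

    boolean get(int bitIndex) {
        if (bitIndex < 0) {
            throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);
        }
        int wordIndex = bitIndex >> 6; // 64 bits per word
        return wordIndex < words.length
                && (words[wordIndex] & (1L << bitIndex)) != 0;
    }

    int cardinality() {
        int sum = 0;
        for (long w : words) {
            sum += Long.bitCount(w);
        }
        return sum;
    }
}
```

Because there are no mutating methods at all, the question of which exception 
to throw does not arise here; that only matters if the constant variant has to 
remain a `BitSet` subclass.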

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15354#issuecomment-1691364488
