On Wed, 18 Jan 2023 12:43:31 GMT, fabioromano1 <[email protected]> wrote:
>> The enanchment is useful for applications that make heavy use of BitSet
>> objects as sets of integers, and therefore they need to make a lot of calls
>> to cardinality() method, which actually require linear time in the number of
>> words in use by the bit set.
>> This optimization reduces the cost of calling cardinality() to constant
>> time, as it simply returns the value of the field, and it also try to make
>> as little effort as possible to update the field, when needed.
>>
>> Moreover, it has been implemented a new method for testing wheter a bit set
>> includes another bit set (i.e., the set of true bits of the parameter is a
>> subset of the true bits of the instance).
>
> fabioromano1 has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Added author and reverse the cicle order in includes(BitSet)
> On Thu, Jan 19, 2023 at 4:29 AM fabioromano1 ***@***.***> wrote: Like other
> reviewers, changing the performance tradeoffs in BitSet make me
> uncomfortable. 30 years of code has adapted to the current performance
> tradeoffs. Those users who really need O(1) cardinality() can fairly easily
> implement it in client code (and probably have). The spirit of BitSet is
> space efficiency, and adding another field negates that. cardinality() is
> O(N) but in practice should be very efficient due to linear traversal and
> (presumably) the use of hardware bitcount instructions for each word (fix
> that if not true). There may be a case for a BitSet rewrite (e.g. to support
> large sparse bitsets) once we have compact value types. Leaving to the client
> the responsibility of implementation isn't against the principle of
> information hiding and encapsulation?
> The client already has a pretty good implementation of cardinality(). One can
> see the effort the JVM engineers put into optimizing bitCounting arrays of
> longs. The proposed change is an example of something that happens a lot -
> something helps a particular use case while extracting a small but perpetual
> tax on all other users of the JDK, (here including future maintainers of
> BitSet). Adding a redundant field to BitSet increases the risk of not keeping
> the new field in sync. Are we sure we've tested that every mutating method
> correctly updates the new field? What about new methods added in future years?
> […](#)
> --- Just to show that we care about BitSet performance, here's a (start of a)
> change that I **would** make:
> --- a/src/java.base/share/classes/java/util/BitSet.java +++
> b/src/java.base/share/classes/java/util/BitSet.java @@ -897,6 +897,8 @@
> public class BitSet implements Cloneable, java.io.Serializable { * @SInCE 1.4
> */ public int cardinality() { + final int wordsInUse = this.wordsInUse; +
> final long[] words = this.words; int sum = 0; for (int i = 0; i < wordsInUse;
> i++) sum += Long.bitCount(words[i]); I bet that provides a measurable
> performance improvement, at least with some current VMs. But probably warmed
> up C2 doesn't need this help.
The patch has passed all existing tests of the class. Anyway, I decided to do
instead a subclass that implements this enhancement, in order to avoid creating
the problems shown here.
-------------
PR: https://git.openjdk.org/jdk/pull/11837