Re: RFR: 8300487: Store cardinality as a field in BitSet [v5]

fabioromano1 Thu, 19 Jan 2023 13:19:15 -0800

On Wed, 18 Jan 2023 12:43:31 GMT, fabioromano1 <d...@openjdk.org> wrote:


>> The enanchment is useful for applications that make heavy use of BitSet 
>> objects as sets of integers, and therefore they need to make a lot of calls 
>> to cardinality() method, which actually require linear time in the number of 
>> words in use by the bit set.
>> This optimization reduces the cost of calling cardinality() to constant 
>> time, as it simply returns the value of the field, and it also try to make 
>> as little effort as possible to update the field, when needed.
>> 
>> Moreover, it has been implemented a new method for testing wheter a bit set 
>> includes another bit set (i.e., the set of true bits of the parameter is a 
>> subset of the true bits of the instance).
>
> fabioromano1 has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Added author and reverse the cicle order in includes(BitSet)

> On Thu, Jan 19, 2023 at 4:29 AM fabioromano1 ***@***.***> wrote: Like other 
> reviewers, changing the performance tradeoffs in BitSet make me 
> uncomfortable. 30 years of code has adapted to the current performance 
> tradeoffs. Those users who really need O(1) cardinality() can fairly easily 
> implement it in client code (and probably have). The spirit of BitSet is 
> space efficiency, and adding another field negates that. cardinality() is 
> O(N) but in practice should be very efficient due to linear traversal and 
> (presumably) the use of hardware bitcount instructions for each word (fix 
> that if not true). There may be a case for a BitSet rewrite (e.g. to support 
> large sparse bitsets) once we have compact value types. Leaving to the client 
> the responsibility of implementation isn't against the principle of 
> information hiding and encapsulation?
> The client already has a pretty good implementation of cardinality(). One can 
> see the effort the JVM engineers put into optimizing bitCounting arrays of 
> longs. The proposed change is an example of something that happens a lot - 
> something helps a particular use case while extracting a small but perpetual 
> tax on all other users of the JDK, (here including future maintainers of 
> BitSet). Adding a redundant field to BitSet increases the risk of not keeping 
> the new field in sync. Are we sure we've tested that every mutating method 
> correctly updates the new field? What about new methods added in future years?
> […](#)
> --- Just to show that we care about BitSet performance, here's a (start of a) 
> change that I **would** make:
> --- a/src/java.base/share/classes/java/util/BitSet.java +++ 
> b/src/java.base/share/classes/java/util/BitSet.java @@ -897,6 +897,8 @@ 
> public class BitSet implements Cloneable, java.io.Serializable { * @SInCE 1.4 
> */ public int cardinality() { + final int wordsInUse = this.wordsInUse; + 
> final long[] words = this.words; int sum = 0; for (int i = 0; i < wordsInUse; 
> i++) sum += Long.bitCount(words[i]); I bet that provides a measurable 
> performance improvement, at least with some current VMs. But probably warmed 
> up C2 doesn't need this help.

The patch has passed all existing tests of the class. Anyway, I decided to do 
instead a subclass that implements this enhancement, in order to avoid creating 
the problems shown here.

-------------

PR: https://git.openjdk.org/jdk/pull/11837

Re: RFR: 8300487: Store cardinality as a field in BitSet [v5]

Reply via email to