[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801249#action_12801249
 ] 

Yonik Seeley commented on LUCENE-2216:
--------------------------------------

bq. I agree that having hashCode mutate the object's state is weird. I had some 
thoughts about it - this particular mutation seems to be "safe" even from 
multi-threaded point of view. If another thread sees a stale value of wlen, 
then the only thing that is going to happen is it will scan more memory;

There are still quite a few things that can go wrong I think.  If all threads 
*only* called hashCode and equals, then you *might* be right... it's very 
specific to the implementation of trimTrailingZeros()
{code}
   public void trimTrailingZeros() {
    int idx = wlen-1;
    while (idx>=0 && bits[idx]==0) idx--;
    wlen = idx+1;
  }
{code}
What could make that work is the fact that wlen is an integer, is never 
directly used as the loop counter, or as an index into the array.

But the other big questions: are other read operations tolerant of wlen 
changing out from under them?  My guess would be no.
Look at xorCount for example:
{code}
    if (a.wlen < b.wlen) {
      tot += BitUtil.pop_array(b.bits, a.wlen, b.wlen-a.wlen);
{code}
hashCode and equals changing wlen could cause a negative value to be passed to 
pop_array.

Yet another problem: say someone actually does want to change the set 
occasionally.  One way they could safely do this is to use a read-write lock to 
allow any number of readers to read the set as long as a writer wasn't writing 
it.  But equals and hashCode would need to be categorized under "write" methods 
for this to work... (definitely unexpected) otherwise all sorts of bad stuff 
would happen.


> OpenBitSet#hashCode() may return false for identical sets.
> ----------------------------------------------------------
>
>                 Key: LUCENE-2216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2216
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Other
>    Affects Versions: 2.9, 2.9.1, 3.0
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-2216.patch, openbitset.patch
>
>
> OpenBitSet uses an internal buffer of long variables to store set bits and an 
> additional 'wlen' index that points 
> to the highest used component inside {...@link #bits} buffer.
> Unlike in JDK, the wlen field is not continuously maintained (on clearing 
> bits, for example). This leads to a situation when wlen may point
> far beyond the last set bit. 
> The hashCode implementation iterates over all long components of the bits 
> buffer, rotating the hash even for empty components. This is against the 
> contract of hashCode-equals. The following test case illustrates this:
> {code}
> // initialize two bitsets with different capacity (bits length).
> BitSet bs1 = new BitSet(200);
> BitSet bs2 = new BitSet(64);
> // set the same bit.
> bs1.set(3);
> bs2.set(3);
>         
> // equals returns true (passes).
> assertEquals(bs1, bs2);
> // hashCode returns false (against contract).
> assertEquals(bs1.hashCode(), bs2.hashCode());
> {code}
> Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to