[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801182#action_12801182
 ] 

Yonik Seeley commented on LUCENE-2216:
--

Thanks Dawid!

hashCode and equals probably shouldn't be modifying the state of the object 
though, right?
It's also not thread safe, so a lot of weird things could happen... the 
simplest example is that two threads could check that the last word is all 
zeros and both decrement wlen.

I like the spirit of your change though, as it only adds to the cost of 
hashCode/equals (which are already very expensive with large bitsets and should 
be avoided if possible anyway).

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801195#action_12801195
 ] 

Dawid Weiss commented on LUCENE-2216:
-

Hi Yonik,

This class is not thread-safe anyway (there are no memory barriers of any kind 
anywhere in the code).

From a single-thread perspective, yes, you are modifying the internal state of 
this object, but it's not really affecting anything other than possibly 
speeding up further interaction with this object (any other operation no 
OpenBitSets is affected by the value inside wlen).

Your patch also solves the issue, of course. I just don't see the point in 
_not_ updating wlen since you're scanning through memory anyway... The 
implementation of OpenBitSet is different in this regard to java.util.BitSet, 
which always maintains the last non-empty index. I've been thinking about it a 
bit and there are pros and cons to both implementations, but lazily moving wlen 
when memory is scanned anyway seems like a better alternative than keeping wlen 
unnecessarily large (which affects ORs, ANDs and other set operations).

To me this implementation cannot be used in a multi-threaded application 
anyway, am I wrong here?

D.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801221#action_12801221
 ] 

Dawid Weiss commented on LUCENE-2216:
-

This is only true if there is happens-before between the reads and the 
modifications to the object. In any other case other threads may be reading 
stale values (i.e., from their own cache), at least if my understanding of the 
jmm is correct here. Whether you want to rely on such a deep semantics of 
interaction between threads is something to consider deeply, at least in my 
personal opinion.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801230#action_12801230
 ] 

Dawid Weiss commented on LUCENE-2216:
-

This is not entirely what I had in mind (it's not cache, but HotSpot 
optimisation), but similar situation applies (the value of the field that's 
never modified from the perspective of the current thread is never re-read).

{code}
public class Example10 {
private static Holder holder;

public static void startThread() {
new Thread() {
public void run() {
try { sleep(2000); } catch (Exception e) { /* ignore */ }
holder.ready = true;
System.out.println(Setting ready to true.);
}
}.start();
}

public static void main(String [] args) {
holder = new Holder();
startThread();
while (!holder.ready) {
// Do nothing.
}
System.out.println(I'm ready.);
}
}

class Holder {
public boolean ready;
}
{/code}

If you run it with -server, it will (should... or does on two machines I own) 
deadlock. Client mode and interpreted mode are not optimized, so it passes.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801235#action_12801235
 ] 

Yonik Seeley commented on LUCENE-2216:
--

bq. This is only true if there is happens-before between the reads and the 
modifications to the object. 

Of course... I said may be safely shared', not that any method one chooses to 
share it is correct.
It still seems that promoting hashCode and equals to mutating operations is 
wrong, no?

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801240#action_12801240
 ] 

Dawid Weiss commented on LUCENE-2216:
-

uff, I started having doubts in my own understanding, thanks for being patient 
with me.

I agree that having hashCode mutate the object's state is weird. I had some 
thoughts about it -- this particular mutation seems to be safe even from 
multi-threaded point of view. If another thread sees a stale value of wlen, 
then the only thing that is going to happen is it will scan more memory; for 
ands, ors and other types of operations this will have no effect. So assuming 
hashCode/equals is the ONLY method you're calling concurrently, it shouldn't 
break things. A similar kind of trickery goes on in String#hashCode (caching to 
a non-volatile field), although that object is immutable, so it's a slightly 
different scenario.

To be honest, my preference for this would be to either maintain the wlen field 
during all operations (like java.util.BitSet) or at least to clearly state 
(JavaDoc?) that trimTrailingZeros() should be invoked prior to publishing the 
object for other threads for increased performance (in case you fiddle with 
bits and clear the tail). In the second options, your patch does a fine job of 
not mutating the object and correcting the bug.

Thanks for an interesting discussion.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801263#action_12801263
 ] 

Dawid Weiss commented on LUCENE-2216:
-

Chances of this happening are really slim (this would probably be a single 
inlined read as soon as the compilation takes place, but you're right in the 
general case. I am not arguing changing the object in hashCode is good -- my 
argument is that ideally it should be fixed elsewhere (as in my previous 
suggestion -- either updating wlen every time the tail changes, or make 
explicit changes to the documentation that inform about suboptimal performance 
for zero-tailed sets).

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801265#action_12801265
 ] 

Dawid Weiss commented on LUCENE-2216:
-

For what it's worth, I checked the mentioned BitUtil methods -- ntz/pop; the 
same implementation is included from Java 1.5 upward. Do you want me to file 
another patch for this, Yonik, or are we leaving this as-is? I'd redirect from 
BitUtil to Long/Integer, deprecate BitUtil methods and replace the places in 
the code where they are used.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801266#action_12801266
 ] 

Yonik Seeley commented on LUCENE-2216:
--

bq. my argument is that ideally it should be fixed elsewhere

This is an expert-level class... I don't think that every call to clear() 
should be checking if it completely cleared the last word.  It's easy enough to 
call trimTrailingZeros after you did a bunch of modifications... but not so 
easy to regain the lost performance for the code doing redundant checking you 
didn't want.


 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801269#action_12801269
 ] 

Dawid Weiss commented on LUCENE-2216:
-

Ok, argument accepted.

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801270#action_12801270
 ] 

Yonik Seeley commented on LUCENE-2216:
--

bq. For what it's worth, I checked the mentioned BitUtil methods - ntz/pop; the 
same implementation is included from Java 1.5 upward.

Huh - I didn't realize that Java5 had the same pop impl as I did... it will be 
cool if it finally starts using native POPCNT instructions.

As far as ntz, I went though a lot of micro-optimizations and different 
implementations before I settled on the one used in BitUtil, so it would be 
nice to do some benchmarks to see if it's truly faster now (and also what the 
performance difference is for users of JVMs before this optimization was 
implemented).

 OpenBitSet#hashCode() may return false for identical sets.
 --

 Key: LUCENE-2216
 URL: https://issues.apache.org/jira/browse/LUCENE-2216
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Dawid Weiss
Priority: Minor
 Attachments: LUCENE-2216.patch, openbitset.patch


 OpenBitSet uses an internal buffer of long variables to store set bits and an 
 additional 'wlen' index that points 
 to the highest used component inside {...@link #bits} buffer.
 Unlike in JDK, the wlen field is not continuously maintained (on clearing 
 bits, for example). This leads to a situation when wlen may point
 far beyond the last set bit. 
 The hashCode implementation iterates over all long components of the bits 
 buffer, rotating the hash even for empty components. This is against the 
 contract of hashCode-equals. The following test case illustrates this:
 {code}
 // initialize two bitsets with different capacity (bits length).
 BitSet bs1 = new BitSet(200);
 BitSet bs2 = new BitSet(64);
 // set the same bit.
 bs1.set(3);
 bs2.set(3);
 
 // equals returns true (passes).
 assertEquals(bs1, bs2);
 // hashCode returns false (against contract).
 assertEquals(bs1.hashCode(), bs2.hashCode());
 {code}
 Fix and test case attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org