[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801182#action_12801182 ] Yonik Seeley commented on LUCENE-2216: -- Thanks Dawid! hashCode and equals probably shouldn't be modifying the state of the object though, right? It's also not thread safe, so a lot of weird things could happen... the simplest example is that two threads could check that the last word is all zeros and both decrement wlen. I like the spirit of your change though, as it only adds to the cost of hashCode/equals (which are already very expensive with large bitsets and should be avoided if possible anyway). OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801195#action_12801195 ] Dawid Weiss commented on LUCENE-2216: - Hi Yonik, This class is not thread-safe anyway (there are no memory barriers of any kind anywhere in the code). From a single-thread perspective, yes, you are modifying the internal state of this object, but it's not really affecting anything other than possibly speeding up further interaction with this object (any other operation no OpenBitSets is affected by the value inside wlen). Your patch also solves the issue, of course. I just don't see the point in _not_ updating wlen since you're scanning through memory anyway... The implementation of OpenBitSet is different in this regard to java.util.BitSet, which always maintains the last non-empty index. I've been thinking about it a bit and there are pros and cons to both implementations, but lazily moving wlen when memory is scanned anyway seems like a better alternative than keeping wlen unnecessarily large (which affects ORs, ANDs and other set operations). To me this implementation cannot be used in a multi-threaded application anyway, am I wrong here? D. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801221#action_12801221 ] Dawid Weiss commented on LUCENE-2216: - This is only true if there is happens-before between the reads and the modifications to the object. In any other case other threads may be reading stale values (i.e., from their own cache), at least if my understanding of the jmm is correct here. Whether you want to rely on such a deep semantics of interaction between threads is something to consider deeply, at least in my personal opinion. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801230#action_12801230 ] Dawid Weiss commented on LUCENE-2216: - This is not entirely what I had in mind (it's not cache, but HotSpot optimisation), but similar situation applies (the value of the field that's never modified from the perspective of the current thread is never re-read). {code} public class Example10 { private static Holder holder; public static void startThread() { new Thread() { public void run() { try { sleep(2000); } catch (Exception e) { /* ignore */ } holder.ready = true; System.out.println(Setting ready to true.); } }.start(); } public static void main(String [] args) { holder = new Holder(); startThread(); while (!holder.ready) { // Do nothing. } System.out.println(I'm ready.); } } class Holder { public boolean ready; } {/code} If you run it with -server, it will (should... or does on two machines I own) deadlock. Client mode and interpreted mode are not optimized, so it passes. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801235#action_12801235 ] Yonik Seeley commented on LUCENE-2216: -- bq. This is only true if there is happens-before between the reads and the modifications to the object. Of course... I said may be safely shared', not that any method one chooses to share it is correct. It still seems that promoting hashCode and equals to mutating operations is wrong, no? OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801240#action_12801240 ] Dawid Weiss commented on LUCENE-2216: - uff, I started having doubts in my own understanding, thanks for being patient with me. I agree that having hashCode mutate the object's state is weird. I had some thoughts about it -- this particular mutation seems to be safe even from multi-threaded point of view. If another thread sees a stale value of wlen, then the only thing that is going to happen is it will scan more memory; for ands, ors and other types of operations this will have no effect. So assuming hashCode/equals is the ONLY method you're calling concurrently, it shouldn't break things. A similar kind of trickery goes on in String#hashCode (caching to a non-volatile field), although that object is immutable, so it's a slightly different scenario. To be honest, my preference for this would be to either maintain the wlen field during all operations (like java.util.BitSet) or at least to clearly state (JavaDoc?) that trimTrailingZeros() should be invoked prior to publishing the object for other threads for increased performance (in case you fiddle with bits and clear the tail). In the second options, your patch does a fine job of not mutating the object and correcting the bug. Thanks for an interesting discussion. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801263#action_12801263 ] Dawid Weiss commented on LUCENE-2216: - Chances of this happening are really slim (this would probably be a single inlined read as soon as the compilation takes place, but you're right in the general case. I am not arguing changing the object in hashCode is good -- my argument is that ideally it should be fixed elsewhere (as in my previous suggestion -- either updating wlen every time the tail changes, or make explicit changes to the documentation that inform about suboptimal performance for zero-tailed sets). OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801265#action_12801265 ] Dawid Weiss commented on LUCENE-2216: - For what it's worth, I checked the mentioned BitUtil methods -- ntz/pop; the same implementation is included from Java 1.5 upward. Do you want me to file another patch for this, Yonik, or are we leaving this as-is? I'd redirect from BitUtil to Long/Integer, deprecate BitUtil methods and replace the places in the code where they are used. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801266#action_12801266 ] Yonik Seeley commented on LUCENE-2216: -- bq. my argument is that ideally it should be fixed elsewhere This is an expert-level class... I don't think that every call to clear() should be checking if it completely cleared the last word. It's easy enough to call trimTrailingZeros after you did a bunch of modifications... but not so easy to regain the lost performance for the code doing redundant checking you didn't want. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801269#action_12801269 ] Dawid Weiss commented on LUCENE-2216: - Ok, argument accepted. OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801270#action_12801270 ] Yonik Seeley commented on LUCENE-2216: -- bq. For what it's worth, I checked the mentioned BitUtil methods - ntz/pop; the same implementation is included from Java 1.5 upward. Huh - I didn't realize that Java5 had the same pop impl as I did... it will be cool if it finally starts using native POPCNT instructions. As far as ntz, I went though a lot of micro-optimizations and different implementations before I settled on the one used in BitUtil, so it would be nice to do some benchmarks to see if it's truly faster now (and also what the performance difference is for users of JVMs before this optimization was implemented). OpenBitSet#hashCode() may return false for identical sets. -- Key: LUCENE-2216 URL: https://issues.apache.org/jira/browse/LUCENE-2216 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.9, 2.9.1, 3.0 Reporter: Dawid Weiss Priority: Minor Attachments: LUCENE-2216.patch, openbitset.patch OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {...@link #bits} buffer. Unlike in JDK, the wlen field is not continuously maintained (on clearing bits, for example). This leads to a situation when wlen may point far beyond the last set bit. The hashCode implementation iterates over all long components of the bits buffer, rotating the hash even for empty components. This is against the contract of hashCode-equals. The following test case illustrates this: {code} // initialize two bitsets with different capacity (bits length). BitSet bs1 = new BitSet(200); BitSet bs2 = new BitSet(64); // set the same bit. bs1.set(3); bs2.set(3); // equals returns true (passes). assertEquals(bs1, bs2); // hashCode returns false (against contract). assertEquals(bs1.hashCode(), bs2.hashCode()); {code} Fix and test case attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org