On the 0x685 day of Apache Harmony Nathan Beyer wrote: > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <t.p.elli...@gmail.com> wrote: >> On 11/Dec/2009 14:32, Egor Pasko wrote: >>> On the 0x684 day of Apache Harmony Tim Ellison wrote: >>>> On 11/Dec/2009 04:09, Vijay Menon wrote: >>>>> Perhaps I'm missing some context, but I don't see any problem. I don't >>>>> believe that this: >>>>> >>>>> if (hashCode == 0) { >>>>> // calculate hash >>>>> hashCode = hash; >>>>> } >>>>> return hashCode; >>>>> >>>>> can ever return 0 (assuming hash is non-zero), regardless of memory >>>>> fences. >>>>> The JMM won't allow visible reordering in a single threaded program. >>>> I agree. In the multi-threaded case there can be a data race on the >>>> hashCode, with the effect that the same hashCode value is unnecessarily, >>>> but harmlessly, recalculated. >>> >>> Vijay, Tim, you are not 100% correct here. >>> >>> 1. there should be an extra restriction that the part "calculate hash" >>> does not touch the hashCode field. Without that restriction more >>> trivial races can happen as discussed in LANG-481. >>> >>> So we should assume this code: >>> >>> if (this.hashCode == 0) { >>> int hash; >>> if (this.hashCode == 0) { >>> // Calculate using 'hash' only, not this.hashCode. >>> this.hashCode = hash; >>> } >>> return this.hashCode; >>> } >> >> Yes, I guess I figured that was assumed :-) >> >> Of course, there are lots of things you could put into the >> "// Calculate..." section that would be unsafe. We should stick with >> showing the non-abbreviated implementation to avoid ambiguity: >> >> public int hashCode() { >> if (hashCode == 0) { >> if (count == 0) { >> return 0; >> } >> int hash = 0; >> for (int i = offset; i < count + offset; i++) { >> hash = value[i] + ((hash << 5) - hash); >> } >> hashCode = hash; >> } >> return hashCode; >> } >> > > I think I understand the concern, after some additional reading. The > issue seems to be that 'hashCode' is read twice and the field is not > protected by any memory barriers (synchronized, volatile, etc). As > such, it would be okay for the second read to be done using a cached > value, which means that both reads could return 0 in the same thread > of execution. Another way to look at it is that the write to > 'hashCode' may or may not affect subsequent reads of 'hashCode'. This > is how I understand it. > > Will that happen in practice? I have no idea. It does seem possible. > > In any case, it does seem a pinch more efficient to only do one read > of hashCode ... switch up the code to be something like this. > > public int hashCode() { > final int hash = hashCode; > if (hash == 0) { > if (count == 0) { > return 0; > } > for (int i = offset; i < count + offset; i++) { > hash = value[i] + ((hash << 5) - hash); > } > hashCode = hash;
one more 'return hash' here, please :) > } > return hash; > } > > Thoughts? -- Egor Pasko