On Sat, Dec 12, 2009 at 7:34 AM, sebb <[email protected]> wrote:
> On 12/12/2009, Nathan Beyer <[email protected]> wrote:
> > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <[email protected]>
> wrote:
> > > On 11/Dec/2009 14:32, Egor Pasko wrote:
> > >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
> > >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
> > >>>> Perhaps I'm missing some context, but I don't see any problem. I
> don't
> > >>>> believe that this:
> > >>>>
> > >>>> if (hashCode == 0) {
> > >>>> // calculate hash
> > >>>> hashCode = hash;
> > >>>> }
> > >>>> return hashCode;
> > >>>>
> > >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
> fences.
> > >>>> The JMM won't allow visible reordering in a single threaded
> program.
> > >>> I agree. In the multi-threaded case there can be a data race on the
> > >>> hashCode, with the effect that the same hashCode value is
> unnecessarily,
> > >>> but harmlessly, recalculated.
> > >>
> > >> Vijay, Tim, you are not 100% correct here.
> > >>
> > >> 1. there should be an extra restriction that the part "calculate
> hash"
> > >> does not touch the hashCode field. Without that restriction more
> > >> trivial races can happen as discussed in LANG-481.
> > >>
> > >> So we should assume this code:
> > >>
> > >> if (this.hashCode == 0) {
> > >> int hash;
> > >> if (this.hashCode == 0) {
> > >> // Calculate using 'hash' only, not this.hashCode.
> > >> this.hashCode = hash;
> > >> }
> > >> return this.hashCode;
> > >> }
> > >
> > > Yes, I guess I figured that was assumed :-)
> > >
> > > Of course, there are lots of things you could put into the
> > > "// Calculate..." section that would be unsafe. We should stick with
> > > showing the non-abbreviated implementation to avoid ambiguity:
> > >
> > > public int hashCode() {
> > > if (hashCode == 0) {
> > > if (count == 0) {
> > > return 0;
> > > }
> > > int hash = 0;
> > > for (int i = offset; i < count + offset; i++) {
> > > hash = value[i] + ((hash << 5) - hash);
> > > }
> > > hashCode = hash;
> > > }
> > > return hashCode;
> > > }
> > >
> >
> >
> > I think I understand the concern, after some additional reading. The
> > issue seems to be that 'hashCode' is read twice and the field is not
> > protected by any memory barriers (synchronized, volatile, etc). As
> > such, it would be okay for the second read to be done using a cached
> > value, which means that both reads could return 0 in the same thread
> > of execution. Another way to look at it is that the write to
> > 'hashCode' may or may not affect subsequent reads of 'hashCode'. This
> > is how I understand it.
> >
> > Will that happen in practice? I have no idea. It does seem possible.
>
> The Java MM guarantees that a single thread behaves as if the code is
> processed sequentially.
> So if the thread writes non-zero to this.hashCode it cannot then
> return zero for the value of this.hashCode if no other threads
> intervene. The thread cannot ignore updates to values it has itself
> cached!
>
> If another thread writes to this.hashCode concurrently, then this
> thread may or may not see the value stored by that thread. In this
> case, it's not a problem, as another thread can only write a fixed
> value. So the worst that can happen is that this.hashCode is written
> more than once, and the current thread may fetch the value written by
> another thread. But this is the same value it wrote anyway.
>
In a multithreaded setting, this code *can* break and return 0 if hashCode
is read twice. This is not just a performance optimization - it is a
correctness issue. The compiler / runtime / hardware is allowed to reorder
read operations. The following racy execution is allowable under the JMM:
1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
2. Thread 2 write 42 into hashCode.
3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
5. Thread 1 returns t1 (0).
- Vijay
>
> > In any case, it does seem a pinch more efficient to only do one read
> > of hashCode ... switch up the code to be something like this.
>
> Agreed.
>
> > public int hashCode() {
> > final int hash = hashCode;
> > if (hash == 0) {
> >
> > if (count == 0) {
> > return 0;
> > }
> >
> > for (int i = offset; i < count + offset; i++) {
> > hash = value[i] + ((hash << 5) - hash);
> > }
> > hashCode = hash;
> > }
> >
> > return hash;
> > }
> >
> > Thoughts?
> >
> >
> > >> where 'this.*' is always a memory reference while 'hash' is a thread
> private var.
> > >>
> > >> 2. DRLVM JIT indeed does not privatize field access to threads. It
> > >> always loads fields from memory in original order. Hence this
> > >> potential bug does not affect DRLVM .. now. But potentially it can
> > >> start optimizing things this way because current behavior prevents
> > >> a bunch of high level optimizations.
> > >>
> > >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
> that
> > >> such reordering is allowed theoretically. I.e. like this:
> > >>
> > >> int hash = this.hashCode;
> > >> if (this.hashCode == 0) {
> > >> hash = this.hashCode = // calculate hash
> > >> }
> > >> return hash;
> > >>
> > >> This is a correct single-threaded code. What happened here is a
> > >> lengthy thread privatization of this.hashCode (again, does not happen
> > >> in DRLVM). That is incorrect in multithreaded environment and needs
> > >> extra synchronization/volatile/etc.
> > >>
> > >> 4. I do not know why a JIT would want to do that, i.e. two sequential
> > >> reads from the same memory location. Sounds like a bit synthetic
> example.
> > >
> > > ...at which point a bunch of code probably would go wrong! So
> hopefully
> > > it remains only a theoretical possibility.
> > >
> > > Regards,
> > > Tim
> > >
> > >> [1]
> http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
> > >>
> > >
> >
>