Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

sebb Sat, 12 Dec 2009 09:04:39 -0800

On 12/12/2009, Vijay Menon <[email protected]> wrote:
> On Sat, Dec 12, 2009 at 7:34 AM, sebb <[email protected]> wrote:
>
>  > On 12/12/2009, Nathan Beyer <[email protected]> wrote:
>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <[email protected]>
>  > wrote:
>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
>  > don't
>  > >  >>>> believe that this:
>  > >  >>>>
>  > >  >>>>         if (hashCode == 0) {
>  > >  >>>>             // calculate hash
>  > >  >>>>             hashCode = hash;
>  > >  >>>>         }
>  > >  >>>>         return hashCode;
>  > >  >>>>
>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of memory
>  > fences.
>  > >  >>>>  The JMM won't allow visible reordering in a single threaded
>  > program.
>  > >  >>> I agree.  In the multi-threaded case there can be a data race on the
>  > >  >>> hashCode, with the effect that the same hashCode value is
>  > unnecessarily,
>  > >  >>> but harmlessly, recalculated.
>  > >  >>
>  > >  >> Vijay, Tim, you are not 100% correct here.
>  > >  >>
>  > >  >> 1. there should be an extra restriction that the part "calculate
>  > hash"
>  > >  >>    does not touch the hashCode field. Without that restriction more
>  > >  >>    trivial races can happen as discussed in LANG-481.
>  > >  >>
>  > >  >> So we should assume this code:
>  > >  >>
>  > >  >> if (this.hashCode == 0) {
>  > >  >>   int hash;
>  > >  >>   if (this.hashCode == 0) {
>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>  > >  >>     this.hashCode = hash;
>  > >  >>   }
>  > >  >>   return this.hashCode;
>  > >  >> }
>  > >  >
>  > >  > Yes, I guess I figured that was assumed :-)
>  > >  >
>  > >  > Of course, there are lots of things you could put into the
>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>  > >  >
>  > >  >    public int hashCode() {
>  > >  >        if (hashCode == 0) {
>  > >  >            if (count == 0) {
>  > >  >                return 0;
>  > >  >            }
>  > >  >            int hash = 0;
>  > >  >            for (int i = offset; i < count + offset; i++) {
>  > >  >                hash = value[i] + ((hash << 5) - hash);
>  > >  >            }
>  > >  >            hashCode = hash;
>  > >  >        }
>  > >  >        return hashCode;
>  > >  >    }
>  > >  >
>  > >
>  > >
>  > > I think I understand the concern, after some additional reading. The
>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>  > >  such, it would be okay for the second read to be done using a cached
>  > >  value, which means that both reads could return 0 in the same thread
>  > >  of execution. Another way to look at it is that the write to
>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>  > >  is how I understand it.
>  > >
>  > >  Will that happen in practice? I have no idea. It does seem possible.
>  >
>  > The Java MM guarantees that a single thread behaves as if the code is
>  > processed sequentially.
>  > So if the thread writes non-zero to this.hashCode it cannot then
>  > return zero for the value of this.hashCode if no other threads
>  > intervene. The thread cannot ignore updates to values it has itself
>  > cached!
>  >
>  > If another thread writes to this.hashCode concurrently, then this
>  > thread may or may not see the value stored by that thread. In this
>  > case, it's not a problem, as another thread can only write a fixed
>  > value. So the worst that can happen is that this.hashCode is written
>  > more than once, and the current thread may fetch the value written by
>  > another thread. But this is the same value it wrote anyway.
>  >
>
>
> In a multithreaded setting, this code *can* break and return 0 if hashCode
>  is read twice.  This is not just a performance optimization - it is a
>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>  read operations.  The following racy execution is allowable under the JMM:
>
>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>  2. Thread 2 write 42 into hashCode.
>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>  5. Thread 1 returns t1 (0).
>


But why would Thread 1 read hashCode twice before the comparison?

Seems to me that would break the "as if serial" guarantee for a single thread.
In the code sequence, the comparison is before the return, and
therefore "happens-before" the return. I.e. step 3 "happens-before"
step 1+5.

I'm not saying Harmony should keep the current code - the suggested
temp variable version seems better anyway - just trying to understand
what (if anything) is currently broken.

>  - Vijay
>
>
>
>  >
>  > >  In any case, it does seem a pinch more efficient to only do one read
>  > >  of hashCode ... switch up the code to be something like this.
>  >
>  > Agreed.
>  >
>  > >  public int hashCode() {
>  > >     final int hash = hashCode;
>  > >     if (hash == 0) {
>  > >
>  > >         if (count == 0) {
>  > >             return 0;
>  > >         }
>  > >
>  > >         for (int i = offset; i < count + offset; i++) {
>  > >             hash = value[i] + ((hash << 5) - hash);
>  > >         }
>  > >         hashCode = hash;
>  > >     }
>  > >
>  > >     return hash;
>  > >  }
>  > >
>  > >  Thoughts?
>  > >
>  > >
>  > >  >> where 'this.*' is always a memory reference while 'hash' is a thread
>  > private var.
>  > >  >>
>  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>  > >  >>    always loads fields from memory in original order. Hence this
>  > >  >>    potential bug does not affect DRLVM .. now. But potentially it can
>  > >  >>    start optimizing things this way because current behavior prevents
>  > >  >>    a bunch of high level optimizations.
>  > >  >>
>  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
>  > that
>  > >  >>    such reordering is allowed theoretically. I.e. like this:
>  > >  >>
>  > >  >> int hash = this.hashCode;
>  > >  >> if (this.hashCode == 0) {
>  > >  >>   hash = this.hashCode = // calculate hash
>  > >  >> }
>  > >  >> return hash;
>  > >  >>
>  > >  >> This is a correct single-threaded code. What happened here is a
>  > >  >> lengthy thread privatization of this.hashCode (again, does not happen
>  > >  >> in DRLVM). That is incorrect in multithreaded environment and needs
>  > >  >> extra synchronization/volatile/etc.
>  > >  >>
>  > >  >> 4. I do not know why a JIT would want to do that, i.e. two sequential
>  > >  >>    reads from the same memory location. Sounds like a bit synthetic
>  > example.
>  > >  >
>  > >  > ...at which point a bunch of code probably would go wrong!  So
>  > hopefully
>  > >  > it remains only a theoretical possibility.
>  > >  >
>  > >  > Regards,
>  > >  > Tim
>  > >  >
>  > >  >> [1]
>  > http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>  > >  >>
>  > >  >
>  > >
>  >
>

Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Reply via email to