Re: [jira] Created: (HARMONY-6404) possible data-reordering in some hashCode-methods (e.g. String or URL)

Nathan Beyer Sat, 12 Dec 2009 18:29:35 -0800

On Sat, Dec 12, 2009 at 11:04 AM, sebb <[email protected]> wrote:
> On 12/12/2009, Vijay Menon <[email protected]> wrote:
>> On Sat, Dec 12, 2009 at 7:34 AM, sebb <[email protected]> wrote:
>>
>>  > On 12/12/2009, Nathan Beyer <[email protected]> wrote:
>>  > > On Fri, Dec 11, 2009 at 10:04 AM, Tim Ellison <[email protected]> wrote:
>>  > >  > On 11/Dec/2009 14:32, Egor Pasko wrote:
>>  > >  >> On the 0x684 day of Apache Harmony Tim Ellison wrote:
>>  > >  >>> On 11/Dec/2009 04:09, Vijay Menon wrote:
>>  > >  >>>> Perhaps I'm missing some context, but I don't see any problem.  I
>>  > >  >>>> don't believe that this:
>>  > >  >>>>
>>  > >  >>>>         if (hashCode == 0) {
>>  > >  >>>>             // calculate hash
>>  > >  >>>>             hashCode = hash;
>>  > >  >>>>         }
>>  > >  >>>>         return hashCode;
>>  > >  >>>>
>>  > >  >>>> can ever return 0 (assuming hash is non-zero), regardless of
>>  > >  >>>> memory fences.  The JMM won't allow visible reordering in a
>>  > >  >>>> single-threaded program.
>>  > >  >>> I agree.  In the multi-threaded case there can be a data race on
>>  > >  >>> the hashCode, with the effect that the same hashCode value is
>>  > >  >>> unnecessarily, but harmlessly, recalculated.
>>  > >  >>
>>  > >  >> Vijay, Tim, you are not 100% correct here.
>>  > >  >>
>>  > >  >> 1. there should be an extra restriction that the part "calculate
>>  > >  >>    hash" does not touch the hashCode field. Without that restriction
>>  > >  >>    more trivial races can happen as discussed in LANG-481.
>>  > >  >>
>>  > >  >> So we should assume this code:
>>  > >  >>
>>  > >  >> if (this.hashCode == 0) {
>>  > >  >>   int hash;
>>  > >  >>   if (this.hashCode == 0) {
>>  > >  >>     // Calculate using 'hash' only, not this.hashCode.
>>  > >  >>     this.hashCode = hash;
>>  > >  >>   }
>>  > >  >>   return this.hashCode;
>>  > >  >> }
>>  > >  >
>>  > >  > Yes, I guess I figured that was assumed :-)
>>  > >  >
>>  > >  > Of course, there are lots of things you could put into the
>>  > >  > "// Calculate..." section that would be unsafe.  We should stick with
>>  > >  > showing the non-abbreviated implementation to avoid ambiguity:
>>  > >  >
>>  > >  >    public int hashCode() {
>>  > >  >        if (hashCode == 0) {
>>  > >  >            if (count == 0) {
>>  > >  >                return 0;
>>  > >  >            }
>>  > >  >            int hash = 0;
>>  > >  >            for (int i = offset; i < count + offset; i++) {
>>  > >  >                hash = value[i] + ((hash << 5) - hash);
>>  > >  >            }
>>  > >  >            hashCode = hash;
>>  > >  >        }
>>  > >  >        return hashCode;
>>  > >  >    }
>>  > >  >
>>  > >
>>  > >
>>  > > I think I understand the concern, after some additional reading. The
>>  > >  issue seems to be that 'hashCode' is read twice and the field is not
>>  > >  protected by any memory barriers (synchronized, volatile, etc). As
>>  > >  such, it would be okay for the second read to be done using a cached
>>  > >  value, which means that both reads could return 0 in the same thread
>>  > >  of execution. Another way to look at it is that the write to
>>  > >  'hashCode' may or may not affect subsequent reads of 'hashCode'. This
>>  > >  is how I understand it.
>>  > >
>>  > >  Will that happen in practice? I have no idea. It does seem possible.
>>  >
>>  > The Java MM guarantees that a single thread behaves as if the code is
>>  > processed sequentially.
>>  > So if the thread writes non-zero to this.hashCode it cannot then
>>  > return zero for the value of this.hashCode if no other threads
>>  > intervene. The thread cannot ignore updates to values it has itself
>>  > cached!
>>  >
>>  > If another thread writes to this.hashCode concurrently, then this
>>  > thread may or may not see the value stored by that thread. In this
>>  > case, it's not a problem, as another thread can only write a fixed
>>  > value. So the worst that can happen is that this.hashCode is written
>>  > more than once, and the current thread may fetch the value written by
>>  > another thread. But this is the same value it wrote anyway.
>>  >
>>
>>
>> In a multithreaded setting, this code *can* break and return 0 if hashCode
>>  is read twice.  This is not just a performance optimization - it is a
>>  correctness issue.  The compiler / runtime / hardware is allowed to reorder
>>  read operations.  The following racy execution is allowable under the JMM:
>>
>>  1. Thread 1 reads 0 from hashCode and stores 0 into a local (t1).
>>  2. Thread 2 writes 42 into hashCode.
>>  3. Thread 1 reads 42 from hashCode and stores 42 into a local (t2).
>>  4. Thread 1 compares t2 (42) with 0 and does not execute the if clause.
>>  5. Thread 1 returns t1 (0).
>>
>
> But why would Thread 1 read hashCode twice before the comparison?
>
> Seems to me that would break the "as if serial" guarantee for a single thread.
> In the code sequence, the comparison is before the return, and
> therefore "happens-before" the return. I.e. step 3 "happens-before"
> step 1+5.
>
> I'm not saying Harmony should keep the current code - the suggested
> temp variable version seems better anyway - just trying to understand
> what (if anything) is currently broken.


I'm still open to counterarguments, as this still seems weird. I keep
focusing on the fact that 'this.hashCode' is read twice: once in
"if (hashCode == 0)" and once in "return hashCode". Since 'hashCode'
isn't final, volatile, or in a synchronized region, the read into the
stack could be cached. As I understand it, it's not the Java source
that gets reordered; it's the generated code that reads the value
from the heap to the stack.
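
To make that concrete, here is a rough sketch of the transformation as
I understand it. The class name RacyHashDemo, its helper methods, and
the 'betweenReads' hook are all made up for illustration; the hook only
simulates a write from another thread landing between the two reads. It
is not what a real JIT emits, just a single-threaded way to replay the
interleaving Vijay listed.

```java
class RacyHashDemo {
    private final char[] value;
    private int hashCode; // plain field: not final, not volatile

    RacyHashDemo(String s) {
        this.value = s.toCharArray();
    }

    // Same arithmetic as the String.hashCode loop quoted in this thread.
    private int compute() {
        int hash = 0;
        for (char c : value) {
            hash = c + ((hash << 5) - hash);
        }
        return hash;
    }

    // The code as written: the field is read in the 'if' and again in the 'return'.
    int hashTwoReads() {
        if (hashCode == 0) {
            hashCode = compute();
        }
        return hashCode;
    }

    // Stand-in for "another thread stored the hash".
    void publish() {
        hashCode = compute();
    }

    // Hypothetical reordered form: the read that feeds 'return' is hoisted
    // above the read that feeds the comparison. Single-threaded behavior is
    // unchanged, but a write between the two reads leaves t1 stale.
    int hashReordered(Runnable betweenReads) {
        int t1 = hashCode;      // read used by 'return' (hoisted)
        betweenReads.run();     // simulated write from another thread
        int t2 = hashCode;      // read used by the comparison
        if (t2 == 0) {
            t1 = hashCode = compute();
        }
        return t1;
    }
}
```

With a fresh instance d, calling d.hashReordered(d::publish) returns 0
even though the field already holds the real hash, while passing a no-op
hook gives the normal result.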

I'm basing this off of the 'racy single-check' idiom that's mentioned
in Effective Java. I'd like to get a complete answer to this though.
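
For reference, this is roughly how I read that idiom: read the field
exactly once into a local, and only ever return the local. The class
name CachedHash is invented for the example, and the loop is simplified
from the String code quoted in this thread (no offset/count), so treat
it as a sketch rather than a proposed patch.

```java
class CachedHash {
    private final char[] value;
    private int hashCode; // deliberately not volatile; 0 means "not computed yet"

    CachedHash(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int hash = hashCode; // the only read of the shared field
        if (hash == 0) {
            for (char c : value) {
                hash = c + ((hash << 5) - hash);
            }
            hashCode = hash; // racy write; every thread computes the same value
        }
        return hash; // return the local, never re-read the field
    }
}
```

Because the field is read only once, there is no second read left to
reorder against, so the worst case is a redundant recalculation.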

-Nathan

>
>>  - Vijay
>>
>>
>>
>>  >
>>  > >  In any case, it does seem a pinch more efficient to only do one read
>>  > >  of hashCode ... switch up the code to be something like this.
>>  >
>>  > Agreed.
>>  >
>>  > >  public int hashCode() {
>>  > >     int hash = hashCode;
>>  > >     if (hash == 0) {
>>  > >
>>  > >         if (count == 0) {
>>  > >             return 0;
>>  > >         }
>>  > >
>>  > >         for (int i = offset; i < count + offset; i++) {
>>  > >             hash = value[i] + ((hash << 5) - hash);
>>  > >         }
>>  > >         hashCode = hash;
>>  > >     }
>>  > >
>>  > >     return hash;
>>  > >  }
>>  > >
>>  > >  Thoughts?
>>  > >
>>  > >
>>  > >  >> where 'this.*' is always a memory reference while 'hash' is a
>>  > >  >> thread private var.
>>  > >  >>
>>  > >  >> 2. DRLVM JIT indeed does not privatize field access to threads. It
>>  > >  >>    always loads fields from memory in original order. Hence this
>>  > >  >>    potential bug does not affect DRLVM .. now. But potentially it
>>  > >  >>    can start optimizing things this way because current behavior
>>  > >  >>    prevents a bunch of high level optimizations.
>>  > >  >>
>>  > >  >> 3. Jeremy Manson, being an expert in Java Memory Model claims [1]
>>  > >  >>    that such reordering is allowed theoretically. I.e. like this:
>>  > >  >>
>>  > >  >> int hash = this.hashCode;
>>  > >  >> if (this.hashCode == 0) {
>>  > >  >>   hash = this.hashCode = // calculate hash
>>  > >  >> }
>>  > >  >> return hash;
>>  > >  >>
>>  > >  >> This is a correct single-threaded code. What happened here is a
>>  > >  >> lengthy thread privatization of this.hashCode (again, does not
>>  > >  >> happen in DRLVM). That is incorrect in multithreaded environment
>>  > >  >> and needs extra synchronization/volatile/etc.
>>  > >  >>
>>  > >  >> 4. I do not know why a JIT would want to do that, i.e. two
>>  > >  >>    sequential reads from the same memory location. Sounds like a
>>  > >  >>    bit synthetic example.
>>  > >  >
>>  > >  > ...at which point a bunch of code probably would go wrong!  So
>>  > >  > hopefully it remains only a theoretical possibility.
>>  > >  >
>>  > >  > Regards,
>>  > >  > Tim
>>  > >  >
>>  > >  >> [1] http://jeremymanson.blogspot.com/2008/12/benign-data-races-in-java.html
>>  > >  >>
>>  > >  >
>>  > >
>>  >
>>
>
