Hi Ogata,
Do you have a source for your benchmark to share? I might be able to
squeeze a little more from this approach but it would be easier if I
could test variants right away and not bother you...
Regards, Peter
On 09/20/17 10:14, Kazunori Ogata wrote:
Hi Peter,
The performance improvement was +2.9%. It is faster than the version that
uses an extra dereference (+2.2%).
Although it's slower than the full-fence variant, I think I
understand Hans's concern and I agree your fix is the right answer.
@Hans,
I thought DATA_LAYOUT_GUESS in your example was fetched from memory
somewhere at an arbitrary time, but I now understand that "prefetch
dataLayout" means calculating the value of dataLayout without
accessing memory. I'm not sure how to calculate it, but I noticed that
even picking a random value has a non-zero probability of passing the
check at line 1204.5.
I agree that the load of slots[17] can happen before the full fence
executes if the value of dataLayout does not come from memory and there
is no data dependence between the write to dataLayout and the read from
dataLayout. I appreciate your comments.
Regards,
Ogata
From: Hans Boehm <[email protected]>
To: Kazunori Ogata <[email protected]>
Cc: Peter Levart <[email protected]>, core-libs-dev
<[email protected]>
Date: 2017/09/19 05:47
Subject: Re: RFR: 8187033: [PPC] Imporve performance of
ObjectStreamClass.getClassDataLayout()
On Mon, Sep 18, 2017 at 10:52 AM, Kazunori Ogata <[email protected]>
wrote:
Hi Peter,
Peter Levart <[email protected]> wrote on 2017/09/18 22:05:43:
On 09/18/2017 12:28 PM, Kazunori Ogata wrote:
Hi Hans and Peter,
Thank you for your comments.
Regarding the code Hans showed, I don't yet understand what the problem
is. Since the load at 1204b is a speculative one, dereferencing
slots[17] should not raise any exception. If the confirmation at 1204.5
succeeds, the value of tmp must also be correct because we put a full
fence and we see a non-NULL reference that was stored after the full
fence.
I don't know much, but I can imagine that speculative read may see the
value and guess it correctly based on let's say some CPU state of
half-processed write instruction in the pipeline, which is established
even before the fence instruction flushes writes to array slots. So I
can accept that such outcome is possible and doesn't violate JMM.
To me, this would mean that the processor/platform can't implement a
full fence correctly. I think it is the platform's (processor's and
compiler's) responsibility to support full fences; otherwise the platform
can't implement all Java APIs, including VarHandle.fullFence().
As Peter said, my concern is not with exceptions, but with seeing
uninitialized data for slots[17].
The semantics of "full fences" are tricky, but basically they don't
restrict reordering in other threads, only the thread that executed the
fence. The thread with the problematic reordering here is the one that
saw a non-null dataLayout value, and hence did not execute a fence.
Hence fences generally have to be paired with either another fence in the
other thread, or some other ordering mechanism. That other ordering
mechanism is missing here, though many implementations will ensure correct
ordering, due to hardware dependence-based ordering guarantees. But the
JMM does not promise that.
Hans
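
The pairing Hans describes could be sketched roughly as below. This is only an illustration, not the code under review: the class LazyLayout, its slots field, and the array contents are hypothetical stand-ins for dataLayout. It assumes a release store on the writing side paired with an acquire load on the reading side, which is one possible "other ordering mechanism" and not necessarily the fix adopted for this bug.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-in for a class with a lazily initialized layout array.
final class LazyLayout {
    // Plain field; all ordering comes from the VarHandle access modes below.
    private int[] slots;

    private static final VarHandle SLOTS;
    static {
        try {
            SLOTS = MethodHandles.lookup()
                    .findVarHandle(LazyLayout.class, "slots", int[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public int[] getSlots() {
        // Acquire load: if this sees the reference published below, later
        // reads of the array elements cannot be ordered before this load.
        int[] s = (int[]) SLOTS.getAcquire(this);
        if (s == null) {
            int[] fresh = new int[32];
            for (int i = 0; i < fresh.length; i++) {
                fresh[i] = i * 2; // initialize every element before publishing
            }
            // Release store: the element writes above cannot be reordered
            // after the publication of the reference.
            SLOTS.setRelease(this, fresh);
            s = fresh;
        }
        return s;
    }
}
```

In this sketch the reader that sees a non-null reference performs the acquire load itself, so the ordering no longer depends only on a fence executed by the writing thread. Two threads may both initialize the array; that race is benign here because both arrays have identical contents.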