Hi Hans and Peter, Thank you for your comments.
Regarding the code Hans showed, I don't yet understand what it the problem. Since the load at 1204b is a speculative one, dereferencing slots[17] should not raise any exception. If the confirmation at 1204.5 succeeds, the value of tmp must also be correct because we put full fence and we see a non-NULL reference that was stored after the full fence. Also note that even the original code doesn't assume that a ClassDataSlot array is singleton for a given class. So even if another method modifies dataLayout between 1204b and 1204.5, the current thread can keep using the reference loaded earlier. If a thread reads a non-NULL reference, the ClassDataSlot[] entries reachable through the reference must be correct because we put full fence. @Peter, Thank you for the fix. I'll measure the performance. Regards, Ogata Peter Levart <[email protected]> wrote on 2017/09/18 05:51:07: > From: Peter Levart <[email protected]> > To: Hans Boehm <[email protected]> > Cc: Kazunori Ogata <[email protected]>, core-libs-dev <core-libs- > [email protected]> > Date: 2017/09/18 05:51 > Subject: Re: RFR: 8187033: [PPC] Imporve performance of > ObjectStreamClass.getClassDataLayout() > > Hi, > On 09/15/17 19:54, Hans Boehm wrote: > The problem occurs if this is transformed (by hardware or compiler) to > > 1196 ClassDataSlot[] getClassDataLayout() throws InvalidClassException { > 1197 // REMIND: synchronize instead of relying on fullFence()? > <prefetch dataLayout> > 1198 ClassDataSlot[] slots = DATA_LAYOUT_GUESS; > 1199 if (slots == null) { > if (dataLayout != DATA_LAYOUT_GUESS) <recover> > 1200 slots = getClassDataLayout0(); > 1201 VarHandle.fullFence(); > 1202 dataLayout = slots; > 1203 } > 1204b tmp = slots[17]; > 1204.5 if (dataLayout != DATA_LAYOUT_GUESS) <recover> > 1205 ... > > (This is only an illustration. If the problem were to occur in real life, > it would probably occur as a > result of a different optimization. DEC Alpha allowed this sort of thing > for entirely different reasons.) > > Observe that > > (1) This transformation is allowed by the Java memory model, since > dataLayout is not a final field. > (2) This code breaks if another thread runs all of the initialization > code, including the code that sets > slots[17] and the code that sets dataLayout, between 1204b and 1204.5, but > the check in > 1204.5 still succeeds (because we guessed well). tmp will contain the pre- > initialization value of slots[17]. > > The fence is not executed by the reading thread, and has no impact on > ordering within the reading thread. > > C++ fences have no effect unless they are paired with another fence or > ordered atomic operation in > the other thread involved in the communication. I think that is the > current intent for Java as well. > > Well, in that case, it's better to stick with final fields... > @Ogata > > You said you implemented 4 variants: > On 09/04/17 07:20, Kazunori Ogata wrote: > 1) Put VarHandle.fullFence() between initialization of ClassDataSlot[] and > writing the reference to non-volatile dataLayout. > Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-fence/ > > 2) Use Unsafe.getObjectAcquire() and Unsafe.putObjectRelease() for > reading/writing dataLayout. > Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-unsafe/ > > 3) Put reference to ClassDataSlot[] into a final field of an object and > store the object to non-volatile dataLayout. Every invocation of > getDataLayout() reads the final field needs to deference the object > pointer. > Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final/ > > 4) Put reference to ClassDataSlot[] into a final field of an object, read > the final field immediately after the object creation, and store it to > non-volatile dataLayout. I think this is also correct based on the > semantics of final fields and data dependency. > Webrev: http://cr.openjdk.java.net/~horii/8187033/webrev.01-final2/ > > > The performance improvements were: > > 1) +3.5% > 2) +1.1% > 3) +2.2% > 4) +3.4% > > > The 1st and 4th are not correct as we have established. The 3rd is > promising, but does not have the most speed improvement. Perhaps because > of extra de-referencing. > > What if 'dataLayout' was not an array of ClassDataSlot records, each of > them containing a reference to an ObjectStreamClass and a boolean, but an > object containing two arrays: > > static class ClassDataLayout { > final ObjectStreamClass[] descs; > final boolean[] hasDatas; > } > > Such object could be "unsafely" published. By eliminating the intermediate > ClassDataSlot object, number of de-references should be kept down. > > Here's a prototype: > > http://cr.openjdk.java.net/~plevart/jdk10-dev/ > 8187033_ObjectStreamClass.dataLayout/webrev.01/ > > > Could you give it a try in your benchmark and compare it with your last > approach (with fullFence)? > > Regards, Peter
