Hi Galder, Thanks for doing the fix and the work verifying.
I also verified using a fence fixes the jcstress test. Similarly, I found I could only reproduce the issue in HotSpot when running a test more like ClassValue (the double checked locking pattern when publishing to a plan field) with the options -XX:+StressLCM -XX:+StressGCM. I analyzed assembler output (-XX:+PrintOptoAssembly) to observe the publish store occurring before some stores to fields. It’s highly likely that with HotSpot we got lucky in this case :-) I don’t object to the use of a release fence (LoadStore|StoreStore), but I believe we could also use a StoreStore fence in this case. Perhaps the ARM folks have a stronger opinion on this? I marginally prefer placing the fence in the initializeMap, since the relationship between construction and publish is very clear. In either case I recommend adding a comment on why the fence is required e.g.: // Place a Store fence as the last operation of the constructor to emulate // ClassValueMap containing final fields. This ensures it can be // published safely in the non-volatile field Class.classValueMap, (see initializeMap) // since stores to the fields of ClassValueMap will not be reordered // to occur after the store to the field type.classValueMap I also noticed we can change the field Class.classValueMap to be @Stable, but I think we should follow up on that investigation separately. Relatedly, I had an idea to modify HotSpot so it places a StoreStore fence directly before the store to a @Stable field. Paul. > On Aug 19, 2020, at 4:53 AM, Galder Zamarreno <gal...@redhat.com> wrote: > > Hi all, > > I've created patch [1] to fix the ClassValue$ClassValueMap NPE bug in [2]. > > The bug has been discussed by other members in the list in thread [3]. > > The patch follows the simple fix suggested by Doug and others in that > exchange, e.g. [4]. That is, it adds a release fence > to ClassValue$ClassValueMap constructor to avoid the NPE. > > To verify the fix, I ran the jcstress test that Paul posted in [5] and > played around with the difference fixes suggested in the thread. Adding the > release fence did indeed fix the jcstress test. > > To further verify the issue, I've successfully run both the tier1 tests and > the Quarkus native testsuite with a Mandrel 20.1 built with JDK 11.0.8 > version patched with the fix (higher JDKs not supported yet). Note that > this NPE happens on rare occasions. > > The patch applies cleanly to JDK 11. > > Galder > > [1] Webrev: > http://cr.openjdk.java.net/~sgehwolf/webrevs/galder/JDK-8251397/01/webrev/ > [2] Bug: https://bugs.openjdk.java.net/browse/JDK-8251397 > [3] > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068086.html > [4] > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068126.html > [5] > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-August/068110.html