Re: RFR: 8175010: ImageReader is not thread-safe
Hi David, On 2017-02-16 09:08, David Holmes wrote: Hi Claes, On 15/02/2017 11:22 PM, Claes Redestad wrote: Hi, a few intermittent but rare test failures[1] that has appeared since the latest jake integration, and since one of the changes in there was to make initialization of the system ImageReader lazy there appears to be cases where ImageReaders are not safely published: I find it very hard to discern exactly how these classes are intended to be used concurrently. Is there some lifecycle description of the ImageReader and how it is intended to be used? Without any synchronization I still see lots of not-thread-safe code. Simple example: 75 public void close() throws IOException { 76 if (closed) { 77 throw new IOException("image file already closed"); 78 } 79 reader.close(this); 80 closed = true; 81 } two threads can still both call reader.close(this) concurrently. And if the new closed volatile boolean solves something then wouldn't making the reader variable volatile achieve the same JMM affects? reader.close(this) calls will synchronize on SharedImageReader.OPEN_FILES, so concurrent calls on the same ImageReader should be OK. We should add a comment to point this out. /Claes David - - Ensure ImageReader::open is called only once per Path in ImageReaderFactory by using CHM.computeIfAbsent - Ensure ImageReader.reader is safely published to a final field and signal close using a volatile boolean instead webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ bug: https://bugs.openjdk.java.net/browse/JDK-8175010 Testing shows no issues (which admittedly doesn't mean we're actually solving the root cause for JDK-8174817), and performance numbers from adding a volatile read indicate that any overhead is lost in the noise on ImageReader-heavy workloads. Thanks! /Claes [1] https://bugs.openjdk.java.net/browse/JDK-8174817
Re: RFR: 8175010: ImageReader is not thread-safe
On 2017-02-16 09:17, Peter Levart wrote: I think your observations about potential issues in JRTFS is correct, but there was nothing to suggest JRTFS code was involved in JDK-8174817 (as it's not code that's used by the BuiltinClassLoader). The only remaining ImageReader.close() invocation is now in jdk.internal.jrtfs.SystemImage anonymous inner class delegated from SystemImage.close(), which is only called from JrtFileSystem.close() or .finalize(). So you think JRTFS is not the source of the problem? What if JRTFS is accessing the same image as JavaRuntimeURLConnection in the same VM. Different instances of ImageReader would be used for them, but they would share the same underlying SharedImageReader. Currently I don't see a problem in SharedImageReader code, but I haven't studied it carefully yet. Maybe there is something to be found there... Nothing is impossible, there was simply no breadcrumbs to suggest this was happening. One thing I think we should do to rule this out is to make sure that the SharedImageReader representing the system image simply can't be closed, which would really be an error. /Claes
Re: RFR: 8175010: ImageReader is not thread-safe
Hi Claes, On 02/15/2017 10:41 PM, Claes Redestad wrote: Hi Peter, are you suggesting that if I have class Foo { Bar b; }, then creating and putting Foo b in a CHM before returning it to a consumer which is then read from another thread is enough to force b to be safely published even when the other thread does *not* get the object via a call to the same CHM (which would go via the same volatile read and add the necessary happens before relationship)? I recalled having seen examples to the effect that a non-volatile b isn't safely published in this case. No, I'm not suggesting that. I'm merely observing (and hoping that my IDE didn't miss places in code where this is not the case) that there is no unsafe publication going on here. Obtaining ImageReader from ImageReaderFactory.get() is safe as it either returns an instance constructed in the same thread or it returns an instance constructed in a different thread, but in that case, the instance is handed through CHM.set/get which enforces ordering. The only other place where I found ImageReader.open() call is in constructing the SystemImage implementation as an anonymous inner instance which holds an ImageReader in its synthetic final field which ensures safe publication of ImageReader even if SystemImage instance is published unsafely. The (very shaky) hypothesis is thus that this could be what's happening in any of the places which load and locally cache the system ImageReader for anyone to use, e.g., SystemModuleFinder.SystemImage ...here the ImageReader instance is obtained via ImageReaderFactory.getImageReader(): /** * Holder class for the ImageReader */ private static class SystemImage { static final ImageReader READER; static { long t0 = System.nanoTime(); READER = ImageReaderFactory.getImageReader(); initTime.addElapsedTimeFrom(t0); } static ImageReader reader() { return READER; } } ...which is just a shortcut for ImageReaderFactory.get(BOOT_MODULES_JIMAGE) and I believe ImageReaderFactory.get() publishes (and did publish even before the patch) ImageReader instances safely (old code): 52 private static final Mapreaders = new ConcurrentHashMap<>(); 53 54 /** 55 * Returns an {@code ImageReader} to read from the given image file 56 */ 57 public static ImageReader get(Path jimage) throws IOException { 58 Objects.requireNonNull(jimage); 59 ImageReader reader = readers.get(jimage); 60 if (reader != null) { 61 return reader; // <<-- HERE the instance was handed through CHM 62 } 63 reader = ImageReader.open(jimage); 64 // potential race with other threads opening the same URL 65 ImageReader r = readers.putIfAbsent(jimage, reader); 66 if (r == null) { 67 return reader; // <<-- HERE the instance was constructed in same thread 68 } else { 69 reader.close(); 70 return r; // <<-- HERE the instance was handed through CHM 71 } 72 } or JavaRuntimeURLConnection (which is implicitly called when there's a security manager installed). ...here the instance is also obtained via ImageReaderFactory.getImageReader(): // ImageReader to access resources in jimage private static final ImageReader reader; static { PrivilegedAction pa = ImageReaderFactory::getImageReader; reader = AccessController.doPrivileged(pa); } In both above places the ImageReader instance is obtained in and this is another "layer" of synchronization between consumer and producer threads. I might be (in fact likely am) wrong, but we discussed this offline and came to the conclusion that there was no harm in implementing these improvements regardless of whether they actually resolve 8174817 or not. No, there's no harm and I support the changes. It's just that they might not help in resolving the problem. I think prior to this patch a concurrent ImageReader.close() could happen if there was a race between 3 or more threads to resolve the same Path from ImageReaderFactory.get (especially since there might be a longish time window there since we might block to load a library etc), so I don't think we lose anything from plugging that hole by using computeIfAbsent here. I don't believe this was the the source of concurrent close(). If you look at the code (above), you can see that reader.close() (line 69) is called only if the reader was not successfully published and it is called by the thread that constructed it. But computeIfAbsent is a good choice here to avoid optimistic constructions followed by close(s) when there is concurrency. I think your observations about potential issues in JRTFS is correct, but there was nothing to suggest JRTFS code was involved in JDK-8174817 (as it's
Re: RFR: 8175010: ImageReader is not thread-safe
Hi Peter, are you suggesting that if I have class Foo { Bar b; }, then creating and putting Foo b in a CHM before returning it to a consumer which is then read from another thread is enough to force b to be safely published even when the other thread does *not* get the object via a call to the same CHM (which would go via the same volatile read and add the necessary happens before relationship)? I recalled having seen examples to the effect that a non-volatile b isn't safely published in this case. The (very shaky) hypothesis is thus that this could be what's happening in any of the places which load and locally cache the system ImageReader for anyone to use, e.g., SystemModuleFinder.SystemImage or JavaRuntimeURLConnection (which is implicitly called when there's a security manager installed). I might be (in fact likely am) wrong, but we discussed this offline and came to the conclusion that there was no harm in implementing these improvements regardless of whether they actually resolve 8174817 or not. I think prior to this patch a concurrent ImageReader.close() could happen if there was a race between 3 or more threads to resolve the same Path from ImageReaderFactory.get (especially since there might be a longish time window there since we might block to load a library etc), so I don't think we lose anything from plugging that hole by using computeIfAbsent here. I think your observations about potential issues in JRTFS is correct, but there was nothing to suggest JRTFS code was involved in JDK-8174817 (as it's not code that's used by the BuiltinClassLoader). Thanks! /Claes On 2017-02-15 21:52, Peter Levart wrote: Hi Claes, Reading the https://bugs.openjdk.java.net/browse/JDK-8174817 and then the change that was just pushed, I can't seem to figure out what was the problem with original code. I can't find evidence for claims in https://bugs.openjdk.java.net/browse/JDK-8175010 . Is the problem publication of ImageReader via ImageReaderFactory? That can't be it, since ImageReaderFactory is using ConcurrentHashMap which ensures happens before relationships. Is there any place else where ImageReader.open(Path) is called and then the instance unsafely published to other threads? The only place I could find is in jdk.internal.jrtfs.SystemImage.open(): static SystemImage open() throws IOException { if (modulesImageExists) { // open a .jimage and build directory structure final ImageReader image = ImageReader.open(moduleImageFile); image.getRootDirectory(); return new SystemImage() { @Override Node findNode(String path) throws IOException { return image.findNode(path); } @Override byte[] getResource(Node node) throws IOException { return image.getResource(node); } @Override void close() throws IOException { image.close(); } }; } ...here the final 'image' local variable is captured by anonymous inner SystemImage subclass into a synthetic final field, so this final field ensures that ImageReader is published safely as a delegate of SystemImage. ImageReader.open(Path) factory method delegates to ImageReader.SharedImageReader.open(Path, ByteOrder) which creates a new instance of ImageReader encapsulating a SharedImageReader object: public static ImageReader open(Path imagePath, ByteOrder byteOrder) throws IOException { Objects.requireNonNull(imagePath); Objects.requireNonNull(byteOrder); synchronized (OPEN_FILES) { SharedImageReader reader = OPEN_FILES.get(imagePath); if (reader == null) { // Will fail with an IOException if wrong byteOrder. reader = new SharedImageReader(imagePath, byteOrder); OPEN_FILES.put(imagePath, reader); } else if (reader.getByteOrder() != byteOrder) { throw new IOException("\"" + reader.getName() + "\" is not an image file"); } ImageReader image = new ImageReader(reader); // <<-- HERE reader.openers.add(image); return image; } } ...the ImageReader returned from this method is not safe to publish via data race, but as far as I can see, there is no such publishing going on. So am I right in that all this patch solves is it eliminates a possibility of NPE when ImageReader is close()-d concurrently with being used from some other thread. If such NPE was observed, it means that close() is being called concurrently with access and there are still races possible which can cause undesired effects. For example: calling ImageReader.close() while using it concurrently may close underlying SharedImageReader and then after closing, access it. So is
Re: RFR: 8175010: ImageReader is not thread-safe
Hi Claes, Reading the https://bugs.openjdk.java.net/browse/JDK-8174817 and then the change that was just pushed, I can't seem to figure out what was the problem with original code. I can't find evidence for claims in https://bugs.openjdk.java.net/browse/JDK-8175010 . Is the problem publication of ImageReader via ImageReaderFactory? That can't be it, since ImageReaderFactory is using ConcurrentHashMap which ensures happens before relationships. Is there any place else where ImageReader.open(Path) is called and then the instance unsafely published to other threads? The only place I could find is in jdk.internal.jrtfs.SystemImage.open(): static SystemImage open() throws IOException { if (modulesImageExists) { // open a .jimage and build directory structure final ImageReader image = ImageReader.open(moduleImageFile); image.getRootDirectory(); return new SystemImage() { @Override Node findNode(String path) throws IOException { return image.findNode(path); } @Override byte[] getResource(Node node) throws IOException { return image.getResource(node); } @Override void close() throws IOException { image.close(); } }; } ...here the final 'image' local variable is captured by anonymous inner SystemImage subclass into a synthetic final field, so this final field ensures that ImageReader is published safely as a delegate of SystemImage. ImageReader.open(Path) factory method delegates to ImageReader.SharedImageReader.open(Path, ByteOrder) which creates a new instance of ImageReader encapsulating a SharedImageReader object: public static ImageReader open(Path imagePath, ByteOrder byteOrder) throws IOException { Objects.requireNonNull(imagePath); Objects.requireNonNull(byteOrder); synchronized (OPEN_FILES) { SharedImageReader reader = OPEN_FILES.get(imagePath); if (reader == null) { // Will fail with an IOException if wrong byteOrder. reader = new SharedImageReader(imagePath, byteOrder); OPEN_FILES.put(imagePath, reader); } else if (reader.getByteOrder() != byteOrder) { throw new IOException("\"" + reader.getName() + "\" is not an image file"); } ImageReader image = new ImageReader(reader); // <<-- HERE reader.openers.add(image); return image; } } ...the ImageReader returned from this method is not safe to publish via data race, but as far as I can see, there is no such publishing going on. So am I right in that all this patch solves is it eliminates a possibility of NPE when ImageReader is close()-d concurrently with being used from some other thread. If such NPE was observed, it means that close() is being called concurrently with access and there are still races possible which can cause undesired effects. For example: calling ImageReader.close() while using it concurrently may close underlying SharedImageReader and then after closing, access it. So is there a concurrent ImageReader.close() possible? The only place I see ImageReader.close() being invoked is from SystemImage.close() in the anonymous inner class implementation which wraps ImageReader. SystemImage.close() is only being invoked from JrtFileSystem.cleanup(), which is called from JrtFileSystem.close() and JrtFileSystem.finalize(). The following is theoretically possible: FileSystem fs = FileSystems.newFileSystem(URI.create("jrt:/"), ...); Path p = fs.getPath(...); FileSystemProvider provider = fs.provider(); InputStream is = provider.newInputStream(p, ...); // 'fs' and 'p' (which has a reference to 'fs') may be found finalizable and be finalized while the call to obtain content of input stream is in progress For this to be prevented, the following method in JrtFileSystem: // returns the content of the file resource specified by the path byte[] getFileContent(JrtPath path) throws IOException { Node node = checkNode(path); if (node.isDirectory()) { throw new FileSystemException(path + " is a directory"); } //assert node.isResource() : "resource node expected here"; return image.getResource(node); } ...would have to be changed to: byte[] getFileContent(JrtPath path) throws IOException { Node node = checkNode(path); if (node.isDirectory()) { throw new FileSystemException(path + " is a directory"); } //assert node.isResource() : "resource node expected here"; byte[] content = image.getResource(node); Reference.reachabilityFence(this); return
Re: RFR: 8175010: ImageReader is not thread-safe
> On Feb 15, 2017, at 5:22 AM, Claes Redestadwrote: > > Hi, > > a few intermittent but rare test failures[1] that has appeared > since the latest jake integration, and since one of the changes > in there was to make initialization of the system ImageReader > lazy there appears to be cases where ImageReaders are not > safely published: > > - Ensure ImageReader::open is called only once per Path in > ImageReaderFactory by using CHM.computeIfAbsent > - Ensure ImageReader.reader is safely published to a > final field and signal close using a volatile boolean instead > > webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ Looks good. Mandy
Re: RFR: 8175010: ImageReader is not thread-safe
Jim, Chris, Alan, thanks for reviewing! On 02/15/2017 02:56 PM, Alan Bateman wrote: On 15/02/2017 13:22, Claes Redestad wrote: Hi, a few intermittent but rare test failures[1] that has appeared since the latest jake integration, and since one of the changes in there was to make initialization of the system ImageReader lazy there appears to be cases where ImageReaders are not safely published: - Ensure ImageReader::open is called only once per Path in ImageReaderFactory by using CHM.computeIfAbsent - Ensure ImageReader.reader is safely published to a final field and signal close using a volatile boolean instead webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ bug: https://bugs.openjdk.java.net/browse/JDK-8175010 ImageReaderFactory looks good. The changes to ImageReader are okay too, always a bit odd that this code throw NPE when the reader was closed. There is still an issue with async close of course in that someone could close at the same time as an access. However that is a high-level issue for jrtfs, at run-time then the image file is opened once and is never closed. Yes, this patch doesn't attempt to improve on how races are handled when closing, but should help to avoid potential publication issues when opening. It seems it could be worthwhile to have a special implementation class for the system module, since we could make it perfectly shareable and non-closeable, thus avoid delegation and closed-checking overheads, but that'd be a larger endeavor, perhaps suitable as an RFE for 10. /Claes -Alan
Re: RFR: 8175010: ImageReader is not thread-safe
On 15/02/2017 13:22, Claes Redestad wrote: Hi, a few intermittent but rare test failures[1] that has appeared since the latest jake integration, and since one of the changes in there was to make initialization of the system ImageReader lazy there appears to be cases where ImageReaders are not safely published: - Ensure ImageReader::open is called only once per Path in ImageReaderFactory by using CHM.computeIfAbsent - Ensure ImageReader.reader is safely published to a final field and signal close using a volatile boolean instead webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ bug: https://bugs.openjdk.java.net/browse/JDK-8175010 ImageReaderFactory looks good. The changes to ImageReader are okay too, always a bit odd that this code throw NPE when the reader was closed. There is still an issue with async close of course in that someone could close at the same time as an access. However that is a high-level issue for jrtfs, at run-time then the image file is opened once and is never closed. -Alan
Re: RFR: 8175010: ImageReader is not thread-safe
> On 15 Feb 2017, at 13:22, Claes Redestadwrote: > > Hi, > > a few intermittent but rare test failures[1] that has appeared > since the latest jake integration, and since one of the changes > in there was to make initialization of the system ImageReader > lazy there appears to be cases where ImageReaders are not > safely published: > > - Ensure ImageReader::open is called only once per Path in > ImageReaderFactory by using CHM.computeIfAbsent > - Ensure ImageReader.reader is safely published to a > final field and signal close using a volatile boolean instead > > webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ Looks good Claes. -Chris. > bug: https://bugs.openjdk.java.net/browse/JDK-8175010 > > Testing shows no issues (which admittedly doesn't mean we're > actually solving the root cause for JDK-8174817), and performance > numbers from adding a volatile read indicate that any overhead > is lost in the noise on ImageReader-heavy workloads. > > Thanks! > > /Claes > > [1] https://bugs.openjdk.java.net/browse/JDK-8174817
Re: RFR: 8175010: ImageReader is not thread-safe
+1 > On Feb 15, 2017, at 9:22 AM, Claes Redestadwrote: > > Hi, > > a few intermittent but rare test failures[1] that has appeared > since the latest jake integration, and since one of the changes > in there was to make initialization of the system ImageReader > lazy there appears to be cases where ImageReaders are not > safely published: > > - Ensure ImageReader::open is called only once per Path in > ImageReaderFactory by using CHM.computeIfAbsent > - Ensure ImageReader.reader is safely published to a > final field and signal close using a volatile boolean instead > > webrev: http://cr.openjdk.java.net/~redestad/8175010/webrev.02/ > bug: https://bugs.openjdk.java.net/browse/JDK-8175010 > > Testing shows no issues (which admittedly doesn't mean we're > actually solving the root cause for JDK-8174817), and performance > numbers from adding a volatile read indicate that any overhead > is lost in the noise on ImageReader-heavy workloads. > > Thanks! > > /Claes > > [1] https://bugs.openjdk.java.net/browse/JDK-8174817