Re: RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
Hi Chris, Thanks for looking into this... On 07/25/2014 02:53 PM, Chris Hegarty wrote: Hi Peter, This is certainly a thorny issue, and I agree with the approach of using StampedLock. Some small comments / questions: 1) Why abuse fieldsLock in hostAddress(), rather than grabbing the writeLock ? URL.hostAddress() is called-back from URLStreamHandler when URL is delegating work for URL.equals, URL.hashCode and URL.sameFile methods to it. These methods hold readLock (URL.hashCode can also hold writeLock) while calling URLStreamHandler to do the work. StampedLock is not reentrant. Instead of abusing fieldsLock object as a monitor lock for caching hostAddress(), a CAS can be used to set hostAddress and still provide a stable value (in presence of dynamic name service mapping changes). 2) Does the setAccessible in readObject need to be in a doPriv? Also should this throw an InternalError/AssertionError if it fails? You're absolutely right! It needs to be in doPriv. Here's new webrev that wraps setAccessible in doPriv and uses CAS instead of mutex for hostAddress caching. http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.02/ I'll pop you patch into our internal build and test system. Great. It would also be interesting to see if there are any performance differences. Do you have any performance tests at Oracle that include URL comparison? Differences might most obviously show-up when this patch is combined with InetAddress caching enhancements: http://cr.openjdk.java.net/~plevart/jdk9-dev/InetAddress.Cache/webrev.03/ Regards, Peter -Chris. On 23/07/14 09:29, Peter Levart wrote: I created an issue for this: https://bugs.openjdk.java.net/browse/JDK-8051713 The proposed patch is still the following: http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/ Regards, Peter On 07/11/2014 05:11 PM, Peter Levart wrote: Hi, java.net.URL is supposed to behave as an immutable object, so URL instances can be shared among threads and among parts of code without fear that they will be modified. URL class has an unusual way to achieve this (or at least it tries to). Partly because of the design which uses: - URL constructor(s) that take a 'spec' String to be parsed into URL object - parsing is delegated to various URLStreamHandler(s) which are chosen in the URL constructor (depending on the protocol used in URL string to be parsed) An unitialized URL instance (this) is passed from constructor to the chosen URLStreamHandler which has the responsibility to parse the string and set back the fields that hold various parts of URL object being constructed. Consequently, these fields can not be declared as final, as definite assignment analysis doesn't cross method borders. It is therefore illegal to unsafely publish URL instances (via data races) to non-constructing threads because they can appear not fully initialized. Nevertheless URL, with the help of various URLStreamHandler implementations, tries hard to make URL appear stable at least where it is required to be stable. For example: URL.hashCode() is (almost) stable even if URL instance is unsafely published. This is achieved by making hashCode() synchronized and cache the result. At least one way of constructing URLs - constructors that take 'spec' String to be parsed - is also making sure that hashCode is computed from fully initialized fields, as parsing is delegated to URLStreamHandler which uses package-private URL.set() method to set back the parsed fields and the set() method is also synchronized. But making URL appear stable even though it is published unsafely doesn't seem to be the primary concern of URL synchronization. Other public URL constructors that take individual URL parts and don't delegate parsing to URLStreamHandler but set fields directly (not via set() method), are not synchronized. Primary concern of synchronization in URL appears to be driven from the fact that some URL operations like hasCode(), equals(), sameFile() and openConnection() read multiple URL fields and URL.set() which can be called from custom URLStreamHandler at any time (although this is not it's purpose - it should only call-back while parsing/constructing the URL) can set those fields. And those multi-field operations would like to see a "snapshot" of field values that is consistent. But synchronization that is performed to achieve that is questionable. Might be that in Java 1.0 times the JVM implementation assumptions were different and synchronization was correct, but nowadays Java memory model makes them invalid. URL.hasCode() apears to be the only method properly synchronized which makes it almost stable (doesn't return different results over time) but even hashCode() has a subtle bug or two. The initializer for hashCode field sets it to value -1 which represents "not yet computed" state. If URL is published unsafely, hashCode() method could see the "default" value of 0, wh
Re: RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
Hi Peter, This is certainly a thorny issue, and I agree with the approach of using StampedLock. Some small comments / questions: 1) Why abuse fieldsLock in hostAddress(), rather than grabbing the writeLock ? 2) Does the setAccessible in readObject need to be in a doPriv? Also should this throw an InternalError/AssertionError if it fails? I'll pop you patch into our internal build and test system. -Chris. On 23/07/14 09:29, Peter Levart wrote: I created an issue for this: https://bugs.openjdk.java.net/browse/JDK-8051713 The proposed patch is still the following: http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/ Regards, Peter On 07/11/2014 05:11 PM, Peter Levart wrote: Hi, java.net.URL is supposed to behave as an immutable object, so URL instances can be shared among threads and among parts of code without fear that they will be modified. URL class has an unusual way to achieve this (or at least it tries to). Partly because of the design which uses: - URL constructor(s) that take a 'spec' String to be parsed into URL object - parsing is delegated to various URLStreamHandler(s) which are chosen in the URL constructor (depending on the protocol used in URL string to be parsed) An unitialized URL instance (this) is passed from constructor to the chosen URLStreamHandler which has the responsibility to parse the string and set back the fields that hold various parts of URL object being constructed. Consequently, these fields can not be declared as final, as definite assignment analysis doesn't cross method borders. It is therefore illegal to unsafely publish URL instances (via data races) to non-constructing threads because they can appear not fully initialized. Nevertheless URL, with the help of various URLStreamHandler implementations, tries hard to make URL appear stable at least where it is required to be stable. For example: URL.hashCode() is (almost) stable even if URL instance is unsafely published. This is achieved by making hashCode() synchronized and cache the result. At least one way of constructing URLs - constructors that take 'spec' String to be parsed - is also making sure that hashCode is computed from fully initialized fields, as parsing is delegated to URLStreamHandler which uses package-private URL.set() method to set back the parsed fields and the set() method is also synchronized. But making URL appear stable even though it is published unsafely doesn't seem to be the primary concern of URL synchronization. Other public URL constructors that take individual URL parts and don't delegate parsing to URLStreamHandler but set fields directly (not via set() method), are not synchronized. Primary concern of synchronization in URL appears to be driven from the fact that some URL operations like hasCode(), equals(), sameFile() and openConnection() read multiple URL fields and URL.set() which can be called from custom URLStreamHandler at any time (although this is not it's purpose - it should only call-back while parsing/constructing the URL) can set those fields. And those multi-field operations would like to see a "snapshot" of field values that is consistent. But synchronization that is performed to achieve that is questionable. Might be that in Java 1.0 times the JVM implementation assumptions were different and synchronization was correct, but nowadays Java memory model makes them invalid. URL.hasCode() apears to be the only method properly synchronized which makes it almost stable (doesn't return different results over time) but even hashCode() has a subtle bug or two. The initializer for hashCode field sets it to value -1 which represents "not yet computed" state. If URL is published unsafely, hashCode() method could see the "default" value of 0, which would be returned. A later call to hashCode() would see value -1 which would trigger computation and a different value would be returned. The other subtle bug is a relatively improbable event that hashCode computation results in value -1 which means "not yet computed". This can be seen as performance glitch (as hashCode will never be cached for such URL instance) or an issue which makes hashCode unstable for one of the reasons why equals() is unstable too (see below). If URL.hasCode() method is almost stable (doesn't return different results over time) and at least one way of URL construction makes sure that hashCode is also calculated from fully initialized parts, then URL.equals() and other methods that delegate to URLStreamHandler are a special story. URL.equals() can't be synchronized on URL instance, because it would have to be synchronized on both URL instances that are being compared and this is prone do dead-locks. Imagine: thread1: url1.equals(url2) thread2: url2.equals(url1) So equals() chooses to not be synchronized and therefore risks not being stable if URL instances are published unsafely. But it nevertheless uses synchronization. equals() delegates it's work to the 1st
RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
I created an issue for this: https://bugs.openjdk.java.net/browse/JDK-8051713 The proposed patch is still the following: http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/ Regards, Peter On 07/11/2014 05:11 PM, Peter Levart wrote: Hi, java.net.URL is supposed to behave as an immutable object, so URL instances can be shared among threads and among parts of code without fear that they will be modified. URL class has an unusual way to achieve this (or at least it tries to). Partly because of the design which uses: - URL constructor(s) that take a 'spec' String to be parsed into URL object - parsing is delegated to various URLStreamHandler(s) which are chosen in the URL constructor (depending on the protocol used in URL string to be parsed) An unitialized URL instance (this) is passed from constructor to the chosen URLStreamHandler which has the responsibility to parse the string and set back the fields that hold various parts of URL object being constructed. Consequently, these fields can not be declared as final, as definite assignment analysis doesn't cross method borders. It is therefore illegal to unsafely publish URL instances (via data races) to non-constructing threads because they can appear not fully initialized. Nevertheless URL, with the help of various URLStreamHandler implementations, tries hard to make URL appear stable at least where it is required to be stable. For example: URL.hashCode() is (almost) stable even if URL instance is unsafely published. This is achieved by making hashCode() synchronized and cache the result. At least one way of constructing URLs - constructors that take 'spec' String to be parsed - is also making sure that hashCode is computed from fully initialized fields, as parsing is delegated to URLStreamHandler which uses package-private URL.set() method to set back the parsed fields and the set() method is also synchronized. But making URL appear stable even though it is published unsafely doesn't seem to be the primary concern of URL synchronization. Other public URL constructors that take individual URL parts and don't delegate parsing to URLStreamHandler but set fields directly (not via set() method), are not synchronized. Primary concern of synchronization in URL appears to be driven from the fact that some URL operations like hasCode(), equals(), sameFile() and openConnection() read multiple URL fields and URL.set() which can be called from custom URLStreamHandler at any time (although this is not it's purpose - it should only call-back while parsing/constructing the URL) can set those fields. And those multi-field operations would like to see a "snapshot" of field values that is consistent. But synchronization that is performed to achieve that is questionable. Might be that in Java 1.0 times the JVM implementation assumptions were different and synchronization was correct, but nowadays Java memory model makes them invalid. URL.hasCode() apears to be the only method properly synchronized which makes it almost stable (doesn't return different results over time) but even hashCode() has a subtle bug or two. The initializer for hashCode field sets it to value -1 which represents "not yet computed" state. If URL is published unsafely, hashCode() method could see the "default" value of 0, which would be returned. A later call to hashCode() would see value -1 which would trigger computation and a different value would be returned. The other subtle bug is a relatively improbable event that hashCode computation results in value -1 which means "not yet computed". This can be seen as performance glitch (as hashCode will never be cached for such URL instance) or an issue which makes hashCode unstable for one of the reasons why equals() is unstable too (see below). If URL.hasCode() method is almost stable (doesn't return different results over time) and at least one way of URL construction makes sure that hashCode is also calculated from fully initialized parts, then URL.equals() and other methods that delegate to URLStreamHandler are a special story. URL.equals() can't be synchronized on URL instance, because it would have to be synchronized on both URL instances that are being compared and this is prone do dead-locks. Imagine: thread1: url1.equals(url2) thread2: url2.equals(url1) So equals() chooses to not be synchronized and therefore risks not being stable if URL instances are published unsafely. But it nevertheless uses synchronization. equals() delegates it's work to the 1st URL's URLStreamHandler which synchronizes on itself when calculating and caching the InetAddress of each individual URL's host name. InetAddress (if resolvable) is used in preference to host name for comparison (and also in hashCode()). URL.equals() risks not being stable for the following reasons: - URL instances published unsafely can appear not fully initialized to equals() even though
URL constructor/equals/hashCode/sameFile/openConnection synchronization issues
Hi, java.net.URL is supposed to behave as an immutable object, so URL instances can be shared among threads and among parts of code without fear that they will be modified. URL class has an unusual way to achieve this (or at least it tries to). Partly because of the design which uses: - URL constructor(s) that take a 'spec' String to be parsed into URL object - parsing is delegated to various URLStreamHandler(s) which are chosen in the URL constructor (depending on the protocol used in URL string to be parsed) An unitialized URL instance (this) is passed from constructor to the chosen URLStreamHandler which has the responsibility to parse the string and set back the fields that hold various parts of URL object being constructed. Consequently, these fields can not be declared as final, as definite assignment analysis doesn't cross method borders. It is therefore illegal to unsafely publish URL instances (via data races) to non-constructing threads because they can appear not fully initialized. Nevertheless URL, with the help of various URLStreamHandler implementations, tries hard to make URL appear stable at least where it is required to be stable. For example: URL.hashCode() is (almost) stable even if URL instance is unsafely published. This is achieved by making hashCode() synchronized and cache the result. At least one way of constructing URLs - constructors that take 'spec' String to be parsed - is also making sure that hashCode is computed from fully initialized fields, as parsing is delegated to URLStreamHandler which uses package-private URL.set() method to set back the parsed fields and the set() method is also synchronized. But making URL appear stable even though it is published unsafely doesn't seem to be the primary concern of URL synchronization. Other public URL constructors that take individual URL parts and don't delegate parsing to URLStreamHandler but set fields directly (not via set() method), are not synchronized. Primary concern of synchronization in URL appears to be driven from the fact that some URL operations like hasCode(), equals(), sameFile() and openConnection() read multiple URL fields and URL.set() which can be called from custom URLStreamHandler at any time (although this is not it's purpose - it should only call-back while parsing/constructing the URL) can set those fields. And those multi-field operations would like to see a "snapshot" of field values that is consistent. But synchronization that is performed to achieve that is questionable. Might be that in Java 1.0 times the JVM implementation assumptions were different and synchronization was correct, but nowadays Java memory model makes them invalid. URL.hasCode() apears to be the only method properly synchronized which makes it almost stable (doesn't return different results over time) but even hashCode() has a subtle bug or two. The initializer for hashCode field sets it to value -1 which represents "not yet computed" state. If URL is published unsafely, hashCode() method could see the "default" value of 0, which would be returned. A later call to hashCode() would see value -1 which would trigger computation and a different value would be returned. The other subtle bug is a relatively improbable event that hashCode computation results in value -1 which means "not yet computed". This can be seen as performance glitch (as hashCode will never be cached for such URL instance) or an issue which makes hashCode unstable for one of the reasons why equals() is unstable too (see below). If URL.hasCode() method is almost stable (doesn't return different results over time) and at least one way of URL construction makes sure that hashCode is also calculated from fully initialized parts, then URL.equals() and other methods that delegate to URLStreamHandler are a special story. URL.equals() can't be synchronized on URL instance, because it would have to be synchronized on both URL instances that are being compared and this is prone do dead-locks. Imagine: thread1: url1.equals(url2) thread2: url2.equals(url1) So equals() chooses to not be synchronized and therefore risks not being stable if URL instances are published unsafely. But it nevertheless uses synchronization. equals() delegates it's work to the 1st URL's URLStreamHandler which synchronizes on itself when calculating and caching the InetAddress of each individual URL's host name. InetAddress (if resolvable) is used in preference to host name for comparison (and also in hashCode()). URL.equals() risks not being stable for the following reasons: - URL instances published unsafely can appear not fully initialized to equals() even though they were constructed with constructors that delegate parsing to URLStreamHandler(s) which use synchronized URL.set() to set the fields, because URL.equals() is not synchronized. - URL.hostAddress that is calculated on demand and then cached on the URL instance shou