I created an issue for this:

    https://bugs.openjdk.java.net/browse/JDK-8051713

The proposed patch is still the following:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/

Regards, Peter


On 07/11/2014 05:11 PM, Peter Levart wrote:
Hi,

java.net.URL is supposed to behave as an immutable object, so URL instances can be shared among threads and among parts of code without fear that they will be modified. The URL class has an unusual way of achieving this (or at least it tries to), partly because of a design in which:

- the URL constructor(s) that take a 'spec' String to be parsed into a URL object delegate the parsing to various URLStreamHandler(s), chosen in the URL constructor depending on the protocol of the URL string being parsed

An uninitialized URL instance (this) is passed from the constructor to the chosen URLStreamHandler, which has the responsibility to parse the string and set back the fields that hold the various parts of the URL object being constructed. Consequently, these fields cannot be declared final, as definite-assignment analysis doesn't cross method borders. It is therefore illegal to publish URL instances unsafely (via data races) to non-constructing threads, because they can appear not fully initialized.

Nevertheless URL, with the help of the various URLStreamHandler implementations, tries hard to appear stable at least where it is required to be stable. For example, URL.hashCode() is (almost) stable even if the URL instance is published unsafely. This is achieved by making hashCode() synchronized and caching the result. At least one way of constructing URLs - the constructors that take a 'spec' String to be parsed - also makes sure that the hashCode is computed from fully initialized fields, because parsing is delegated to a URLStreamHandler which uses the package-private URL.set() method to set back the parsed fields, and set() is also synchronized. But making URL appear stable even though it is published unsafely doesn't seem to be the primary concern of URL synchronization: the other public URL constructors, which take individual URL parts and don't delegate parsing to a URLStreamHandler but set the fields directly (not via set()), are not synchronized.
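
To illustrate the construction flow described above, here is a hypothetical, heavily simplified mirror of the pattern (class and method names are illustrative, not the actual JDK source):

// Hypothetical, heavily simplified mirror of the construction pattern
// described above: the constructor hands the not-yet-initialized 'this'
// to a handler, which parses the spec and calls back a synchronized set()
// to fill in the fields.
final class SimpleUrl {

    // Cannot be final: they are assigned only after the constructor has
    // already passed 'this' to the handler.
    private String protocol;
    private String host;
    private String file;

    SimpleUrl(SimpleHandler handler, String spec) {
        // 'this' escapes before initialization is complete, so unsafe
        // publication can expose a half-built instance to other threads.
        handler.parse(this, spec);
    }

    // Package-private callback, synchronized like java.net.URL.set().
    synchronized void set(String protocol, String host, String file) {
        this.protocol = protocol;
        this.host = host;
        this.file = file;
    }

    @Override
    public synchronized String toString() {
        return protocol + "://" + host + file;
    }
}

final class SimpleHandler {
    void parse(SimpleUrl u, String spec) {
        // Trivial "parsing" for illustration only.
        int schemeEnd = spec.indexOf("://");
        int pathStart = spec.indexOf('/', schemeEnd + 3);
        u.set(spec.substring(0, schemeEnd),
              pathStart < 0 ? spec.substring(schemeEnd + 3)
                            : spec.substring(schemeEnd + 3, pathStart),
              pathStart < 0 ? "/" : spec.substring(pathStart));
    }
}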

The primary concern of synchronization in URL appears to stem from the fact that some URL operations like hashCode(), equals(), sameFile() and openConnection() read multiple URL fields, while URL.set() - which can be called from a custom URLStreamHandler at any time (although that is not its purpose; it should only call back while parsing/constructing the URL) - can set those fields. Those multi-field operations would like to see a consistent "snapshot" of field values. But the synchronization performed to achieve that is questionable. It may be that in Java 1.0 times the JVM implementation assumptions were different and the synchronization was correct, but nowadays the Java memory model makes those assumptions invalid.

URL.hashCode() appears to be the only properly synchronized method, which makes it almost stable (it doesn't return different results over time), but even hashCode() has a subtle bug or two. The initializer for the hashCode field sets it to -1, which represents the "not yet computed" state. If a URL is published unsafely, the hashCode() method could see the "default" value of 0, which would be returned. A later call to hashCode() would see the value -1, which would trigger computation, and a different value would be returned. The other subtle bug is the relatively improbable event that the hashCode computation itself results in the value -1, which means "not yet computed". This can be seen as a performance glitch (the hashCode will never be cached for such a URL instance) or as an issue that makes hashCode unstable for one of the reasons why equals() is unstable too (see below).
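
A minimal, self-contained sketch of that caching pattern (computeHash() stands in for the delegation to URLStreamHandler.hashCode(URL); this is not the actual JDK code) shows both problems:

// Sketch of the hashCode caching pattern described above, with both
// problems marked; the class and field names are illustrative stand-ins.
final class CachedHashSketch {

    private int hashCode = -1;   // -1 means "not yet computed"
    private final String value;

    CachedHashSketch(String value) {
        this.value = value;
    }

    @Override
    public synchronized int hashCode() {
        if (hashCode != -1)
            return hashCode;     // problem 1: under unsafe publication this
                                 // test can see the default 0 and return it,
                                 // while a later call sees -1 and recomputes
        hashCode = computeHash();
        return hashCode;         // problem 2: if the computed value happens
                                 // to be -1, it is recomputed on every call
    }

    private int computeHash() {
        // Stand-in for handler.hashCode(this) in java.net.URL.
        return value.hashCode();
    }
}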

If URL.hashCode() is almost stable (it doesn't return different results over time) and at least one way of constructing URLs makes sure that the hashCode is calculated from fully initialized parts, then URL.equals() and the other methods that delegate to URLStreamHandler are a special story. URL.equals() can't be synchronized on the URL instance, because it would have to be synchronized on both URL instances being compared, and that is prone to deadlocks. Imagine:

thread1: url1.equals(url2)
thread2: url2.equals(url1)

So equals() chooses not to be synchronized and therefore risks being unstable if URL instances are published unsafely. But it nevertheless uses synchronization. equals() delegates its work to the 1st URL's URLStreamHandler, which synchronizes on itself when calculating and caching the InetAddress of each individual URL's host name. The InetAddress (if resolvable) is used in preference to the host name for comparison (and also in hashCode()). URL.equals() risks being unstable for the following reasons:

- URL instances published unsafely can appear not fully initialized to equals(), even though they were constructed with constructors that delegate parsing to URLStreamHandler(s) which use the synchronized URL.set() to set the fields, because URL.equals() itself is not synchronized.

- URL.hostAddress, which is calculated on demand and then cached on the URL instance, should help make equals() stable in the presence of dynamic changes to the host name -> IP address mapping, but caching is not performed for unsuccessful resolution. A temporary name service outage can therefore make URL.equals() unstable (see the caching sketch after this list).

- URL.hostAddress caching uses the URLStreamHandler instance of the 1st URL as the lock for synchronizing reads/writes of the hostAddress of both URLs being compared by equals(). But the URLStreamHandler(s) of the two URL instances need not be the same instance, even though they are for the same protocol. Imagine:

URL url1 = new URL(null, "http://www.google.com/", handler1);
URL url2 = new URL(null, "http://www.google.com/", handler2);
...
thread1: url1.equals(url2);
thread2: url2.equals(url1);

Each thread could be using a different URLStreamHandler instance for synchronization and could overwrite the other's cached hostAddress on the individual URLs. These hostAddress values could differ in the presence of dynamic changes to the host name -> IP address mapping and could therefore make URL.equals() unstable. This is admittedly a very unlikely scenario, but it is theoretically possible.
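
The caching that the last two points refer to is done in URLStreamHandler.getHostAddress(). The following sketch is close to, but not verbatim, the pre-patch code; an identity map stands in for the package-private URL.hostAddress field so that the sketch compiles on its own:

import java.net.InetAddress;
import java.net.URL;
import java.net.UnknownHostException;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of the hostAddress caching that equals()/hashCode() rely on.
class HostAddressCachingSketch {

    // Stand-in for the package-private URL.hostAddress field.
    private final Map<URL, InetAddress> hostAddress = new IdentityHashMap<>();

    // The lock is the handler ("this"), not the URL -- so two URLs compared
    // via different handler instances are not protected against each other.
    protected synchronized InetAddress getHostAddress(URL u) {
        InetAddress cached = hostAddress.get(u);
        if (cached != null)
            return cached;
        String host = u.getHost();
        if (host == null || host.isEmpty())
            return null;
        try {
            InetAddress addr = InetAddress.getByName(host);
            hostAddress.put(u, addr);   // successful lookups are cached ...
            return addr;
        } catch (UnknownHostException | SecurityException e) {
            return null;                // ... failed lookups are not
        }
    }
}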

URL.sameFile() has exactly the same issues as URL.equals(), as it only makes one field comparison less.

URL.openConnection() is a question in itself. It is delegated to the URLStreamHandler. Some URLStreamHandlers make it synchronized and others don't. Those that make it synchronized (on the URLStreamHandler instance) do so for no apparent reason. That synchronization can't help make the URL fields stable for the duration of the openConnection() call, since URL.set() uses a different lock (the URL instance itself). It only makes things worse, since access to opening a connection to the resource is serialized through the single handler instance, which presents a bottleneck.
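
For illustration, the pattern in question looks like this (a hypothetical handler, not the actual file: or mailto: handler source):

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical handler illustrating the pattern described above: declaring
// openConnection synchronized serializes every connection opened through
// this single, shared handler instance, while doing nothing for the
// visibility of the URL's fields (URL.set() locks the URL itself, not the
// handler).
class SerializingHandler extends URLStreamHandler {
    @Override
    protected synchronized URLConnection openConnection(URL u) throws IOException {
        // A real handler would construct and return a protocol-specific
        // URLConnection here; this sketch only shows the locking.
        throw new UnsupportedOperationException("sketch only");
    }
}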

I tried to fix all these issues and came up with the following patch which I'm proposing:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/

The new JDK 8 synchronization primitive StampedLock is a perfect tool for solving these issues, as synchronization in URL is only necessary to establish visibility of initialized state which doesn't change afterwards, or changes at most once (when computing the hashCode). StampedLock's tryOptimisticRead/validate is perfect for such situations, as it presents only a negligible overhead of two volatile reads. The presented patch also contains an unrelated change which replaces the use of Hashtable with ConcurrentHashMap for holding the mapping from protocol to individual URLStreamHandler, which makes for better scalability of the URL constructors. Combined with the InetAddress caching scalability enhancements presented in an earlier proposal, the net result is much better scalability of the URL constructor/equals/hashCode (as measured by a JMH test on a 4-core i7/Linux box):

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/URL.synchronization_bench_results.pdf
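
For illustration, the tryOptimisticRead/validate idiom that the patch builds on looks roughly like this (a generic sketch of the idiom with illustrative class and field names, not the patched URL code itself):

import java.util.concurrent.locks.StampedLock;

// Sketch of the StampedLock idiom: an optimistic read costs only two
// volatile reads when there is no concurrent write, and falls back to a
// full read lock otherwise.
final class PartsHolder {
    private final StampedLock lock = new StampedLock();
    private String host;
    private String file;

    void set(String host, String file) {
        long stamp = lock.writeLock();
        try {
            this.host = host;
            this.file = file;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    String externalForm() {
        long stamp = lock.tryOptimisticRead();      // volatile read #1
        String h = host, f = file;
        if (!lock.validate(stamp)) {                // volatile read #2
            stamp = lock.readLock();                // contended: fall back
            try {
                h = host;
                f = file;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return h + f;
    }
}

The Hashtable -> ConcurrentHashMap change is independent of this idiom; it simply removes the single Hashtable lock from the protocol -> handler lookup on the constructor path.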

So with this patch java.net.URL could be treated as an unsafe-publication-tolerant class (like String). Well, not entirely, since the getters for individual fields are still just unguarded plain reads. But those reads are usually performed under the guard of a StampedLock when the URL equals/hashCode/sameFile/openConnection/toExternalForm() operations are delegated to the URLStreamHandler, and therefore don't suffer from (in)visibility of published state. The fields could be made volatile to fix this if desired.

I'm not entirely sure about why openConnection() in file: and mailto: URLStreamHandlers is synchronized. I don't see a reason, so I removed the synchronization. If there is a reason, please bring it forward.

I ran the java/net jtreg tests with a recent unpatched jdk9-dev and with the patch applied (the combined URL and InetAddress caching changes), and got the same results. Only the following 3 tests fail on both occasions:

JT Harness : Tests that failed
java/net/MulticastSocket/Promiscuous.java: Test for interference when two sockets are bound to the same port but joined to different multicast groups
java/net/MulticastSocket/SetLoopbackMode.java: Test MulticastSocket.setLoopbackMode
java/net/MulticastSocket/Test.java: IPv4 and IPv6 multicasting broken on Linux

They seem not to like my multicast configuration. All other 258 tests pass.

So what do you think?


Regards, Peter

