I created an issue for this:
https://bugs.openjdk.java.net/browse/JDK-8051713
The proposed patch is still the following:
http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
Regards, Peter
On 07/11/2014 05:11 PM, Peter Levart wrote:
Hi,
java.net.URL is supposed to behave as an immutable object, so URL
instances can be shared among threads and among parts of code without
fear that they will be modified. URL class has an unusual way to
achieve this (or at least it tries to). Partly because of the design
which uses:
- URL constructor(s) that take a 'spec' String to be parsed into URL
object
- parsing is delegated to various URLStreamHandler(s) which are chosen
in the URL constructor (depending on the protocol used in URL string
to be parsed)
An unitialized URL instance (this) is passed from constructor to the
chosen URLStreamHandler which has the responsibility to parse the
string and set back the fields that hold various parts of URL object
being constructed. Consequently, these fields can not be declared as
final, as definite assignment analysis doesn't cross method borders.
It is therefore illegal to unsafely publish URL instances (via data
races) to non-constructing threads because they can appear not fully
initialized. Nevertheless URL, with the help of various
URLStreamHandler implementations, tries hard to make URL appear stable
at least where it is required to be stable. For example:
URL.hashCode() is (almost) stable even if URL instance is unsafely
published. This is achieved by making hashCode() synchronized and
cache the result. At least one way of constructing URLs - constructors
that take 'spec' String to be parsed - is also making sure that
hashCode is computed from fully initialized fields, as parsing is
delegated to URLStreamHandler which uses package-private URL.set()
method to set back the parsed fields and the set() method is also
synchronized. But making URL appear stable even though it is published
unsafely doesn't seem to be the primary concern of URL
synchronization. Other public URL constructors that take individual
URL parts and don't delegate parsing to URLStreamHandler but set
fields directly (not via set() method), are not synchronized.
Primary concern of synchronization in URL appears to be driven from
the fact that some URL operations like hasCode(), equals(), sameFile()
and openConnection() read multiple URL fields and URL.set() which can
be called from custom URLStreamHandler at any time (although this is
not it's purpose - it should only call-back while parsing/constructing
the URL) can set those fields. And those multi-field operations would
like to see a "snapshot" of field values that is consistent. But
synchronization that is performed to achieve that is questionable.
Might be that in Java 1.0 times the JVM implementation assumptions
were different and synchronization was correct, but nowadays Java
memory model makes them invalid.
URL.hasCode() apears to be the only method properly synchronized which
makes it almost stable (doesn't return different results over time)
but even hashCode() has a subtle bug or two. The initializer for
hashCode field sets it to value -1 which represents "not yet computed"
state. If URL is published unsafely, hashCode() method could see the
"default" value of 0, which would be returned. A later call to
hashCode() would see value -1 which would trigger computation and a
different value would be returned. The other subtle bug is a
relatively improbable event that hashCode computation results in value
-1 which means "not yet computed". This can be seen as performance
glitch (as hashCode will never be cached for such URL instance) or an
issue which makes hashCode unstable for one of the reasons why
equals() is unstable too (see below).
If URL.hasCode() method is almost stable (doesn't return different
results over time) and at least one way of URL construction makes sure
that hashCode is also calculated from fully initialized parts, then
URL.equals() and other methods that delegate to URLStreamHandler are a
special story. URL.equals() can't be synchronized on URL instance,
because it would have to be synchronized on both URL instances that
are being compared and this is prone do dead-locks. Imagine:
thread1: url1.equals(url2)
thread2: url2.equals(url1)
So equals() chooses to not be synchronized and therefore risks not
being stable if URL instances are published unsafely. But it
nevertheless uses synchronization. equals() delegates it's work to the
1st URL's URLStreamHandler which synchronizes on itself when
calculating and caching the InetAddress of each individual URL's host
name. InetAddress (if resolvable) is used in preference to host name
for comparison (and also in hashCode()). URL.equals() risks not being
stable for the following reasons:
- URL instances published unsafely can appear not fully initialized to
equals() even though they were constructed with constructors that
delegate parsing to URLStreamHandler(s) which use synchronized
URL.set() to set the fields, because URL.equals() is not synchronized.
- URL.hostAddress that is calculated on demand and then cached on the
URL instance should help make equals() stable in the presence of
dynamic changes to host name -> IP address mapping, but caching is not
performed for unsuccessful resolving. Temporary name service outage
can make URL.equals() unstable.
- URL.hostAddress caching is using the URLStreamHandler instance of
the 1st URL as a lock for synchronizing read/write of hostAddress of
both URLs being compared by equals(). But URLStreamHandler(s) of the
two URL instances need not be the same instance even though they are
for the same protocol. Imagine:
URL url1 = new URL(null, "http://www.google.com/", handler1);
URL url2 = new URL(null, "http://www.google.com/", handler2);
...
thread1: url1.equals(url2);
thread2: url2.equals(url1);
Each thread could be using different instance of URLStreamHandler for
synchronization and could overwrite each other the cached hostAddress
on individual URLs. These hostAddress values could be different in the
presence of dynamic changes to host name -> IP address mapping and
could therefore make URL.equals() unstable. This is admittedly a very
unlikely scenario, but is theoretically possible.
URL.sameHost() has exactly the same issues as URL.equals() as it only
makes one field comparison less.
URL.openConnection() is a question in itself. It is delegated to
URLStreamHandler. Some URLStreamHandlers make it synchronized and
others not. Those that make it synchronized (on the URLStreamHandler
instance) do this for no apparent reason. This synchronization can't
help make URL fields stable for the time of openConnection() call
since URL.set() is using a different lock (the URL instance itself).
It only makes things worse since access to opening the connection to
the resources is serialized and this presents a bottleneck.
I tried to fix all these issues and came up with the following patch
which I'm proposing:
http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/
New JDK8 synchronization primitive: StampedLock is a perfect tool for
solving these issues as synchronization in URL is only necessary to
establish visibility of initialized state which doesn't change
afterwards or changes at most once (when computing hashCode).
StampedLock's tryOptimisticRead/validate is perfect for such
situations as it only presents a negligible overhead of two volatile
reads. The presented patch also contains an unrelated change which
replaces usage of Hashtable with ConcurrentHashMap for holding the
mapping from protocol to individual URLStreamhandler which makes for
better scallability of URL constructors. Combined with synchronization
scallability enhancements of InetAddress caching presented in an
earlier proposal, the net result is much better scalability of URL
constructor/equals/hashCode (executed by a JMH test on a 4-core
i7/Linux box):
http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/URL.synchronization_bench_results.pdf
So with this patch java.net.URL could be treated as an
unsafe-publication-tolerable class (like String). Well, not entirely,
since getters for individual fields are still just unguarded normal
reads. But those reads are usually performed under guard of a
StampedLock when URL
equals/hashCode/sameFile/openConnection/toExternalForm() operations
are delegated to URLStreamHandler and therefore don't suffer from
(in)visibility of published state. Fields could be made volatile to
fix this if desired.
I'm not entirely sure about why openConnection() in file: and mailto:
URLStreamHandlers is synchronized. I don't see a reason, so I removed
the synchronization. If there is a reason, please bring it forward.
I ran the java/net jtreg tests with unpatched recent jdk9-dev and
patched (combined changes of URL and InetAddress caching) and get the
same score. Only the following 3 tests fail in both ocasions:
JT Harness : Tests that failed
java/net/MulticastSocket/Promiscuous.java: Test for interference when
two sockets are bound to the same port but joined to different
multicast groups
java/net/MulticastSocket/SetLoopbackMode.java: Test
MulticastSocket.setLoopbackMode
java/net/MulticastSocket/Test.java: IPv4 and IPv6 multicasting broken
on Linux
They seem to not like my multicast configuration or something. All
other 258 tests pass.
So what do you think?
Regards, Peter