Re: RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues

2014-07-25 Thread Peter Levart

Hi Chris,

Thanks for looking into this...

On 07/25/2014 02:53 PM, Chris Hegarty wrote:

Hi Peter,

This is certainly a thorny issue, and I agree with the approach of 
using StampedLock.


Some small comments / questions:

1) Why abuse fieldsLock in hostAddress(), rather than grabbing the
   writeLock ?


URL.hostAddress() is called-back from URLStreamHandler when URL is 
delegating work for URL.equals, URL.hashCode and URL.sameFile methods to 
it. These methods hold readLock (URL.hashCode can also hold writeLock) 
while calling URLStreamHandler to do the work. StampedLock is not 
reentrant. Instead of abusing fieldsLock object as a monitor lock for 
caching hostAddress(), a CAS can be used to set hostAddress and still 
provide a stable value (in presence of dynamic name service mapping 
changes).




2) Does the setAccessible in readObject need to be in a doPriv?
   Also should this throw an InternalError/AssertionError if it fails?


You're absolutely right! It needs to be in doPriv.

Here's new webrev that wraps setAccessible in doPriv and uses CAS 
instead of mutex for hostAddress caching.


http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.02/




I'll pop you patch into our internal build and test system.


Great. It would also be interesting to see if there are any performance 
differences. Do you have any performance tests at Oracle that include 
URL comparison? Differences might most obviously show-up when this patch 
is combined with InetAddress caching enhancements:


http://cr.openjdk.java.net/~plevart/jdk9-dev/InetAddress.Cache/webrev.03/


Regards, Peter



-Chris.

On 23/07/14 09:29, Peter Levart wrote:

I created an issue for this:

 https://bugs.openjdk.java.net/browse/JDK-8051713

The proposed patch is still the following:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/ 



Regards, Peter


On 07/11/2014 05:11 PM, Peter Levart wrote:

Hi,

java.net.URL is supposed to behave as an immutable object, so URL
instances can be shared among threads and among parts of code without
fear that they will be modified. URL class has an unusual way to
achieve this (or at least it tries to). Partly because of the design
which uses:

- URL constructor(s) that take a 'spec' String to be parsed into URL
object
- parsing is delegated to various URLStreamHandler(s) which are chosen
in the URL constructor (depending on the protocol used in URL string
to be parsed)

An unitialized URL instance (this) is passed from constructor to the
chosen URLStreamHandler which has the responsibility to parse the
string and set back the fields that hold various parts of URL object
being constructed. Consequently, these fields can not be declared as
final, as definite assignment analysis doesn't cross method borders.
It is therefore illegal to unsafely publish URL instances (via data
races) to non-constructing threads because they can appear not fully
initialized. Nevertheless URL, with the help of various
URLStreamHandler implementations, tries hard to make URL appear stable
at least where it is required to be stable. For example:
URL.hashCode() is (almost) stable even if URL instance is unsafely
published. This is achieved by making hashCode() synchronized and
cache the result. At least one way of constructing URLs - constructors
that take 'spec' String to be parsed - is also making sure that
hashCode is computed from fully initialized fields, as parsing is
delegated to URLStreamHandler which uses package-private URL.set()
method to set back the parsed fields and the set() method is also
synchronized. But making URL appear stable even though it is published
unsafely doesn't seem to be the primary concern of URL
synchronization. Other public URL constructors that take individual
URL parts and don't delegate parsing to URLStreamHandler but set
fields directly (not via set() method), are not synchronized.

Primary concern of synchronization in URL appears to be driven from
the fact that some URL operations like hasCode(), equals(), sameFile()
and openConnection() read multiple URL fields and URL.set() which can
be called from custom URLStreamHandler at any time (although this is
not it's purpose - it should only call-back while parsing/constructing
the URL) can set those fields. And those multi-field operations would
like to see a "snapshot" of field values that is consistent. But
synchronization that is performed to achieve that is questionable.
Might be that in Java 1.0 times the JVM implementation assumptions
were different and synchronization was correct, but nowadays Java
memory model makes them invalid.

URL.hasCode() apears to be the only method properly synchronized which
makes it almost stable (doesn't return different results over time)
but even hashCode() has a subtle bug or two. The initializer for
hashCode field sets it to value -1 which represents "not yet computed"
state. If URL is published unsafely, hashCode() method could see the
"default" value of 0, wh

Re: RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues

2014-07-25 Thread Chris Hegarty

Hi Peter,

This is certainly a thorny issue, and I agree with the approach of using 
StampedLock.


Some small comments / questions:

1) Why abuse fieldsLock in hostAddress(), rather than grabbing the
   writeLock ?

2) Does the setAccessible in readObject need to be in a doPriv?
   Also should this throw an InternalError/AssertionError if it fails?

I'll pop you patch into our internal build and test system.

-Chris.

On 23/07/14 09:29, Peter Levart wrote:

I created an issue for this:

 https://bugs.openjdk.java.net/browse/JDK-8051713

The proposed patch is still the following:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/

Regards, Peter


On 07/11/2014 05:11 PM, Peter Levart wrote:

Hi,

java.net.URL is supposed to behave as an immutable object, so URL
instances can be shared among threads and among parts of code without
fear that they will be modified. URL class has an unusual way to
achieve this (or at least it tries to). Partly because of the design
which uses:

- URL constructor(s) that take a 'spec' String to be parsed into URL
object
- parsing is delegated to various URLStreamHandler(s) which are chosen
in the URL constructor (depending on the protocol used in URL string
to be parsed)

An unitialized URL instance (this) is passed from constructor to the
chosen URLStreamHandler which has the responsibility to parse the
string and set back the fields that hold various parts of URL object
being constructed. Consequently, these fields can not be declared as
final, as definite assignment analysis doesn't cross method borders.
It is therefore illegal to unsafely publish URL instances (via data
races) to non-constructing threads because they can appear not fully
initialized. Nevertheless URL, with the help of various
URLStreamHandler implementations, tries hard to make URL appear stable
at least where it is required to be stable. For example:
URL.hashCode() is (almost) stable even if URL instance is unsafely
published. This is achieved by making hashCode() synchronized and
cache the result. At least one way of constructing URLs - constructors
that take 'spec' String to be parsed - is also making sure that
hashCode is computed from fully initialized fields, as parsing is
delegated to URLStreamHandler which uses package-private URL.set()
method to set back the parsed fields and the set() method is also
synchronized. But making URL appear stable even though it is published
unsafely doesn't seem to be the primary concern of URL
synchronization. Other public URL constructors that take individual
URL parts and don't delegate parsing to URLStreamHandler but set
fields directly (not via set() method), are not synchronized.

Primary concern of synchronization in URL appears to be driven from
the fact that some URL operations like hasCode(), equals(), sameFile()
and openConnection() read multiple URL fields and URL.set() which can
be called from custom URLStreamHandler at any time (although this is
not it's purpose - it should only call-back while parsing/constructing
the URL) can set those fields. And those multi-field operations would
like to see a "snapshot" of field values that is consistent. But
synchronization that is performed to achieve that is questionable.
Might be that in Java 1.0 times the JVM implementation assumptions
were different and synchronization was correct, but nowadays Java
memory model makes them invalid.

URL.hasCode() apears to be the only method properly synchronized which
makes it almost stable (doesn't return different results over time)
but even hashCode() has a subtle bug or two. The initializer for
hashCode field sets it to value -1 which represents "not yet computed"
state. If URL is published unsafely, hashCode() method could see the
"default" value of 0, which would be returned. A later call to
hashCode() would see value -1 which would trigger computation and a
different value would be returned. The other subtle bug is a
relatively improbable event that hashCode computation results in value
-1 which means "not yet computed". This can be seen as performance
glitch (as hashCode will never be cached for such URL instance) or an
issue which makes hashCode unstable for one of the reasons why
equals() is unstable too (see below).

If URL.hasCode() method is almost stable (doesn't return different
results over time) and at least one way of URL construction makes sure
that hashCode is also calculated from fully initialized parts, then
URL.equals() and other methods that delegate to URLStreamHandler are a
special story. URL.equals() can't be synchronized on URL instance,
because it would have to be synchronized on both URL instances that
are being compared and this is prone do dead-locks. Imagine:

thread1: url1.equals(url2)
thread2: url2.equals(url1)

So equals() chooses to not be synchronized and therefore risks not
being stable if URL instances are published unsafely. But it
nevertheless uses synchronization. equals() delegates it's work to the
1st 

RFR: JDK-8051713 - URL constructor/equals/hashCode/sameFile/openConnection synchronization issues

2014-07-23 Thread Peter Levart

I created an issue for this:

https://bugs.openjdk.java.net/browse/JDK-8051713

The proposed patch is still the following:

http://cr.openjdk.java.net/~plevart/jdk9-dev/URL.synchronization/webrev.01/

Regards, Peter


On 07/11/2014 05:11 PM, Peter Levart wrote:

Hi,

java.net.URL is supposed to behave as an immutable object, so URL 
instances can be shared among threads and among parts of code without 
fear that they will be modified. URL class has an unusual way to 
achieve this (or at least it tries to). Partly because of the design 
which uses:


- URL constructor(s) that take a 'spec' String to be parsed into URL 
object
- parsing is delegated to various URLStreamHandler(s) which are chosen 
in the URL constructor (depending on the protocol used in URL string 
to be parsed)


An unitialized URL instance (this) is passed from constructor to the 
chosen URLStreamHandler which has the responsibility to parse the 
string and set back the fields that hold various parts of URL object 
being constructed. Consequently, these fields can not be declared as 
final, as definite assignment analysis doesn't cross method borders. 
It is therefore illegal to unsafely publish URL instances (via data 
races) to non-constructing threads because they can appear not fully 
initialized. Nevertheless URL, with the help of various 
URLStreamHandler implementations, tries hard to make URL appear stable 
at least where it is required to be stable. For example: 
URL.hashCode() is (almost) stable even if URL instance is unsafely 
published. This is achieved by making hashCode() synchronized and 
cache the result. At least one way of constructing URLs - constructors 
that take 'spec' String to be parsed - is also making sure that 
hashCode is computed from fully initialized fields, as parsing is 
delegated to URLStreamHandler which uses package-private URL.set() 
method to set back the parsed fields and the set() method is also 
synchronized. But making URL appear stable even though it is published 
unsafely doesn't seem to be the primary concern of URL 
synchronization. Other public URL constructors that take individual 
URL parts and don't delegate parsing to URLStreamHandler but set 
fields directly (not via set() method), are not synchronized.


Primary concern of synchronization in URL appears to be driven from 
the fact that some URL operations like hasCode(), equals(), sameFile() 
and openConnection() read multiple URL fields and URL.set() which can 
be called from custom URLStreamHandler at any time (although this is 
not it's purpose - it should only call-back while parsing/constructing 
the URL) can set those fields. And those multi-field operations would 
like to see a "snapshot" of field values that is consistent. But 
synchronization that is performed to achieve that is questionable. 
Might be that in Java 1.0 times the JVM implementation assumptions 
were different and synchronization was correct, but nowadays Java 
memory model makes them invalid.


URL.hasCode() apears to be the only method properly synchronized which 
makes it almost stable (doesn't return different results over time) 
but even hashCode() has a subtle bug or two. The initializer for 
hashCode field sets it to value -1 which represents "not yet computed" 
state. If URL is published unsafely, hashCode() method could see the 
"default" value of 0, which would be returned. A later call to 
hashCode() would see value -1 which would trigger computation and a 
different value would be returned. The other subtle bug is a 
relatively improbable event that hashCode computation results in value 
-1 which means "not yet computed". This can be seen as performance 
glitch (as hashCode will never be cached for such URL instance) or an 
issue which makes hashCode unstable for one of the reasons why 
equals() is unstable too (see below).


If URL.hasCode() method is almost stable (doesn't return different 
results over time) and at least one way of URL construction makes sure 
that hashCode is also calculated from fully initialized parts, then 
URL.equals() and other methods that delegate to URLStreamHandler are a 
special story. URL.equals() can't be synchronized on URL instance, 
because it would have to be synchronized on both URL instances that 
are being compared and this is prone do dead-locks. Imagine:


thread1: url1.equals(url2)
thread2: url2.equals(url1)

So equals() chooses to not be synchronized and therefore risks not 
being stable if URL instances are published unsafely. But it 
nevertheless uses synchronization. equals() delegates it's work to the 
1st URL's URLStreamHandler which synchronizes on itself when 
calculating and caching the InetAddress of each individual URL's host 
name. InetAddress (if resolvable) is used in preference to host name 
for comparison (and also in hashCode()). URL.equals() risks not being 
stable for the following reasons:


- URL instances published unsafely can appear not fully initialized to 
equals() even though 

URL constructor/equals/hashCode/sameFile/openConnection synchronization issues

2014-07-11 Thread Peter Levart

Hi,

java.net.URL is supposed to behave as an immutable object, so URL 
instances can be shared among threads and among parts of code without 
fear that they will be modified. URL class has an unusual way to achieve 
this (or at least it tries to). Partly because of the design which uses:


- URL constructor(s) that take a 'spec' String to be parsed into URL object
- parsing is delegated to various URLStreamHandler(s) which are chosen 
in the URL constructor (depending on the protocol used in URL string to 
be parsed)


An unitialized URL instance (this) is passed from constructor to the 
chosen URLStreamHandler which has the responsibility to parse the string 
and set back the fields that hold various parts of URL object being 
constructed. Consequently, these fields can not be declared as final, as 
definite assignment analysis doesn't cross method borders. It is 
therefore illegal to unsafely publish URL instances (via data races) to 
non-constructing threads because they can appear not fully initialized. 
Nevertheless URL, with the help of various URLStreamHandler 
implementations, tries hard to make URL appear stable at least where it 
is required to be stable. For example: URL.hashCode() is (almost) stable 
even if URL instance is unsafely published. This is achieved by making 
hashCode() synchronized and cache the result. At least one way of 
constructing URLs - constructors that take 'spec' String to be parsed - 
is also making sure that hashCode is computed from fully initialized 
fields, as parsing is delegated to URLStreamHandler which uses 
package-private URL.set() method to set back the parsed fields and the 
set() method is also synchronized. But making URL appear stable even 
though it is published unsafely doesn't seem to be the primary concern 
of URL synchronization. Other public URL constructors that take 
individual URL parts and don't delegate parsing to URLStreamHandler but 
set fields directly (not via set() method), are not synchronized.


Primary concern of synchronization in URL appears to be driven from the 
fact that some URL operations like hasCode(), equals(), sameFile() and 
openConnection() read multiple URL fields and URL.set() which can be 
called from custom URLStreamHandler at any time (although this is not 
it's purpose - it should only call-back while parsing/constructing the 
URL) can set those fields. And those multi-field operations would like 
to see a "snapshot" of field values that is consistent. But 
synchronization that is performed to achieve that is questionable. Might 
be that in Java 1.0 times the JVM implementation assumptions were 
different and synchronization was correct, but nowadays Java memory 
model makes them invalid.


URL.hasCode() apears to be the only method properly synchronized which 
makes it almost stable (doesn't return different results over time) but 
even hashCode() has a subtle bug or two. The initializer for hashCode 
field sets it to value -1 which represents "not yet computed" state. If 
URL is published unsafely, hashCode() method could see the "default" 
value of 0, which would be returned. A later call to hashCode() would 
see value -1 which would trigger computation and a different value would 
be returned. The other subtle bug is a relatively improbable event that 
hashCode computation results in value -1 which means "not yet computed". 
This can be seen as performance glitch (as hashCode will never be cached 
for such URL instance) or an issue which makes hashCode unstable for one 
of the reasons why equals() is unstable too (see below).


If URL.hasCode() method is almost stable (doesn't return different 
results over time) and at least one way of URL construction makes sure 
that hashCode is also calculated from fully initialized parts, then 
URL.equals() and other methods that delegate to URLStreamHandler are a 
special story. URL.equals() can't be synchronized on URL instance, 
because it would have to be synchronized on both URL instances that are 
being compared and this is prone do dead-locks. Imagine:


thread1: url1.equals(url2)
thread2: url2.equals(url1)

So equals() chooses to not be synchronized and therefore risks not being 
stable if URL instances are published unsafely. But it nevertheless uses 
synchronization. equals() delegates it's work to the 1st URL's 
URLStreamHandler which synchronizes on itself when calculating and 
caching the InetAddress of each individual URL's host name. InetAddress 
(if resolvable) is used in preference to host name for comparison (and 
also in hashCode()). URL.equals() risks not being stable for the 
following reasons:


- URL instances published unsafely can appear not fully initialized to 
equals() even though they were constructed with constructors that 
delegate parsing to URLStreamHandler(s) which use synchronized URL.set() 
to set the fields, because URL.equals() is not synchronized.


- URL.hostAddress that is calculated on demand and then cached on the 
URL instance shou