Hi,

We know that URL.equals and URL.hashCode are fundamentally broken, and that URL.equals is even more broken than hashCode. Nevertheless, URL.equals is used explicitly in the following places in the JDK:

java.security.CodeSource.matchLocation
java.security.CodeSource.equals
java.util.jar.JarVerifier.VerifierCodeSource.equals
javax.sql.rowset.serial.SerialDatalink.equals
java.lang.Package.isSealed
javax.swing.JEditorPane.setPage
javax.swing.text.html.FrameView.changedUpdate
sun.applet.AppletViewer.getApplet
sun.applet.AppletViewer.getApplets

And I'm not counting places where it might be used implicitly because URLs are Objects (as keys in HashMaps, etc.).

I'd like to discuss one of the URL.equals pitfalls that might be fixable, and whether fixing it is desirable.

javadoc: "The equals method implements an equivalence relation on non-null object references:
...
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified."


URL url1 = new URL("http://alias1/");
URL url2 = new URL("http://alias2/");

boolean answer1 = url1.equals(url2);
...
boolean answer2 = url1.equals(url2);

Can it happen that answer1 != answer2?
Yes! Suppose that alias1 and alias2 are host names that resolve to the same IP address. Normally, answer1 and answer2 would both be "true", but only if the name service that resolves the host names is up and running. If it's not, the answer is "false". Suppose that the DNS was restarting while answer1 was obtained and was up and running while answer2 was obtained. Then answer1 would be "false" while answer2 would be "true". The following URLStreamHandler method, which is called for both URLs from the equals method, is responsible for this unstable behaviour:

    protected synchronized InetAddress getHostAddress(URL u) {
        if (u.hostAddress != null)
            return u.hostAddress;

        String host = u.getHost();
        if (host == null || host.equals("")) {
            return null;
        } else {
            try {
                u.hostAddress = InetAddress.getByName(host);
            } catch (UnknownHostException ex) {
                return null;
            } catch (SecurityException se) {
                return null;
            }
        }
        return u.hostAddress;
    }

As can be seen, the hostAddress is obtained by InetAddress.getByName() and then cached in the URL.hostAddress field. Leaving aside the fact that, although this method is synchronized, the caching of hostAddress is not synchronized properly (more on that later), the problem is that a negative answer (UnknownHostException or SecurityException) is not cached. UnknownHostException is not cached by InetAddress.getByName() by default, and SecurityException depends on the caller's security context. A simple fix for this issue would be to cache the negative answer in a URL field too. This would make URL.equals "consistent".
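
To make the idea concrete, here is a minimal sketch of caching the negative answer alongside the positive one. It is not a patch against URL or URLStreamHandler: the class CachedHostAddress and its Resolver interface are hypothetical names I made up for illustration, and the resolver is pluggable only so that the behaviour can be exercised without a real name service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical stand-in for the per-URL hostAddress state (not a JDK class). */
final class CachedHostAddress {

    /** Hypothetical lookup abstraction; in the JDK this role is played by InetAddress.getByName(). */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final Resolver resolver;

    // 'resolved' distinguishes "not looked up yet" from "looked up and failed".
    private boolean resolved;
    private InetAddress address; // stays null when the lookup failed

    CachedHostAddress(String host, Resolver resolver) {
        this.host = host;
        this.resolver = resolver;
    }

    /** Returns the cached address, or null if the first lookup failed. */
    synchronized InetAddress get() {
        if (!resolved) {
            if (host != null && !host.isEmpty()) {
                try {
                    address = resolver.resolve(host);
                } catch (UnknownHostException | SecurityException e) {
                    address = null; // remember the failure instead of retrying later
                }
            }
            resolved = true; // both outcomes are cached
        }
        return address;
    }
}
```

With this shape, two successive calls always return the same answer for a given instance, regardless of whether the name service came back in between.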

So what's wrong with the synchronization, besides being a bottleneck? The problem is that the getHostAddress() method uses the URLStreamHandler instance as a lock. The two URLs compared in the URL.equals method are passed to the URLStreamHandler.equals(URL u1, URL u2) method of the 1st URL's handler. This handler instance need not be the same as the 2nd URL's handler, even though both URLs have the same protocol. For example:

URL url1 = new URL("http://alias1/");
URL.setURLStreamHandlerFactory(...a custom factory...);
URL url2 = new URL("http://alias2/");

The "handler" instances of the above two URLs are different, since the handler of the 1st URL was created with the default URLStreamHandlerFactory and the handler of the 2nd URL was created with a custom URLStreamHandlerFactory. Now suppose one thread does:

url1.equals(url2);

and some other thread does:

url2.equals(url1);

This translates to, among other things, calling the following URLStreamHandler instance method:

    protected boolean hostsEqual(URL u1, URL u2) {
        InetAddress a1 = getHostAddress(u1);
        InetAddress a2 = getHostAddress(u2);
        // if we have internet address for both, compare them
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        // else, if both have host names, compare them
        } else if (u1.getHost() != null && u2.getHost() != null)
            return u1.getHost().equalsIgnoreCase(u2.getHost());
         else
            return u1.getHost() == null && u2.getHost() == null;
    }

So the two threads are reading and modifying the URL.hostAddress field of both URLs, but each of them is holding a separate lock. You may say that creating URL instances, then changing the URLStreamHandlerFactory, creating some more URL instances and then comparing them with each other does not happen a lot, but this could be fixed. Why not use the URL instance as a lock when reading/writing its field? Would this be desirable? It would mean a lot less contention (and even less if caching of URL.hostAddress was implemented in a lock-free way).
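
For illustration, a lock-free variant could look roughly like the sketch below. Again, UrlHostState and Resolver are hypothetical names, not JDK classes. The point is only that the cached value is guarded by the URL-like object's own field, so it does not matter which handler (modelled here by the resolver argument) performs the lookup. This variant deliberately keeps the current "unstable" behaviour: failures are not cached.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the part of URL that holds hostAddress (not a JDK class). */
final class UrlHostState {

    /** Hypothetical lookup abstraction standing in for whichever handler does the resolving. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    // The cache lives on the URL-like object itself, independent of any handler's monitor.
    private final AtomicReference<InetAddress> hostAddress = new AtomicReference<>();

    UrlHostState(String host) {
        this.host = host;
    }

    InetAddress getHostAddress(Resolver resolver) {
        InetAddress cached = hostAddress.get();
        if (cached != null) {
            return cached;
        }
        if (host == null || host.isEmpty()) {
            return null;
        }
        try {
            InetAddress resolved = resolver.resolve(host);
            // First writer wins; a thread that loses the race adopts the published value.
            return hostAddress.compareAndSet(null, resolved) ? resolved : hostAddress.get();
        } catch (UnknownHostException | SecurityException e) {
            return null; // failures are not cached in this variant
        }
    }
}
```

The simpler alternative is to keep the existing logic and just wrap the read and write of the field in synchronized (u) { ... }, using the URL itself as the monitor. The lock-free form may perform redundant lookups under a race, but every thread ends up seeing the same cached address.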

Because I know that URL.equals compatibility is important, I'm asking here whether a fix for this issue is desirable at all. What about a synchronization-only fix (keeping the "unstable" equals() behaviour)?

Regards, Peter
