Gluing together URL.equals

Peter Levart Thu, 03 Jul 2014 09:03:14 -0700

Hi,

We know that URL.equals and hashCode are fundamentally broken. But URL.equals is even more broken than hashCode. Nevertheless, URL.equals is used explicitly in the following places in the JDK:


java.security.CodeSource.matchLocation
java.security.CodeSource.equals
java.util.jar.JarVerifier.VerifierCodeSource.equals
javax.sql.rowset.serial.SerialDatalink.equals
java.lang.Package.isSealed
javax.swing.JEditorPane.setPage
javax.swing.text.html.FrameView.changedUpdate
sun.applet.AppletViewer.getApplet
sun.applet.AppletViewer.getApplets

And I'm not counting places where it might be used because URLs are Objects (as keys in HashMaps, etc.).

I'd like to discuss one of the URL.equals pitfalls that might be fixable, and whether it is desirable to fix it.

javadoc (Object.equals): "The equals method implements an equivalence relation on non-null object references:

...

It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified."



URL url1 = new URL("http://alias1/");
URL url2 = new URL("http://alias2/");

boolean answer1 = url1.equals(url2);
...
boolean answer2 = url1.equals(url2);

Can it happen that answer1 != answer2 ?

Yes! Suppose that alias1 and alias2 are host names that resolve to the same IP address. Normally, answer1 and answer2 would both be "true", but only if the name service that resolves the host names is up and running. If it's not, the answer is "false". Suppose that while obtaining answer1 the DNS was restarting, and while obtaining answer2 it was up and running. Then answer1 would be "false" while answer2 would be "true". The following URLStreamHandler method, which is called for both URLs from the equals method, is responsible for this unstable behaviour:


    protected synchronized InetAddress getHostAddress(URL u) {
        if (u.hostAddress != null)
            return u.hostAddress;

        String host = u.getHost();
        if (host == null || host.equals("")) {
            return null;
        } else {
            try {
                u.hostAddress = InetAddress.getByName(host);
            } catch (UnknownHostException ex) {
                return null;
            } catch (SecurityException se) {
                return null;
            }
        }
        return u.hostAddress;
    }

As can be seen, the hostAddress is obtained by InetAddress.getByName() and then cached in the URL.hostAddress field. Leaving aside the fact that, although this method is synchronized, caching of hostAddress is not synchronized properly (more on that later), the problem is that a negative answer (UnknownHostException or SecurityException) is not cached. UnknownHostException is not cached by InetAddress.getByName() by default, and SecurityException depends on the caller's security context. A simple fix for this issue would be to cache the negative answer in a URL field too. This would make URL.equals "consistent".
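To make the idea concrete, here is a minimal sketch of negative caching. It is not JDK code: the CachedHost class, its Resolver hook and the "resolved" flag are hypothetical names I made up for illustration, and the resolver is injected only so that the sketch can be exercised without a real name service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical stand-in for the hostAddress caching done on URL; not JDK code.
final class CachedHost {
    /** Hypothetical resolver hook (InetAddress::getByName would be the real one). */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final String host;
    private final Resolver resolver;
    private InetAddress hostAddress; // stays null if resolution failed
    private boolean resolved;        // true once a positive OR negative answer is cached

    CachedHost(String host, Resolver resolver) {
        this.host = host;
        this.resolver = resolver;
    }

    synchronized InetAddress getHostAddress() {
        if (resolved) {
            return hostAddress; // cached answer, possibly a cached failure (null)
        }
        if (host != null && !host.isEmpty()) {
            try {
                hostAddress = resolver.resolve(host);
            } catch (UnknownHostException | SecurityException e) {
                hostAddress = null; // remember the failure instead of retrying later
            }
        }
        resolved = true;
        return hostAddress;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated name service: down on the first lookup, up afterwards.
        Resolver flaky = h -> {
            if (calls[0]++ == 0) {
                throw new UnknownHostException(h);
            }
            return InetAddress.getByAddress(h, new byte[] {10, 0, 0, 1});
        };
        CachedHost c = new CachedHost("alias1", flaky);
        System.out.println(c.getHostAddress()); // first lookup fails
        System.out.println(c.getHostAddress()); // same answer, resolver not asked again
    }
}
```

With this shape, the first answer sticks for the lifetime of the object, whether it was positive or negative, so repeated comparisons of the same two instances see the same inputs.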

So what's wrong with synchronization besides being a bottleneck? The problem is that the getHostAddress() method is using the URLStreamHandler instance as a lock. Two URLs that are compared in the URL.equals method are passed to the URLStreamHandler.equals(URL u1, URL u2) method of the 1st URL's handler. This handler instance need not be the same as the 2nd URL's handler even though both URLs have the same protocol. For example:


URL url1 = new URL("http://alias1/");
URL.setURLStreamHandlerFactory(...a custom factory...);
URL url2 = new URL("http://alias2/");

The "handler" instances of the above two URLs are different, since the handler of the 1st URL was created with the default URLStreamHandlerFactory and the handler of the 2nd URL was created with a custom URLStreamHandlerFactory. Now suppose one thread does:


url1.equals(url2);

and some other thread does:

url2.equals(url1);

This translates to, among other things, calling the following URLStreamHandler instance method:


    protected boolean hostsEqual(URL u1, URL u2) {
        InetAddress a1 = getHostAddress(u1);
        InetAddress a2 = getHostAddress(u2);
        // if we have internet address for both, compare them
        if (a1 != null && a2 != null) {
            return a1.equals(a2);
        // else, if both have host names, compare them
        } else if (u1.getHost() != null && u2.getHost() != null)
            return u1.getHost().equalsIgnoreCase(u2.getHost());
         else
            return u1.getHost() == null && u2.getHost() == null;
    }

So the two threads are reading and modifying the URL.hostAddress field of both URLs, but each of them is holding a separate lock. You may say that creating URL instances, then changing the URLStreamHandlerFactory, creating some more URL instances and then comparing them among themselves does not happen a lot, but this could be fixed. Why not use the URL instance as a lock when reading/writing its field? Would this be desirable? It would mean a lot less contention (and even less if caching of URL.hostAddress was implemented in a lock-free way).
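As a rough illustration of the lock-free variant, here is a sketch in which the cached address and its publication both live on the per-URL object, so it no longer matters which handler instance performs the comparison. Again, this is not JDK code: HostHolder is a hypothetical stand-in for URL, and it deliberately keeps the current behaviour of not caching failures.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for URL; only the hostAddress caching is modelled.
final class HostHolder {
    private final String host;
    // Per-instance cache: no handler-wide lock is involved.
    private final AtomicReference<InetAddress> hostAddress = new AtomicReference<>();

    HostHolder(String host) {
        this.host = host;
    }

    InetAddress getHostAddress() {
        InetAddress cached = hostAddress.get();
        if (cached != null) {
            return cached;
        }
        if (host == null || host.isEmpty()) {
            return null;
        }
        try {
            InetAddress resolved = InetAddress.getByName(host);
            // First writer wins; a racing thread returns the published value,
            // so all callers end up observing the same InetAddress instance.
            return hostAddress.compareAndSet(null, resolved)
                    ? resolved
                    : hostAddress.get();
        } catch (UnknownHostException | SecurityException e) {
            return null; // failures are not cached, as in the current code
        }
    }
}
```

Two threads may still resolve the same host concurrently, but the duplicate lookup is the only cost; the field itself is always read and written safely. Replacing the AtomicReference with synchronized (this) blocks around the read and the write would be the simpler "URL instance as a lock" variant.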

Because I know that URL.equals compatibility is important, I'm asking here if a fix for this issue is desirable at all. What about a synchronization fix only (keeping the "unstable" equals() behaviour)?


Regards, Peter
