Hi,

6) URLs and Filesystem Paths

URLs and filesystem paths are really two different beasts and converting between them is not trivial. The main source of problems is that different encoding rules apply for the strings that make up a URL or filesystem path.

For example, consider the following code snippet:

 File file = new File( "foo bar+foo" );
 URL url = file.toURI().toURL();
 System.out.println( file.toURL() );
 System.out.println( url );
 System.out.println( url.getPath() );
 System.out.println( URLDecoder.decode( url.getPath(), "UTF-8" ) );

which outputs something like

 file:/M:/scratch-pad/foo bar+foo
 file:/M:/scratch-pad/foo%20bar+foo
 /M:/scratch-pad/foo%20bar+foo
 /M:/scratch-pad/foo bar foo

First of all, please note that File.toURL() [1] does not escape the space character. This yields an invalid URL, as per RFC 2396 [0], section 2.4.3 "Excluded US-ASCII Characters". The class java.net.URL silently accepts such invalid URLs, whereas java.net.URI does not (see also URL.toURI() [2]). For this reason, File.toURL() has been deprecated and should be replaced with File.toURI().toURL().
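The difference in strictness between the two classes can be checked with a small sketch like the following. The class name LenientUrlDemo, the helper survivesToUri and the /tmp paths are made up for illustration only.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

class LenientUrlDemo {

    /** Returns true if the string, once parsed by java.net.URL, also converts to a java.net.URI. */
    @SuppressWarnings("deprecation") // new URL(String) is deprecated on recent JDKs but still lenient
    static boolean survivesToUri(String spec) {
        URL url;
        try {
            url = new URL(spec); // does not complain about an unescaped space
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
        try {
            url.toURI(); // strict parsing happens here
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(survivesToUri("file:/tmp/foo bar+foo"));   // unescaped space
        System.out.println(survivesToUri("file:/tmp/foo%20bar+foo")); // escaped space
    }
}
```

The first call should print false, because URL.toURI() throws a URISyntaxException for the unescaped space, and the second should print true.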

Next, URL.getPath() does not, in general, return a string that can be used as a filesystem path. It returns a substring of the URL and as such can contain escape sequences. The prominent example is the space character, which shows up as "%20". People sometimes hack around this by means of replace("%20", " "), but that simply does not cover all cases. It is worth mentioning that the related method URI.getPath() [3], on the other hand, does decode escapes, but the result is still not a filesystem path (compare the source for the constructor File(URI)).
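To make the difference between these accessors concrete, here is a minimal sketch. The class PathAccessorsDemo, its helper methods and the file name "50% off.txt" are hypothetical; the name was picked because it needs a second escape ("%25") that the "%20" hack misses.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;

class PathAccessorsDemo {

    /** URL.getPath(): a raw substring of the URL, escape sequences left in place. */
    static String rawUrlPath(URI uri) {
        try {
            return uri.toURL().getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** The common hack: only handles spaces, every other escape survives. */
    static String replaceHack(URI uri) {
        return rawUrlPath(uri).replace("%20", " ");
    }

    /** File(URI) decodes the escapes and applies the platform's path rules. */
    static String fileName(URI uri) {
        return new File(uri).getName();
    }

    public static void main(String[] args) {
        URI uri = URI.create("file:/tmp/50%25%20off.txt");
        System.out.println(rawUrlPath(uri));   // escapes untouched
        System.out.println(replaceHack(uri));  // "%25" is still there
        System.out.println(uri.getPath());     // escapes decoded, still URI syntax
        System.out.println(fileName(uri));     // name as the filesystem sees it
    }
}
```

Only the last two calls recover the percent sign of the file name. Of those, only File(URI) yields something that follows the platform's path conventions, which is why the full path is not printed here: it differs between operating systems.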

To decode a URL, people sometimes also choose java.net.URLDecoder [4]. The pitfall with this class is that it actually performs HTML form decoding, which is yet another encoding and not the same as URL encoding (compare the last paragraph of the class javadoc for java.net.URL). For instance, a URLDecoder will erroneously convert the character "+" into a space, as illustrated by the last sysout in the example above.
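The two decodings can be put side by side in a short sketch. The class DecoderComparison and its helpers formDecode and uriDecode are illustrative names, and the "file:" prefix is only added so that java.net.URI has an absolute URI to parse.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class DecoderComparison {

    /** HTML form decoding: "%20" becomes a space, but so does "+". */
    static String formDecode(String rawPath) {
        try {
            return URLDecoder.decode(rawPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** URI decoding: only percent escapes are resolved, "+" stays a plus. */
    static String uriDecode(String rawPath) {
        return URI.create("file:" + rawPath).getPath();
    }

    public static void main(String[] args) {
        String raw = "/tmp/foo%20bar+foo";
        System.out.println(formDecode(raw)); // /tmp/foo bar foo
        System.out.println(uriDecode(raw));  // /tmp/foo bar+foo
    }
}
```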

Code targeting JRE 1.4+ can easily avoid these problems by using

 new File( new URI( url.toString() ) )

when converting a URL to a filesystem path and

 file.toURI().toURL()

when converting back.
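Wrapped into two helpers, the recommended conversions might look like the sketch below. The class FileUrlRoundTrip and the method names toUrl and toFile are not part of any API; they only package the two expressions above and turn the checked exceptions into unchecked ones.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class FileUrlRoundTrip {

    /** File to URL, with proper escaping done by File.toURI(). */
    static URL toUrl(File file) {
        try {
            return file.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    /** URL to File; fails for URLs that are not valid URIs (e.g. unescaped spaces). */
    static File toFile(URL url) {
        try {
            return new File(new URI(url.toString()));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        File original = new File("foo bar+foo").getAbsoluteFile();
        URL url = toUrl(original);
        File restored = toFile(url);
        System.out.println(url);
        System.out.println(restored.equals(original));
    }
}
```

The URL printed by main depends on the working directory, so it is not shown here. The round trip is expected to give back a File equal to the original one, with both the space and the "+" intact.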

Regards,


Benjamin Bentmann


[0] http://www.faqs.org/rfcs/rfc2396.html
[1] http://java.sun.com/javase/6/docs/api/java/io/File.html#toURL()
[2] http://java.sun.com/javase/6/docs/api/java/net/URL.html#toURI()
[3] http://java.sun.com/javase/6/docs/api/java/net/URI.html#getPath()
[4] http://java.sun.com/javase/6/docs/api/java/net/URLDecoder.html
