The java.io.File class has a toURI() API which, as per its javadoc, returns a (system-dependent) URI. java.io.File also has a toPath() API, and the returned java.nio.file.Path exposes its own toUri() API. However, the javadoc of the File class doesn't specify how the URI returned by Path.toUri() relates to the one returned by File.toURI().

More specifically, for the same file instance, are File.toURI() and Path.toUri() expected to return URIs with the same semantics when it comes to encoding characters?

Consider the following trivial program, which illustrates what I mean:

import java.net.*;
import java.nio.file.*;
import java.io.*;

public class PathTest {

    public static void main(final String[] args) throws Exception {
        // Directory or file name to inspect, e.g. "foobãr" or "foo bar"
        final String location = args[0];

        // 1. Path obtained directly via Paths.get()
        final Path a = Paths.get(location).toAbsolutePath();
        System.out.println("URI from Paths.get().toUri() API " + a + " ---> " + a.toUri());

        // 2. Path obtained via java.io.File.toPath()
        final Path b = new File(location).toPath().toAbsolutePath();
        System.out.println("URI from File.toPath().toUri() API " + b + " ---> " + b.toUri());

        // 3. java.io.File's own toURI()
        final File c = new File(location).getAbsoluteFile();
        System.out.println("URI from File.toURI() API " + c + " ---> " + c.toURI());

    }
}

The above program prints the URI of a file path using 3 different APIs:

1. Paths.get().toUri()
2. File.toPath().toUri()
3. File.toURI()

When I run the program and pass it a directory whose name contains a non-ASCII character (which belongs to the "other" category as explained in the URI javadoc [1]), the URI returned by Path.toUri() differs from the URI returned by File.toURI() in how that "other" (non-ASCII) character is encoded. Here's the command I use and the output:

mkdir foobãr
java PathTest foobãr

Output:

URI from Paths.get().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toPath().toUri() API /private/tmp/delme/foobãr ---> file:///private/tmp/delme/fooba%CC%83r/
URI from File.toURI() API /private/tmp/delme/foobãr ---> file:/private/tmp/delme/foobãr/

Notice that the Path.toUri() version percent-encodes the non-ASCII character whereas File.toURI() leaves it as-is. Is this expected? The javadoc doesn't go into much detail about this.
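For reference, the %CC%83 in the Path-based URI is the UTF-8 encoding of U+0303 COMBINING TILDE, so that URI appears to spell the name as "a" followed by a combining tilde (the decomposed form) rather than as the single precomposed character U+00E3. The sketch below (DecomposedCheck is a made-up helper class, not part of the program above) decodes the URI with java.net.URI.getPath() and composes the result with java.text.Normalizer to show this:

```java
import java.net.URI;
import java.text.Normalizer;

public class DecomposedCheck {

    // Returns the path component of a URI with its percent-escapes decoded (as UTF-8).
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // Converts a string to Unicode Normalization Form C (composed characters).
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // URI string copied from the Path.toUri() output above
        String path = decodedPath("file:///private/tmp/delme/fooba%CC%83r/");

        // "a" followed by U+0303 COMBINING TILDE, i.e. the decomposed spelling
        System.out.println(path.equals("/private/tmp/delme/fooba\u0303r/"));     // true
        // Composing it yields the single precomposed character U+00E3
        System.out.println(nfc(path).equals("/private/tmp/delme/foob\u00e3r/")); // true
    }
}
```

So the two URIs differ not only in whether the character is escaped, but possibly also in which Unicode spelling of "ã" they carry.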

Interestingly, if the same program is passed a path that contains an "illegal" character (for example, the space character, as defined in [1]), then both Path.toUri() and File.toURI() return a URI in which that character is encoded. Here's the output when you run:

mkdir "foo bar"
java PathTest "foo bar"

Output:

URI from Paths.get().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toPath().toUri() API /private/tmp/delme/foo bar ---> file:///private/tmp/delme/foo%20bar
URI from File.toURI() API /private/tmp/delme/foo bar ---> file:/private/tmp/delme/foo%20bar
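As far as I can tell from the OpenJDK sources (an implementation detail, not something the File javadoc promises), File.toURI() builds its result with the four-argument java.net.URI(scheme, host, path, fragment) constructor. The URI javadoc [1] says the multi-argument constructors quote illegal characters such as the space but leave "other" characters unquoted, and that toASCIIString() encodes "other" characters as UTF-8 escapes. A minimal sketch reproducing both behaviours (FileStyleUri and fileUri are hypothetical names chosen for illustration):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FileStyleUri {

    // Builds a "file:" URI with the four-argument constructor
    // URI(scheme, host, path, fragment). Because host is null, no "//"
    // authority is emitted, giving "file:/..." rather than "file:///...".
    static URI fileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A space is an "illegal" character, so the constructor quotes it.
        URI illegal = fileUri("/tmp/delme/foo bar");
        System.out.println(illegal);               // file:/tmp/delme/foo%20bar

        // U+00E3 is in the "other" category, so the constructor leaves it unquoted...
        URI other = fileUri("/tmp/delme/foob\u00e3r/");
        System.out.println(other.toString().equals("file:/tmp/delme/foob\u00e3r/")); // true

        // ...whereas toASCIIString() encodes it as UTF-8 escapes.
        System.out.println(other.toASCIIString()); // file:/tmp/delme/foob%C3%A3r/
    }
}
```

This would also account for the "file:/" versus "file:///" difference between the File and Path outputs.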

So it's not clear which categories of characters are (consistently) encoded in the URIs returned by Path and File for the same target file.
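One way to at least check that both URIs identify the same file is to convert each one back with Paths.get(URI), which on the default file system accepts both the "file:///..." spelling from Path.toUri() and the "file:/..." spelling from File.toURI(). A minimal sketch (UriRoundTrip is a hypothetical class name) using the "foo bar" case, which needs neither an existing file nor non-ASCII support from the platform:

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UriRoundTrip {

    // Converts the URI from Path.toUri() and the URI from File.toURI() for the
    // same absolute path back into Path objects via Paths.get(URI).
    static Path[] roundTrip(Path absolute) {
        URI fromPath = absolute.toUri();          // on Unix-like systems: file:///.../foo%20bar
        URI fromFile = absolute.toFile().toURI(); // on Unix-like systems: file:/.../foo%20bar
        return new Path[] { Paths.get(fromPath), Paths.get(fromFile) };
    }

    public static void main(String[] args) {
        // The file does not need to exist; a space is an illegal URI character.
        Path p = Paths.get(args.length > 0 ? args[0] : "foo bar").toAbsolutePath();
        Path[] back = roundTrip(p);
        System.out.println(p.toUri() + " -> " + back[0]);
        System.out.println(p.toFile().toURI() + " -> " + back[1]);
        System.out.println("same path: " + back[0].equals(back[1]));
    }
}
```

This only shows that the two URIs resolve to equal Path objects; it doesn't answer whether their string forms and encodings are meant to match, which is the actual question.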

[1] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/URI.html


-Jaikiran
