On Unix, the file-URI codec encodes
\foo\bar --> %5Cfoo%5Cbar
This is to be interpreted as a file or directory named \foo\bar.
If you send this URI to a JVM on Windows and call
new File(new URI(uriStr))
the result is interpreted as a file or directory bar under directory foo, which is under the root.
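
To see what the JVM is handed, here is a minimal sketch using java.net.URI. The URI string file:/%5Cfoo%5Cbar and the class name are my own assumptions for illustration. The point is only that the raw path keeps the escape, while the decoded path (the one File(URI) works from) contains real backslashes, which Windows treats as separators.

```java
import java.net.URI;

class EscapedBackslashDemo {
    // Path exactly as written in the URI, with %xx escapes intact.
    static String rawPath(String uriStr) {
        return URI.create(uriStr).getRawPath();
    }

    // Path after %xx sequences have been decoded.
    static String decodedPath(String uriStr) {
        return URI.create(uriStr).getPath();
    }

    public static void main(String[] args) {
        String uriStr = "file:/%5Cfoo%5Cbar"; // assumed example URI
        System.out.println(rawPath(uriStr));     // escape still visible
        System.out.println(decodedPath(uriStr)); // escape already gone
    }
}
```

Once the path has been decoded, the information that the backslash was escaped is lost, so whatever consumes the decoded path cannot tell it apart from a separator.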

So it seems that on Unix %5C is not interpreted as having a special meaning,
but on Windows it is. The alternative on Windows would be to throw an
exception, because a file with the given path can't be created.
The thinking is that life is easier if %5C is interpreted as a path
separator on Windows.


The same question applies to . (dot = current dir),
.. (double dot = parent dir), and any other characters
that we might want to assign a special meaning to (e.g. ~ tilde):

"When do we interpret a special character as having its special meaning,
and how do we escape away that special meaning?"

Well, the answer is simple, and it matches what you think is right:
%xx notation ESCAPES the character and NEGATES any special
meaning it might have.

So I think it would be more correct for %5Cfoo%5Cbar to throw
an exception on Windows.

And your intuition is correct.

But note: if I have a path ../xtc, then the corresponding URI should be
../xtc, because in this case we want the dots to keep their special meaning.
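
A small sketch of that difference, using java.net.URI.normalize() as a stand-in (it is not the VFS codec, and the sample references are made up). As far as I know, normalize() works on the raw path, so literal dots are collapsed as a parent reference while %2e%2e is left alone as an ordinary segment.

```java
import java.net.URI;

class DotSegmentDemo {
    // Normalizes a relative URI reference and returns it as a string.
    static String normalize(String ref) {
        return URI.create(ref).normalize().toString();
    }

    public static void main(String[] args) {
        // Literal dots keep their special meaning (parent directory).
        System.out.println(normalize("dir/../xtc"));
        // Escaped dots are just a name and survive normalization.
        System.out.println(normalize("dir/%2e%2e/xtc"));
    }
}
```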

But what if the % character had a special meaning in paths (let's imagine it
points to the parent of the parent if one exists, or else to the root)?
Then the path %/xtc should become the URI %/xtc, BUT this is not possible
because % already has a special meaning in URIs as the escape character.


All the other excluded characters MUST be encoded because the URI spec requires it.
One reason is that a URI may be printed on paper, where newline characters
would be hard to read if they were not escaped.


So let's recap the excluded character list
ctrl-chars | space | "<" | ">" | "#" | "%" | <">

None of these has a special meaning in any filesystem,
so we are saved.

The rest of the encodings exist because of scheme-specific rules
and serve to escape the scheme-specific meaning
of the character.

Therefore the URI corresponding to the path @foo/%bar/+xtc should be @foo/%25bar/+xtc
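
As a rough sketch of that rule, the hand-written encoder below escapes only the excluded characters from the list above and leaves everything else (including . and /) untouched. The class and method names are hypothetical, and escaping non-ASCII bytes as well is my own assumption, not something stated above.

```java
import java.nio.charset.StandardCharsets;

class ExcludedCharEncoder {
    // Excluded characters from the list above, apart from control characters.
    private static final String EXCLUDED = " <>#%\"";

    // Escapes excluded characters as %XX; all other ASCII is copied as-is.
    static String encode(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            // Control characters, DEL and (by assumption) non-ASCII bytes
            // are escaped together with the excluded punctuation.
            if (c < 0x20 || c > 0x7E || EXCLUDED.indexOf(c) >= 0) {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("@foo/%bar/+xtc"));
        System.out.println(encode("../xtc"));
    }
}
```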

Do these thoughts clarify things? :-)

- rami

Hello!


Sounds like a long night today :-)


Hard work - it might take some time until I can commit the new naming stuff.
The whole procedure of parsing a URI needs to be refactored; currently I am fighting against the "layered" stuff, e.g. tar:tar:file:/dir/first.tar!/second.tar!/entry


And I have already "implemented" some incompatibilities between the old and the new VFS naming:

Current:
       file = getManager().resolveFile("%2e");
resolves to the current directory
New:
resolves to a file or directory NAMED "."

Current:
       file = getManager().resolveFile("dir%2fchild");
resolves to a file "child" in directory "dir"
New:
resolves to a file or directory named "dir/child"

Current:
       file = getManager().resolveFile("dir%5cchild");
resolves to a file "child" in directory "dir"
New:
resolves to a file or directory named "dir\child"

I leave it up to the filesystem whether such a file or directory can be created.

The above examples are taken from the unit tests, so the old behaviour was intended. But I think the new one is the right one.
I think it is very unlikely that these constructs are found in the wild, but if someone used VFS that way, it IS broken.
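
One way to get the "new" behaviour is to split on the literal separator first and decode each segment afterwards, checking for dot segments on the raw text. The sketch below is only an illustration of that ordering under my own assumptions (ASCII input, hypothetical class and method names); it is not the actual VFS parser.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SegmentParser {
    // Splits on literal '/' first, then decodes %xx inside each segment,
    // so an escaped separator ends up as part of a name.
    static List<String> parse(String rawPath) {
        List<String> names = new ArrayList<>();
        for (String segment : rawPath.split("/")) {
            names.add(decode(segment));
        }
        return names;
    }

    // Only an unescaped "." or ".." keeps its special meaning.
    static boolean isDotSegment(String rawSegment) {
        return rawSegment.equals(".") || rawSegment.equals("..");
    }

    // Decodes %xx escapes in a single segment (input assumed to be ASCII).
    static String decode(String segment) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("non-ASCII character in " + segment);
            }
            if (c == '%') {
                if (i + 2 >= segment.length()) {
                    throw new IllegalArgumentException("truncated escape in " + segment);
                }
                bytes.write(Integer.parseInt(segment.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parse("dir/child"));   // two names
        System.out.println(parse("dir%2fchild")); // one name containing '/'
        System.out.println(parse("dir%5cchild")); // one name containing '\'
        System.out.println(isDotSegment("%2e"));  // escaped dot is not special
    }
}
```

Whether the filesystem can then create a name such as dir/child is a separate question, as noted above.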


Any comments?

---
Mario


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]




