The file URI codec on Unix encodes
    \foo\bar --> %5Cfoo%5Cbar
This is to be interpreted as a file or dir named \foo\bar.
If you send this URI to a JVM on Windows and call
    new File(new URI(uriStr))
the result is interpreted as a file or dir bar under dir foo, which is under the root.
So it seems that on Unix %5C is not interpreted as having a special meaning,
but on Windows it is. The alternative on Windows would be to throw an
exception, because a file with the given path can't be created.
So the thinking is that it makes life easier if %5C is interpreted as a path
separator on Windows.
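
To make the Unix/Windows difference concrete, here is a minimal sketch using only java.net.URI and java.io.File. The class and method names are made up for illustration. It shows that the URI layer keeps the escape in the raw path and only produces a literal backslash after decoding; what File then does with that backslash depends on the platform, so the last line of main prints whatever the local JVM decides.

```java
import java.io.File;
import java.net.URI;

// Hypothetical demo class, not part of VFS.
final class EscapedBackslashDemo {
    // The path exactly as written in the URI, escapes intact.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // The path after %xx decoding; %5C becomes a literal backslash.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "file:/%5Cfoo%5Cbar";
        System.out.println(rawPath(uri));
        System.out.println(decodedPath(uri));
        // Platform dependent: the File constructor applies the local
        // separator rules to the decoded path.
        System.out.println(new File(URI.create(uri)));
    }
}
```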
The same question applies to . (dot = current dir), .. (double dot = parent dir) and any other characters that we might want to assign a special meaning to (e.g. ~ tilde):
"When do we interpret a special character to have its special meaning, and how do we escape that special meaning away?"
Well, the answer is simple and matches what you think is right: %xx notation ESCAPES the character and NEGATES any special meaning it might have.
So therefore I think it would be more correct if %5Cfoo%5Cbar on Windows threw an exception.
And your intuition is correct.
But note: if I have a path ../xtc then the corresponding URI should be ../xtc, because in this case we want the dots to keep their special meaning.
But what if the % character had a special meaning in a filesystem (let's imagine
it points to the parent of the parent if one exists, or else to the root)?
Then the path %/xtc should become the URI %/xtc, BUT this is not possible because
% has a special meaning in a URI as the escape character.
All the other excluded characters MUST be encoded because of the URI spec.
One reason is, e.g., that a URI could be printed on paper, and newline characters
would be hard to read if they were not escaped.
So let's recap the excluded character list:
    ctrl-chars | space | "<" | ">" | "#" | "%" | <">
None of these has any special meaning in any filesystem. Thus we are saved.
The rest of the encodings exist because of scheme-specific rules and serve the purpose of escaping the scheme-specific meaning of a character.
Therefore the URI corresponding to the path @foo/%bar/+xtc should be @foo/%25bar/+xtc
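
As a rough sketch of that rule (not the VFS codec itself), the encoder below escapes only the excluded characters recapped above and leaves everything else, including @ and +, untouched. The class name is hypothetical, and escaping every non-ASCII byte as well is my own assumption to keep the output plain ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of VFS.
final class ExcludedCharEncoder {
    // The printable excluded characters: space < > # % "
    private static final String EXCLUDED = " <>#%\"";

    static String encode(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean control = c < 0x20 || c == 0x7F;
            boolean nonAscii = c >= 0x80; // assumption: escape these too
            if (control || nonAscii || EXCLUDED.indexOf(c) >= 0) {
                out.append('%').append(String.format("%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("@foo/%bar/+xtc"));
    }
}
```

Note that ., .. and / pass through unchanged, so a path like ../xtc keeps its special meaning in the URI.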
Do these thoughts clarify things? :-)
- rami
Hello!
Sounds like a long night today :-) Hard work - it might take some time until I can commit the new naming stuff.
The whole procedure of parsing a URI needs to be refactored; currently I am fighting against the "layered" stuff, e.g. tar:tar:file:/dir/first.tar!/second.tar!/entry
And I have already "implemented" some incompatibilities between the old and the new VFS naming:
file = getManager().resolveFile("%2e");
    Current: resolves to the current directory
    New: resolves to a file or directory NAMED "."
file = getManager().resolveFile("dir%2fchild");
    Current: resolves to a file "child" in directory "dir"
    New: resolves to a file or directory named "dir/child"
file = getManager().resolveFile("dir%5cchild");
    Current: resolves to a file "child" in directory "dir"
    New: resolves to a file or directory named "dir\child"
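
The "New" behaviour in the three cases above can be sketched as "split on literal separators first, decode afterwards". This is only an illustration under my own assumptions, not the actual VFS name parser; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of VFS.
final class NameSegments {
    // Split on literal '/' first, then decode each segment, so an escaped
    // separator or dot never takes part in path resolution.
    static List<String> parse(String name) {
        List<String> segments = new ArrayList<>();
        for (String raw : name.split("/")) {
            segments.add(decode(raw));
        }
        return segments;
    }

    // Decode %xx escapes; a '%' not followed by two hex digits is kept as-is.
    static String decode(String raw) {
        byte[] in = raw.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit((char) (in[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (in[i + 2] & 0xFF), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(parse("dir/child"));
        System.out.println(parse("dir%2fchild"));
        System.out.println(parse("%2e"));
    }
}
```

With this ordering "dir%2fchild" and "dir%5cchild" each stay a single segment, and "%2e" is just a name that happens to be a dot; resolving "." or ".." would only apply to segments that were written literally.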
I leave it up to the filesystem whether such a file or directory can be created.
The above examples are taken from the unit tests, so the old behaviour was intended. But I think the new one is the right one.
I think it is very unlikely that those constructs can be found in the wild, but if someone used VFS that way it IS broken.
Any comments?
--- Mario
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]