The URI spec dudes talk about the canonical form of a URI.
Defining it is left to each scheme.
Now, if VFS is in control of the URIs that come in and go out,
then it would be possible to canonicalize a URI when it enters
the core areas of VFS that are not provider (scheme) specific.
The cache, I believe, is in that core area.
Let's say someone points to a file with the URI
webdav:/anydir/%74est%0d.txt
This is canonicalized into
webdav:/anydir/test%0D.txt
(%74 is the unreserved character "t", so it is decoded; %0d stays escaped but gets uppercase hex digits.)
So the next time someone points to the URI
webdav:/anydir/tes%74%0D.txt
they will get the cached file.

Note: Canonicalization could be provider
specific, so that different schemes could
escape different sets of characters.
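A rough Java sketch of the canonicalization step proposed above, assuming RFC 3986-style normalization of percent-encodings: escapes of unreserved characters are decoded, and all other escapes keep their encoding but get uppercase hex digits. The class `UriCanonicalizer` and its `decodable` predicate are hypothetical, not part of VFS; the predicate is where a provider could plug in its own scheme-specific rules.

```java
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Commons VFS.
final class UriCanonicalizer {
    // RFC 3986, section 2.3: characters that never need percent-encoding.
    static final IntPredicate UNRESERVED = c ->
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';

    // Decides which escaped octets may be decoded; a provider could supply its own.
    // The default only ever accepts ASCII octets.
    private final IntPredicate decodable;

    UriCanonicalizer(IntPredicate decodable) {
        this.decodable = decodable;
    }

    String canonicalize(String uri) {
        StringBuilder out = new StringBuilder(uri.length());
        int i = 0;
        while (i < uri.length()) {
            char c = uri.charAt(i);
            if (c == '%' && i + 2 < uri.length()
                    && hexValue(uri.charAt(i + 1)) >= 0 && hexValue(uri.charAt(i + 2)) >= 0) {
                int octet = hexValue(uri.charAt(i + 1)) * 16 + hexValue(uri.charAt(i + 2));
                if (decodable.test(octet)) {
                    out.append((char) octet);                     // e.g. %74 -> t
                } else {
                    out.append(String.format("%%%02X", octet));   // e.g. %0d -> %0D
                }
                i += 3;
            } else {
                out.append(c);  // literal characters (and a malformed '%') are kept as-is
                i++;
            }
        }
        return out.toString();
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

With the default predicate, both URIs from the example become webdav:/anydir/test%0D.txt and would therefore share one cache entry. A provider that wants, say, "~" kept escaped could pass `c -> UriCanonicalizer.UNRESERVED.test(c) && c != '~'`.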

What do you think?

Hello!


Now that it is possible to safely pass URIs, we could have a look at how we should encode them.
I will try to figure out how local files, ftp, http, webdav, smb and sftp handle filenames with special characters.


During my tests I found some side effects which need some thought:

1) The cache
The cache uses the filename as its key. Now, if I try to resolve a file named
webdav:/anydir/test%0d.txt
the webdav will return a file named
webdav:/anydir/test\r.txt (\r = the decoded %0d)

As you can see, the two filenames differ, and thus two different entries will be created in the cache (which is not acceptable).

If I ask webdav to return the escaped form of the name, it will return
webdav:/anydir/test%0D.txt (notice the uppercase "D"), again a different name.


However, what if someone gets funny and tries to resolve
webdav:/anydir/%74est%0d.txt
In this case the filename from the file provider is different, regardless of whether I get the normal or the escaped form.


So my conclusion is to always use the decoded form of the filename as the cache key, knowing that in the very, very rare cases where decoding is not symmetric, I might have a problem with the cache.
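A minimal Java sketch of a cache keyed by the decoded filename, assuming escaped octets are UTF-8. `DecodedKeyCache` and `cacheKey` are hypothetical names for illustration and do not reflect the actual VFS cache interface. A hand-written decoder is used because `java.net.URLDecoder` turns "+" into a space, which would corrupt path names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache keyed by the decoded filename; not the real VFS cache API.
final class DecodedKeyCache {
    private final Map<String, Object> entries = new HashMap<>();

    // Percent-decodes a name, assuming escaped octets are UTF-8.
    // '+' is left alone because it is a legal literal character in a path.
    static String cacheKey(String name) {
        StringBuilder out = new StringBuilder(name.length());
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()
                    && hex(name.charAt(i + 1)) >= 0 && hex(name.charAt(i + 2)) >= 0) {
                // Collect consecutive escaped octets so multi-byte sequences decode together.
                pending.write(hex(name.charAt(i + 1)) * 16 + hex(name.charAt(i + 2)));
                i += 3;
            } else {
                flush(pending, out);
                out.append(c);
                i++;
            }
        }
        flush(pending, out);
        return out.toString();
    }

    private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }

    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    Object get(String name) { return entries.get(cacheKey(name)); }
    void put(String name, Object file) { entries.put(cacheKey(name), file); }
    int size() { return entries.size(); }
}
```

All four spellings from above (test%0d, test\r, test%0D, %74est%0d) collapse to a single entry. The asymmetry shows up with escapes such as %2F: webdav:/anydir/a%2Fb.txt gets the same key as webdav:/anydir/a/b.txt.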


2) German "Umlauts" and any other non-ASCII characters

I can't use the encoded form of the filename from the filesystem provider, as I would then have to know the encoding (ISO, UTF-8). Currently the filesystem libraries are responsible for the correct decoding, and I don't want to enter a charset war. Again, it's best to use the decoded filename.
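A small Java illustration of why the encoded form needs a known charset: the same escaped octets decode to different names under UTF-8 and ISO-8859-1. `CharsetAmbiguity` and `decodeAs` are hypothetical names; the sketch relies on `URLDecoder.decode(String, Charset)`, which requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: the same escaped bytes yield different names per charset.
final class CharsetAmbiguity {
    static String decodeAs(String encoded, Charset charset) {
        // URLDecoder also turns '+' into a space, which is harmless for this
        // sample but would be wrong for real path names.
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String encoded = "%C3%BC.txt";  // "ü.txt" written as UTF-8 octets
        String asUtf8 = decodeAs(encoded, StandardCharsets.UTF_8);        // decodes to ü.txt
        String asLatin1 = decodeAs(encoded, StandardCharsets.ISO_8859_1); // decodes to Ã¼.txt
        System.out.println(asUtf8.equals(asLatin1));                      // false
    }
}
```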


Result:
VFS should not introduce its own encoding. Only "%" (and "!" for the layered filesystem) needs some attention, to allow for the case where one needs to pass a special URL down to the filesystem.
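One possible reading of this result, sketched in Java under the assumption that "%" and "!" are the only characters VFS itself reserves in a name. `VfsNameEscaper`, `escape` and `unescape` are hypothetical names. Everything else, including non-ASCII characters, passes through untouched, and any other %XX sequence is left as-is so a caller can hand a pre-escaped name down to the provider.

```java
// Hypothetical helper: escapes only the characters VFS itself reserves.
final class VfsNameEscaper {
    static String escape(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            switch (c) {
                case '%': out.append("%25"); break;  // literal percent sign
                case '!': out.append("%21"); break;  // separator of layered filesystems
                default:  out.append(c);             // everything else is left alone
            }
        }
        return out.toString();
    }

    // Reverses only %25 and %21, scanning left to right so nothing is decoded twice;
    // other escapes such as %0D are passed through for the provider to interpret.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.startsWith("%25", i)) {
                out.append('%');
                i += 2;
            } else if (s.startsWith("%21", i)) {
                out.append('!');
                i += 2;
            } else {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }
}
```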



Comments?

---
Mario


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]