Re: file-system-api: filename restrictions

2011-01-12 Thread Glenn Maynard
On Wed, Jan 12, 2011 at 6:37 PM, Charles Pritchard wrote:
> Those very-short length limitations are really frustrating.

More so for other languages, too; 255 bytes of UTF-8 is only 85
codepoints of CJK--still a fairly long filename, but not unheard of.
But, it seems long enough for sandboxed access, as long as the
restriction can be lifted for non-sandboxed "mountpoint" access later
on.
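
The arithmetic above can be checked directly. This is only a sketch: the helper names are made up for illustration, and the 255-byte default is just the limit discussed in this thread, not something taken from the spec.

```python
def utf8_len(name: str) -> int:
    """Number of bytes the name occupies when encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits_limit(name: str, limit: int = 255) -> bool:
    """True if the UTF-8 encoding of the name fits in `limit` bytes."""
    return utf8_len(name) <= limit


# U+65E5 is a CJK ideograph that takes 3 bytes in UTF-8.
cjk_85 = "\u65e5" * 85
cjk_86 = "\u65e5" * 86

print(utf8_len(cjk_85), fits_limit(cjk_85))
print(utf8_len(cjk_86), fits_limit(cjk_86))
```

The same limit admits 255 ASCII characters, which is why the restriction feels so much tighter for CJK names.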

> The  experiment has opened up some room
> for the idea of mount points. Perhaps, further exploration there would be
> helpful.

I think that just returns a FileList, as if the user had manually
selected each file in the directory, which is a fairly lightweight
feature.  A "mountpoint" interface would presumably return a
FileSystem object (among other details, like adding FileSystem to the
structured clone algorithm, and a way to request a FileSystemSync
object from a FileSystem object).

-- 
Glenn Maynard



Re: file-system-api: filename restrictions

2011-01-12 Thread Charles Pritchard

On 1/11/2011 2:33 PM, Eric Uhrhane wrote:

> Actually, it's not just that Linux users now have to worry about
> Windows rules; Windows users also have to worry about Linux rules, in
> particular the path length limitation, which is 255 bytes on Linux but
> 255 UTF-16 code units on Windows.


WebKit is addressing the current issue in active releases (a 255 byte 
path length limit):

http://code.google.com/p/chromium/issues/detail?id=63574

Those very-short length limitations are really frustrating.



>> It's also a pain for backing up files, eg. copying "moon rock?.jpg" to
>> "moon rock?.jpg~", and for "safe writes", writing to "moon
>> rock?.jpg.new" and then renaming the finished file over the original.


> I'm looking into the encoding problems now, and will respond later.
> In general we should be able to read any such file already, at the
> very least by enumerating the directory to get the FileEntry, but
> creating files with valid names may be tricky.

A lot of these use cases really would be better served by IndexedDB.
No issues with naming, no issues with file counts/inodes, performance, etc.

Blobs can be stashed nicely, and so forth.


> However, an obvious expansion of this API which we've talked a lot
> about is the ability to expose other "mount points" to the browser.
> For example, a trusted app might be granted access to "My Photos" or
> another similar directory.  There the majority of the files are
> expected to be created by apps outside of the browser, and you run
> into the thumbnail problem you describe above, where a
> read-modify-write of a path or even a copy operation can inadvertently
> create a file path that's banned by the API, but is legal on the host
> system.


The  experiment has opened up some room
for the idea of mount points. Perhaps, further exploration there would 
be helpful.




Re: file-system-api: filename restrictions

2011-01-11 Thread Boris Zbarsky

On 1/11/11 8:02 PM, Glenn Maynard wrote:

> The infamous Turkish "I" comes to mind as a portability problem: code
> could reasonably create "Info" in one place and read it as "info" in
> another, which would be different files in a Turkish locale.  Windows
> still treats "I" and "i" as the same letter even in Turkish codepages,
> but a Linux glibc-based implementation wouldn't.


For what it's worth, HTML5 typically uses "ASCII case-insensitivity" to
avoid this problem.  But they're working with strings that are mostly
ASCII, which may not be the case for filenames.


-Boris



Re: file-system-api: filename restrictions

2011-01-11 Thread Glenn Maynard
On Tue, Jan 11, 2011 at 5:33 PM, Eric Uhrhane wrote:
> 2) Developers often don't read UA logs.  We should fail early on the
> dev box, rather than failing later on the user's machine.

(I guess I just lack sympathy for developers who completely ignore
browser warnings.)

>> - existing filenames that differ only by case.  Similarly, should the
>> UA just ignore all but one of them and make a log to the console?
>
> There's no problem accessing those through directory enumeration, or
> via a supplied path.  You just can't create this situation using the
> API.

It could happen if the user's locale changes, but that's not a likely problem.

The infamous Turkish "I" comes to mind as a portability problem: code
could reasonably create "Info" in one place and read it as "info" in
another, which would be different files in a Turkish locale.  Windows
still treats "I" and "i" as the same letter even in Turkish codepages,
but a Linux glibc-based implementation wouldn't.

Should the usual equivalences for the ASCII range be required,
regardless of the user's locale?
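
To make the question concrete, here is a rough sketch of the two behaviours. Both function names are hypothetical. turkish_fold() only imitates what a Turkish-locale lowercase would do to "I" and "İ"; it is not how any particular OS or libc implements it.

```python
import string

# Maps only A-Z to a-z; every other code point is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(name: str) -> str:
    """Locale-independent folding restricted to the ASCII range."""
    return name.translate(_ASCII_LOWER)


def turkish_fold(name: str) -> str:
    """Imitation of Turkish-style folding: I -> dotless i, dotted capital I -> i."""
    return name.replace("I", "\u0131").replace("\u0130", "i").lower()


# Under ASCII folding the two names refer to the same file ...
print(ascii_fold("Info") == ascii_fold("info"))
# ... but under Turkish-style folding they do not.
print(turkish_fold("Info") == turkish_fold("info"))
```

Requiring the ASCII equivalences regardless of locale would correspond to always comparing with something like ascii_fold().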

> However, an obvious expansion of this API which we've talked a lot
> about is the ability to expose other "mount points" to the browser.
> For example, a trusted app might be granted access to "My Photos" or
> another similar directory.  There the majority of the files are
> expected to be created by apps outside of the browser, and you run
> into the thumbnail problem you describe above, where a
> read-modify-write of a path or even a copy operation can inadvertently
> create a file path that's banned by the API, but is legal on the host
> system.
>
> In a perfect world, I think we'd want all paths that came from the web
> app to be LCD-safe, but all paths that came from the host machine to
> be permitted.  Since that's not generally detectable by the UA, or
> even well-defined in all cases, perhaps we can help developers to
> solve the problem manually.  We could offer another API [or just a
> flag in the existing APIs] that says "I'm using paths derived from the
> local system.  Let me try this, even if it's not LCD-safe."  I don't
> think we want to allow that in the per-origin sandbox defined in the
> current spec, but I could see it being quite valuable for other mount
> points.  If that sounds reasonable, we can put that in when we spec
> this potential future API expansion.

Letting the user grant access to specific directories is in my opinion
the one big missing piece that will make this API very interesting.
That's also when this will all become much more important, so leaving
these to be addressed together seems fine.

-- 
Glenn Maynard



Re: file-system-api: filename restrictions

2011-01-11 Thread Eric Uhrhane
Glenn:

Sorry about the slow response; I was on vacation, and am only now catching up.

We've discussed these issues before; see
http://lists.w3.org/Archives/Public/public-device-apis/2010Jan/0229.html
for much of the initial discussion.  However, you've brought up a new
point that I think is worth addressing.

On Sun, Dec 19, 2010 at 11:26 AM, Glenn Maynard wrote:
> Section 8 "Uniformity of interface" will cause headaches for some use
> cases.  For example, an application may want to allow the user to fill
> a directory with images, then output a thumbnail of each image "x.jpg"
> into a subdirectory with the same filename, "thumbs/x.jpg".
>
> However, we're forbidden from creating a new file with "invalid"
> filenames, even if they exist elsewhere.  The operation will fail, and
> we'll have to tell our Linux users with images named "at the beach:
> moon rock?.jpg" that they have to obey Windows filename
> conventions--which will probably be upsetting.  It'd also be a
> difficult rule for users to follow; while it's easy in Windows since
> it's globally enforced in all applications, Linux users would have to
> memorize the rules themselves.

Actually, it's not just that Linux users now have to worry about
Windows rules; Windows users also have to worry about Linux rules, in
particular the path length limitation, which is 255 bytes on Linux but
255 UTF-16 code units on Windows.

> It's also a pain for backing up files, eg. copying "moon rock?.jpg" to
> "moon rock?.jpg~", and for "safe writes", writing to "moon
> rock?.jpg.new" and then renaming the finished file over the original.
>
> These seem like bigger problems than the one it's trying to solve.  Is
> it really insufficient for these rules to define what filenames must
> be supported, that any others may not be, and to suggest a UA log if
> nonportable filenames are created?  (Of all filename issues, the only
> one that I've ever found to be a serious real-world portability issue
> is case-insensitivity.)

Yes, I believe that's insufficient.  We've discussed this before, and
1) We really do want a fully-portable subset to be the standard; code
should work everywhere if it works anywhere.  You shouldn't have to
code to OSX any more than you should have to code to Opera--just code
to the web platform.
2) Developers often don't read UA logs.  We should fail early on the
dev box, rather than failing later on the user's machine.

> I guess there are other issues with reading data created outside of the API:
>
> - filenames that can't be decoded to a DOMString, eg. undecodable
> bytes in a UTF-8 filesystem.  This is common in Linux after eg.
> unzipping a ZIP containing SJIS filenames.  Should these simply be
> ignored with a log?

I'm looking into the encoding problems now, and will respond later.
In general we should be able to read any such file already, at the
very least by enumerating the directory to get the FileEntry, but
creating files with valid names may be tricky.
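
As an illustration of the undecodable-name case, the sketch below implements the "ignore with a log" option from the question, not the enumeration approach. The function names are hypothetical, and it assumes raw directory entries are available as bytes (in Python, os.listdir() returns bytes when given a bytes path).

```python
def decode_filename(raw: bytes):
    """Return the name as str, or None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def listable_names(raw_names):
    """Split raw directory entries into decodable names and skipped entries."""
    names, skipped = [], []
    for raw in raw_names:
        name = decode_filename(raw)
        if name is None:
            skipped.append(raw)  # a UA might log these instead of exposing them
        else:
            names.append(name)
    return names, skipped


# The same two-character CJK name, once as SJIS bytes and once as UTF-8 bytes.
sjis_name = "\u65e5\u672c.txt".encode("shift_jis")
utf8_name = "\u65e5\u672c.txt".encode("utf-8")

names, skipped = listable_names([sjis_name, utf8_name, b"plain.txt"])
print(names)
print(skipped)
```

The SJIS entry ends up in the skipped list because its bytes are not valid UTF-8, which is the situation left behind by unzipping such an archive on a UTF-8 filesystem.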

> - existing filenames that differ only by case.  Similarly, should the
> UA just ignore all but one of them and make a log to the console?

There's no problem accessing those through directory enumeration, or
via a supplied path.  You just can't create this situation using the
API.
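
For the case-collision situation, a UA or app could at least detect the groups of names involved. A minimal sketch follows; the function name is hypothetical, and str.casefold() (Unicode case folding) is an assumption, since this thread does not settle which folding should apply.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by case; return groups with more than one entry."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


existing = ["Info.txt", "info.txt", "README", "notes"]
print(case_collisions(existing))
```

Such files can exist on a case-sensitive host filesystem; the check only reports them and does not decide which one should win.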

> Should "whitespace" in section 8.3 simply indicate space, U+0020?

It looks like it should; my mistake.  Thanks!

> Windows does allow creating filenames ending with NBSP and other
> Unicode whitespace characters, and it's not clear whether this should
> be allowed.  Other whitespace (\r, \n, \t) is covered by the control
> character rule.
>
> Sorry if this is a rehash of past topics.

The API is designed as it is to support a couple of different
situations, only one of which is currently specced, but both of which
have been discussed.  What's specced so far is a per-origin sandbox
that web apps can use for client-side storage.  Depending on the UA's
implementation of it, it's possible that the files stored there will
be exposed to the host machine and potentially shared with apps
outside of the browser, but we generally expect the browser to create
most or all of them.  Thus it makes sense to take a
least-common-denominator [LCD] approach, so that code that works on
any platform works on all platforms.  If other apps create files there
we should be able to access them no matter what, but things will go
more smoothly if said apps respect our restrictions.

However, an obvious expansion of this API which we've talked a lot
about is the ability to expose other "mount points" to the browser.
For example, a trusted app might be granted access to "My Photos" or
another similar directory.  There the majority of the files are
expected to be created by apps outside of the browser, and you run
into the thumbnail problem you describe above, where a
read-modify-write of a path or even a copy operation can inadvertently
create a file path that's banned by the API, but is legal on the host
system.

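In a perfect world, I think we'd want all paths that came from the web
app to be LCD-safe, but all paths that came from the host machine to
be permitted.  Since that's not generally detectable by the UA, or
even well-defined in all cases, perhaps we can help developers to
solve the problem manually.  We could offer another API [or just a
flag in the existing APIs] that says "I'm using paths derived from the
local system.  Let me try this, even if it's not LCD-safe."  I don't
think we want to allow that in the per-origin sandbox defined in the
current spec, but I could see it being quite valuable for other mount
points.  If that sounds reasonable, we can put that in when we spec
this potential future API expansion.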

file-system-api: filename restrictions

2010-12-19 Thread Glenn Maynard
Section 8 "Uniformity of interface" will cause headaches for some use
cases.  For example, an application may want to allow the user to fill
a directory with images, then output a thumbnail of each image "x.jpg"
into a subdirectory with the same filename, "thumbs/x.jpg".

However, we're forbidden from creating a new file with "invalid"
filenames, even if they exist elsewhere.  The operation will fail, and
we'll have to tell our Linux users with images named "at the beach:
moon rock?.jpg" that they have to obey Windows filename
conventions--which will probably be upsetting.  It'd also be a
difficult rule for users to follow; while it's easy in Windows since
it's globally enforced in all applications, Linux users would have to
memorize the rules themselves.
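
To show what such a check looks like from the application side, here is a sketch of a "portable name" test. The rule set is an assumption modelled on the well-known Windows restrictions plus the 255-byte limit; it is not copied from section 8 of the spec, and the function name is made up.

```python
# Characters Windows does not allow in a filename.
FORBIDDEN_CHARS = set('<>:"/\\|?*')


def portability_problems(name: str, max_bytes: int = 255) -> list:
    """Return a list of reasons the name is not portable (empty if it is fine)."""
    problems = []
    bad = sorted(c for c in set(name) if c in FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if any(ord(c) < 0x20 for c in name):
        problems.append("control character")
    if name.endswith((" ", ".")):
        problems.append("ends with space or period")
    if len(name.encode("utf-8")) > max_bytes:
        problems.append("longer than %d bytes" % max_bytes)
    return problems


print(portability_problems("x.jpg"))
print(portability_problems("at the beach: moon rock?.jpg"))
```

A thumbnailer would have to run something like this on every source name and then either rename or refuse, which is the burden on Linux users described above.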

It's also a pain for backing up files, eg. copying "moon rock?.jpg" to
"moon rock?.jpg~", and for "safe writes", writing to "moon
rock?.jpg.new" and then renaming the finished file over the original.
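
For reference, the "safe write" pattern looks roughly like this outside the browser. It is a sketch using the Python standard library, not the File API; safe_write() is a hypothetical name, and the ".new" suffix follows the example above.

```python
import os
import tempfile


def safe_write(path: str, data: bytes) -> None:
    """Write data to path + ".new", then rename it over the original."""
    tmp_path = path + ".new"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before renaming
    os.replace(tmp_path, path)  # replaces an existing file at path


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "moon rock.jpg")
    safe_write(target, b"first version")
    safe_write(target, b"second version")
    with open(target, "rb") as f:
        print(f.read())
    print(os.listdir(workdir))
```

The point for this thread is that the temporary name is derived from the original, so if the original name is legal on the host but banned by the API, the derived name cannot be created either.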

These seem like bigger problems than the one it's trying to solve.  Is
it really insufficient for these rules to define what filenames must
be supported, that any others may not be, and to suggest a UA log if
nonportable filenames are created?  (Of all filename issues, the only
one that I've ever found to be a serious real-world portability issue
is case-insensitivity.)

I guess there are other issues with reading data created outside of the API:

- filenames that can't be decoded to a DOMString, eg. undecodable
bytes in a UTF-8 filesystem.  This is common in Linux after eg.
unzipping a ZIP containing SJIS filenames.  Should these simply be
ignored with a log?
- existing filenames that differ only by case.  Similarly, should the
UA just ignore all but one of them and make a log to the console?

Should "whitespace" in section 8.3 simply indicate space, U+0020?
Windows does allow creating filenames ending with NBSP and other
Unicode whitespace characters, and it's not clear whether this should
be allowed.  Other whitespace (\r, \n, \t) is covered by the control
character rule.
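
The three cases can be told apart mechanically. This is a sketch with a hypothetical function name; it treats code points below U+0020 as "control characters", which is an assumption about how the control character rule is scoped.

```python
def trailing_whitespace_kind(name: str) -> str:
    """Classify the last character of a filename for the trailing-whitespace rule."""
    if not name:
        return "none"
    last = name[-1]
    if last == " ":
        return "space"  # U+0020
    if ord(last) < 0x20:
        return "control"  # \r, \n, \t and friends
    if last.isspace():
        return "other-unicode-whitespace"  # e.g. NBSP, U+00A0
    return "none"


for candidate in ["photo.jpg", "photo.jpg ", "photo.jpg\u00a0", "photo.jpg\t"]:
    print(repr(candidate), trailing_whitespace_kind(candidate))
```

If section 8.3 means only U+0020, just the "space" case would be rejected by that rule, and whether the "other-unicode-whitespace" case should be allowed is the open question.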

Sorry if this is a rehash of past topics.

-- 
Glenn Maynard