Re: file-system-api: filename restrictions
On Wed, Jan 12, 2011 at 6:37 PM, Charles Pritchard wrote:
> Those very-short length limitations are really frustrating.

More so for other languages, too; 255 bytes of UTF-8 is only 85 codepoints of CJK--still a fairly long filename, but not unheard of. But, it seems long enough for sandboxed access, as long as the restriction can be lifted for non-sandboxed "mountpoint" access later on.

> The experiment has opened up some room
> for the idea of mount points. Perhaps, further exploration there would be
> helpful.

I think that just returns a FileList, as if the user had manually selected each file in the directory, which is a fairly lightweight feature. A "mountpoint" interface would presumably return a FileSystem object (among other details, like adding FileSystem to the structured clone algorithm, and a way to request a FileSystemSync object from a FileSystem object).

--
Glenn Maynard
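The byte arithmetic in the message above can be checked directly. The following is a minimal Python sketch, not part of the API under discussion; the helper names and the `LIMIT` constant are illustrative assumptions, and the limit is taken to be 255 as stated in this thread.

```python
# Illustrative helpers (hypothetical names, not from any spec).
LIMIT = 255  # the length limit discussed in this thread

def utf8_len(name: str) -> int:
    """Length of the name in UTF-8 bytes."""
    return len(name.encode("utf-8"))

def utf16_units(name: str) -> int:
    """Length of the name in UTF-16 code units."""
    return len(name.encode("utf-16-le")) // 2

# 85 copies of one CJK ideograph (U+6F22), which takes 3 bytes in UTF-8.
cjk_name = "\u6f22" * 85

print(utf8_len(cjk_name))                       # 255
print(utf16_units(cjk_name))                    # 85
print(utf8_len(cjk_name + "\u6f22") <= LIMIT)   # False
```

The same 85-character name is far below a limit counted in UTF-16 code units, which is why the two limits are not interchangeable.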
Re: file-system-api: filename restrictions
On 1/11/2011 2:33 PM, Eric Uhrhane wrote:
> Actually, it's not just that Linux users now have to worry about Windows rules; Windows users also have to worry about Linux rules, in particular the path length limitation, which is 255 bytes on Linux but 255 UTF-16 code units on Windows.

WebKit is addressing the current issue in active releases (a 255 byte path length limit):
http://code.google.com/p/chromium/issues/detail?id=63574

Those very-short length limitations are really frustrating.

>> It's also a pain for backing up files, eg. copying "moon rock?.jpg" to "moon rock?.jpg~", and for "safe writes", writing to "moon rock?.jpg.new" and then renaming the finished file over the original.
>
> I'm looking into the encoding problems now, and will respond later. In general we should be able to read any such file already, at the very least by enumerating the directory to get the FileEntry, but creating files with valid names may be tricky.

A lot of these use cases really would be better served by IndexedDB. No issues with naming, no issues with file counts/inodes, performance, etc. Blobs can be stashed nicely, and so forth.

> However, an obvious expansion of this API which we've talked a lot about is the ability to expose other "mount points" to the browser. For example, a trusted app might be granted access to "My Photos" or another similar directory. There the majority of the files are expected to be created by apps outside of the browser, and you run into the thumbnail problem you describe above, where a read-modify-write of a path or even a copy operation can inadvertently create a file path that's banned by the API, but is legal on the host system.

The experiment has opened up some room for the idea of mount points. Perhaps, further exploration there would be helpful.
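For readers unfamiliar with the "safe write" pattern quoted above, here is a minimal sketch in Python against an ordinary local filesystem, not the File System API itself. The function name `safe_write` is hypothetical; the ".new" suffix follows the example in the thread.

```python
import os
import tempfile

def safe_write(path: str, data: bytes) -> None:
    """Write to "<path>.new", then rename it over the original."""
    tmp_path = path + ".new"      # derived name: original name plus 4 characters
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # push the data to disk before the rename
    os.replace(tmp_path, path)    # replaces the destination if it exists

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "moon rock.jpg")
    safe_write(target, b"v1")
    safe_write(target, b"v2")
    with open(target, "rb") as f:
        print(f.read())           # b'v2'
    print(os.listdir(d))          # ['moon rock.jpg']
```

The derived name inherits every character of the original and is longer than it, so an existing name that is legal on the host can produce a temporary name that a restricted API would reject.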
Re: file-system-api: filename restrictions
On 1/11/11 8:02 PM, Glenn Maynard wrote:
> The infamous Turkish "I" comes to mind as a portability problem: code could reasonably create "Info" in one place and read it as "info" in another, which would be different files in a Turkish locale. Windows still treats "I" and "i" as the same letter even in Turkish codepages, but a Linux glibc-based implementation wouldn't.

For what it's worth, HTML5 typically uses "ASCII case-insensitivity" to avoid this problem. But they're working with strings that are mostly ASCII, which may not be the case for filenames.

-Boris
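A small sketch of what an "ASCII case-insensitive" comparison means in practice: only A-Z are folded to a-z, independent of locale, and every other character must match exactly. The function names are hypothetical, and this illustrates the general idea rather than any rule adopted for the API.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave everything else untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

def ascii_case_equal(a: str, b: str) -> bool:
    """Compare two names, ignoring case for ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)

print(ascii_case_equal("Info", "info"))        # True
# U+0130 (dotted capital I) and U+0131 (dotless small i) are not ASCII,
# so they are never treated as equal to "I" or "i".
print(ascii_case_equal("\u0130nfo", "info"))   # False
print(ascii_case_equal("\u0131nfo", "Info"))   # False
```

Under this rule "Info" and "info" always name the same file regardless of the user's locale, while the Turkish letters stay distinct.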
Re: file-system-api: filename restrictions
On Tue, Jan 11, 2011 at 5:33 PM, Eric Uhrhane wrote:
> 2) Developers often don't read UA logs. We should fail early on the
> dev box, rather than failing later on the user's machine.

(I guess I just lack sympathy for developers who completely ignore browser warnings.)

>> - existing filenames that differ only by case. Similarly, should the
>> UA just ignore all but one of them and make a log to the console?
>
> There's no problem accessing those through directory enumeration, or
> via a supplied path. You just can't create this situation using the
> API.

It could happen if the user's locale changes, but that's not a likely problem.

The infamous Turkish "I" comes to mind as a portability problem: code could reasonably create "Info" in one place and read it as "info" in another, which would be different files in a Turkish locale. Windows still treats "I" and "i" as the same letter even in Turkish codepages, but a Linux glibc-based implementation wouldn't. Should the usual equivalences for the ASCII range be required, regardless of the user's locale?

> However, an obvious expansion of this API which we've talked a lot
> about is the ability to expose other "mount points" to the browser.
> For example, a trusted app might be granted access to "My Photos" or
> another similar directory. There the majority of the files are
> expected to be created by apps outside of the browser, and you run
> into the thumbnail problem you describe above, where a
> read-modify-write of a path or even a copy operation can inadvertently
> create a file path that's banned by the API, but is legal on the host
> system.
>
> In a perfect world, I think we'd want all paths that came from the web
> app to be LCD-safe, but all paths that came from the host machine to
> be permitted. Since that's not generally detectable by the UA, or
> even well-defined in all cases, perhaps we can help developers to
> solve the problem manually. We could offer another API [or just a
> flag in the existing APIs] that says "I'm using paths derived from the
> local system. Let me try this, even if it's not LCD-safe." I don't
> think we want to allow that in the per-origin sandbox defined in the
> current spec, but I could see it being quite valuable for other mount
> points. If that sounds reasonable, we can put that in when we spec
> this potential future API expansion.

Letting the user grant access to specific directories is in my opinion the one big missing piece that will make this API very interesting. That's also when this will all become much more important, so leaving these to be addressed together seems fine.

--
Glenn Maynard
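To make the proposed flag concrete, here is a rough Python sketch of how such a check might behave. Everything in it is an assumption for illustration: the names `is_lcd_safe` and `name_allowed`, the `from_host` and `sandboxed` parameters, and the rule set itself, which only approximates the restrictions mentioned in this thread (Windows-reserved characters, control characters, a trailing space, a 255-byte limit) and is not the spec's actual list.

```python
# Hypothetical rule set, loosely based on restrictions mentioned in this thread.
WINDOWS_RESERVED = set('<>:"/\\|?*')
MAX_BYTES = 255

def is_lcd_safe(name: str) -> bool:
    """True if the name passes the illustrative least-common-denominator rules."""
    if not name or len(name.encode("utf-8")) > MAX_BYTES:
        return False
    if any(c in WINDOWS_RESERVED or ord(c) < 0x20 for c in name):
        return False
    if name.endswith(" "):
        return False
    return True

def name_allowed(name: str, from_host: bool = False, sandboxed: bool = True) -> bool:
    """LCD-safe names always pass; others pass only with the opt-in flag,
    and only outside the per-origin sandbox."""
    if is_lcd_safe(name):
        return True
    return from_host and not sandboxed

host_name = "at the beach: moon rock?.jpg"
print(name_allowed("x.jpg"))                                      # True
print(name_allowed(host_name))                                    # False
print(name_allowed(host_name, from_host=True))                    # False
print(name_allowed(host_name, from_host=True, sandboxed=False))   # True
```

The flag is ignored inside the sandbox in this sketch, matching the suggestion that the opt-in only makes sense for other mount points.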
Re: file-system-api: filename restrictions
Glenn:

Sorry about the slow response; I was on vacation, and am only now catching up. We've discussed these issues before; see http://lists.w3.org/Archives/Public/public-device-apis/2010Jan/0229.html for much of the initial discussion. However, you've brought up a new point that I think is worth addressing.

On Sun, Dec 19, 2010 at 11:26 AM, Glenn Maynard wrote:
> Section 8 "Uniformity of interface" will cause headaches for some use
> cases. For example, an application may want to allow the user to fill
> a directory with images, then output a thumbnail of each image "x.jpg"
> into a subdirectory with the same filename, "thumbs/x.jpg".
>
> However, we're forbidden from creating a new file with "invalid"
> filenames, even if they exist elsewhere. The operation will fail, and
> we'll have to tell our Linux users with images named "at the beach:
> moon rock?.jpg" that they have to obey Windows filename
> conventions--which will probably be upsetting. It'd also be a
> difficult rule for users to follow; while it's easy in Windows since
> it's globally enforced in all applications, Linux users would have to
> memorize the rules themselves.

Actually, it's not just that Linux users now have to worry about Windows rules; Windows users also have to worry about Linux rules, in particular the path length limitation, which is 255 bytes on Linux but 255 UTF-16 code units on Windows.

> It's also a pain for backing up files, eg. copying "moon rock?.jpg" to
> "moon rock?.jpg~", and for "safe writes", writing to "moon
> rock?.jpg.new" and then renaming the finished file over the original.
>
> These seem like bigger problems than the one it's trying to solve. Is
> it really insufficient for these rules to define what filenames must
> be supported, that any others may not be, and to suggest a UA log if
> nonportable filenames are created? (Of all filename issues, the only
> one that I've ever found to be a serious real-world portability issue
> is case-insensitivity.)

Yes, I believe that's insufficient. We've discussed this before, and

1) We really do want a fully-portable subset to be the standard; code should work everywhere if it works anywhere. You shouldn't have to code to OSX any more than you should have to code to Opera--just code to the web platform.

2) Developers often don't read UA logs. We should fail early on the dev box, rather than failing later on the user's machine.

> I guess there are other issues with reading data created outside of the API:
>
> - filenames that can't be decoded to a DOMString, eg. undecodable
> bytes in a UTF-8 filesystem. This is common in Linux after eg.
> unzipping a ZIP containing SJIS filenames. Should these simply be
> ignored with a log?

I'm looking into the encoding problems now, and will respond later. In general we should be able to read any such file already, at the very least by enumerating the directory to get the FileEntry, but creating files with valid names may be tricky.

> - existing filenames that differ only by case. Similarly, should the
> UA just ignore all but one of them and make a log to the console?

There's no problem accessing those through directory enumeration, or via a supplied path. You just can't create this situation using the API.

> Should "whitespace" in section 8.3 simply indicate space, U+0020?

It looks like it should; my mistake. Thanks!

> Windows does allow creating filenames ending with NBSP and other
> Unicode whitespace characters, and it's not clear whether this should
> be allowed. Other whitespace (\r, \n, \t) is covered by the control
> character rule.
>
> Sorry if this is a rehash of past topics.

The API is designed as it is to support a couple of different situations, only one of which is currently specced, but both of which have been discussed.

What's specced so far is a per-origin sandbox that web apps can use for client-side storage. Depending on the UA's implementation of it, it's possible that the files stored there will be exposed to the host machine and potentially shared with apps outside of the browser, but we generally expect the browser to create most or all of them. Thus it makes sense to take a least-common-denominator [LCD] approach, so that code that works on any platform works on all platforms. If other apps create files there we should be able to access them no matter what, but things will go more smoothly if said apps respect our restrictions.

However, an obvious expansion of this API which we've talked a lot about is the ability to expose other "mount points" to the browser. For example, a trusted app might be granted access to "My Photos" or another similar directory. There the majority of the files are expected to be created by apps outside of the browser, and you run into the thumbnail problem you describe above, where a read-modify-write of a path or even a copy operation can inadvertently create a file path that's banned by the API, but is legal on the host system.

In a perfect worl
file-system-api: filename restrictions
Section 8 "Uniformity of interface" will cause headaches for some use cases. For example, an application may want to allow the user to fill a directory with images, then output a thumbnail of each image "x.jpg" into a subdirectory with the same filename, "thumbs/x.jpg".

However, we're forbidden from creating a new file with "invalid" filenames, even if they exist elsewhere. The operation will fail, and we'll have to tell our Linux users with images named "at the beach: moon rock?.jpg" that they have to obey Windows filename conventions--which will probably be upsetting. It'd also be a difficult rule for users to follow; while it's easy in Windows since it's globally enforced in all applications, Linux users would have to memorize the rules themselves.

It's also a pain for backing up files, eg. copying "moon rock?.jpg" to "moon rock?.jpg~", and for "safe writes", writing to "moon rock?.jpg.new" and then renaming the finished file over the original.

These seem like bigger problems than the one it's trying to solve. Is it really insufficient for these rules to define what filenames must be supported, that any others may not be, and to suggest a UA log if nonportable filenames are created? (Of all filename issues, the only one that I've ever found to be a serious real-world portability issue is case-insensitivity.)

I guess there are other issues with reading data created outside of the API:

- filenames that can't be decoded to a DOMString, eg. undecodable bytes in a UTF-8 filesystem. This is common in Linux after eg. unzipping a ZIP containing SJIS filenames. Should these simply be ignored with a log?

- existing filenames that differ only by case. Similarly, should the UA just ignore all but one of them and make a log to the console?

Should "whitespace" in section 8.3 simply indicate space, U+0020? Windows does allow creating filenames ending with NBSP and other Unicode whitespace characters, and it's not clear whether this should be allowed. Other whitespace (\r, \n, \t) is covered by the control character rule.

Sorry if this is a rehash of past topics.

--
Glenn Maynard
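The undecodable-filename case mentioned above is easy to reproduce. The following Python sketch shows a name stored as Shift_JIS bytes failing to decode as UTF-8, which is the situation a UA would face when converting such a name to a DOMString. The function name `decode_name` is hypothetical, and returning `None` merely stands in for "skip the entry and log it", one of the options raised in the message.

```python
from typing import Optional

def decode_name(raw: bytes) -> Optional[str]:
    """Decode a raw filename as UTF-8, or return None if it is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None

name = "\u5199\u771f.jpg"              # a Japanese filename ("photo" + ".jpg")
utf8_raw = name.encode("utf-8")        # bytes as a UTF-8 filesystem would store them
sjis_raw = name.encode("shift_jis")    # bytes as an SJIS ZIP entry might be unpacked

print(decode_name(utf8_raw) == name)   # True
print(decode_name(sjis_raw) is None)   # True
```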