Re: File API to separate reading from files
Arun wrote:

> There is lots that is attractive about InputStream, and I think that it can be used in other specifications, especially when discussing Camera APIs, streaming from web apps (conferencing) etc. I also like the idea of DataHandler. When we define a byte primitive, it can be used in conjunction with the stream interface. For additional read features (fseek) this is also useful. I also appreciate that you have pointed out in a subsequent email [1] that it is possible to sidestep the issue of dealing with bytes directly. Managing bytes properly, with the right primitives, is one reason why, despite having looked at the Java I/O APIs [2], I went with something simpler. I think that we should have streams at some point, and I'm amenable to looking at them in a subsequent iteration of the File API. It's worth saying here that the appeal of streams is for *multiple use cases* for both File API and other APIs, and *not* because the Java I/O model is one we should emulate. Programmer taste and choice about coining APIs is subjective.

Nikunj wrote in response:

> I respect your point on taste, however, I am more interested in composability than the maturity of Java I/O.

Firstly, what Jonas proposed as the Alternative File API [1] uses an event model to address use cases such as progress feedback and separating reading from file objects. I expressed reservations about complexity, but saw more posts in favor of it than against it. This model has advantages that come with an event model (separate notifications like onprogress, onerror, allowing specific 'isolated' code, etc.) along with a signature similarity to XHR (which developers are familiar with). My caveats about the model were mainly about understanding trade-offs. I'm reconciled to having a v1 of the File API specification based on Jonas' proposal (hopefully in good shape by the upcoming TPAC), and I believe we can iterate from there.

> It would be useful to see how you meet the following requirements:
>
> 1. incremental reading of a file's data

The proposal [1] reuses the FileData interface, which will still support a slice(offset, length) method that returns another FileData object within stipulated byte ranges. I hope to flesh out what happens under range mathematics errors a bit more clearly (e.g. whether an exception is raised). Along with progress events, I think this use case is addressed.

> 2. concurrent access to file data

(Note that FileRequest and FileReader are used interchangeably in [1]; I personally prefer FileReader as a name.) Nothing precludes multiple FileReader objects from accessing the same file, but not all implementations need fire notifications (events) concurrently. Do you have a specific use case in mind?

> 3. access to all file metadata without needing to read the file

(Note that in FileRequest, which I think should be named FileReader, the read* methods take File objects as parameters, although the email proposal [1] says that they take FileData objects. Jonas means File objects.) The answer to your question depends on what you mean by *all* file metadata. File objects (which inherit from FileData objects) expose name and mediaType properties, along with size (from FileData). But, suppose you wanted ID3 information from an MP3 file. In this case (assuming ID3v1 usage), you would *have* to read the file, and look for the 128-byte chunk beginning with TAG. This can be done in two ways:

i. Using slice() and range mathematics based on the file's size to get to the end of the file and look at the last 128 bytes of it as a separate FileData object (since ID3v1 puts stuff at the end). Not ideal.

ii. Using read methods and working with the file format. Again, not dripping with syntactic sugar, but certainly feasible.

I agree that metadata extraction could be made better, but I think that I'm happy with what the existing proposal has. I also don't see how any other proposal improves on this, even if you read into a stream buffer.
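The range mathematics behind option (i) is small enough to sketch. The following is only an illustration under stated assumptions: the file's bytes are taken to be available as a plain array of integers, and the names id3v1Range and hasId3v1Tag are hypothetical helpers, not part of any File API proposal.

```javascript
// Hypothetical helpers: locate an ID3v1 tag using only the file size.
// ID3v1 occupies the last 128 bytes of a file and starts with "TAG".
var ID3V1_LENGTH = 128;

function id3v1Range(size) {
  // A file shorter than 128 bytes cannot hold an ID3v1 tag.
  if (size < ID3V1_LENGTH) {
    return null;
  }
  return { offset: size - ID3V1_LENGTH, length: ID3V1_LENGTH };
}

function hasId3v1Tag(bytes) {
  var range = id3v1Range(bytes.length);
  if (range === null) {
    return false;
  }
  // Compare the first three bytes of the range with "TAG" (0x54 0x41 0x47).
  return bytes[range.offset] === 0x54 &&
         bytes[range.offset + 1] === 0x41 &&
         bytes[range.offset + 2] === 0x47;
}
```

With the FileData interface described above, the same offset and length would presumably be handed to slice(offset, length), so that only the final 128 bytes need to be read.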
I am happy with the existing metadata extraction for a v1, and believe that as we work out more audio and video issues on the platform, we can get to specific metadata issues. Can you clear up what you mean by all metadata?

> 4. separation of error handling from file reading

In Jonas' proposal, this isn't done cleanly (for some definition of clean as separate from the reader object), but I think what *is* done is good for the majority of use cases. In Jonas' proposal, the FileReader object (named FileRequest in the email [1]) allows separate onerror handling (along with onprogress being separate, etc.). It's not done *within* a read method (unlike the existing proposal, which does this less well than Jonas' proposal), and the callback that handles the event can deal with the response. This is as separate as is done with XHR.

> All things being equal, I would prefer a model that, in order of priority:
>
> 1. involves
Re: File API to separate reading from files
Nikunj,

The File API is everyone's favorite API for feature requests as well as programming style discussions :)

> interface InputStream {
>     read(in DataHandler, [optional in] long long offset, [optional in] long long length);
>     abort();
>     attribute Function onerror;
> }

There is lots that is attractive about InputStream, and I think that it can be used in other specifications, especially when discussing Camera APIs, streaming from web apps (conferencing) etc. I also like the idea of DataHandler. When we define a byte primitive, it can be used in conjunction with the stream interface. For additional read features (fseek) this is also useful. I also appreciate that you have pointed out in a subsequent email [1] that it is possible to sidestep the issue of dealing with bytes directly. Managing bytes properly, with the right primitives, is one reason why, despite having looked at the Java I/O APIs [2], I went with something simpler. I think that we should have streams at some point, and I'm amenable to looking at them in a subsequent iteration of the File API. It's worth saying here that the appeal of streams is for *multiple use cases* for both File API and other APIs, and *not* because the Java I/O model is one we should emulate. Programmer taste and choice about coining APIs is subjective.

For a first version (which should replace http://www.w3.org/TR/file-upload/ , with a more meaningful name like File API), I think we should address use cases around reads. Ian Fette has given us plenty of other use cases for consideration later on [3]. While my editor's draft strove to address the use cases for file access with different asynchronous data accessors, it was clear that it couldn't gracefully account for progress events. Moreover, general feedback favored a model that used events with a separate reader object that allowed for progress events, and Jonas' alternative proposal does this and also resembles XHR [4].
While I'm reluctant to sacrifice simplicity, I think moving in the direction of the Alternative File API [4] reconciles use cases such as progress events with calls for a reader/event model.

FWIW, I disagree that resemblance to XHR should be seen as unwanted baggage [5]. I think it's desirable to resemble an API that has such widespread usage! While the web is inconsistent, event models are widely used, and similarity between XHR and File API, which will be used in conjunction anyway, is probably a good thing.

I'd say the following might be next steps:

1. Work on the File API but revise the editor's draft to reflect [4].
2. Collect use cases for further iterations of the File API, and determine which WG should carry them forward. I'm in favor of continuing in this WG as opposed to the Device API WG, but that is a separate question.
3. In conjunction with 2., discuss security and platform issues around proposals such as [6].

I'm interested in airing more concrete proposals on this listserv.

-- A*

[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0757.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0729.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0900.html
[4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html
[5] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0749.html
[6] http://dev.w3.org/2006/webapi/fileio/fileIO.htm
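The reader/event model favored in this message can be pictured roughly as below. This is a minimal sketch and not the API from [4]: the SketchReader name, its read(bytes, chunkSize) signature and the shape of the progress object are all assumptions made for illustration, and the handlers are called synchronously only to keep the example self-contained, where a real implementation would dispatch events asynchronously.

```javascript
// Hypothetical reader object with XHR-style handlers; not the API from [4].
function SketchReader() {
  this.onprogress = null; // called with { loaded, total } after each chunk
  this.onload = null;     // called with the complete result
  this.onerror = null;    // called with an Error if the source is unusable
  this.result = null;
}

// Reads `bytes` (an array of integers 0..255) in chunks of `chunkSize`.
// Handlers are invoked synchronously here purely for illustration.
SketchReader.prototype.read = function (bytes, chunkSize) {
  if (!Array.isArray(bytes) || !(chunkSize > 0)) {
    if (this.onerror) { this.onerror(new Error("unreadable source")); }
    return;
  }
  var text = "";
  for (var offset = 0; offset < bytes.length; offset += chunkSize) {
    var end = Math.min(offset + chunkSize, bytes.length);
    for (var i = offset; i < end; i++) {
      text += String.fromCharCode(bytes[i]);
    }
    // Progress is reported separately from success and failure.
    if (this.onprogress) {
      this.onprogress({ loaded: end, total: bytes.length });
    }
  }
  this.result = text;
  if (this.onload) { this.onload(text); }
};
```

The point of the sketch is the separation: progress, success and failure each get their own handler on the reader object, much as they do on an XHR object.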
Re: File API to separate reading from files
On Aug 31, 2009, at 11:28 PM, Arun Ranganathan wrote:

> Nikunj,
>
> The File API is everyone's favorite API for feature requests as well as programming style discussions :)
>
>> interface InputStream {
>>     read(in DataHandler, [optional in] long long offset, [optional in] long long length);
>>     abort();
>>     attribute Function onerror;
>> }
>
> There is lots that is attractive about InputStream, and I think that it can be used in other specifications, especially when discussing Camera APIs, streaming from web apps (conferencing) etc. I also like the idea of DataHandler. When we define a byte primitive, it can be used in conjunction with the stream interface. For additional read features (fseek) this is also useful. I also appreciate that you have pointed out in a subsequent email [1] that it is possible to sidestep the issue of dealing with bytes directly. Managing bytes properly, with the right primitives, is one reason why, despite having looked at the Java I/O APIs [2], I went with something simpler. I think that we should have streams at some point, and I'm amenable to looking at them in a subsequent iteration of the File API. It's worth saying here that the appeal of streams is for *multiple use cases* for both File API and other APIs, and *not* because the Java I/O model is one we should emulate. Programmer taste and choice about coining APIs is subjective.

I respect your point on taste, however, I am more interested in composability than the maturity of Java I/O.

It would be useful to see how you meet the following requirements:

1. incremental reading of a file's data
2. concurrent access to file data
3. access to all file metadata without needing to read the file
4. separation of error handling from file reading

All things being equal, I would prefer a model that, in order of priority:

1. involves fewer steps, and
2. evolves nicely with file write and binary access, which are both likely to be next evolution directions in this area.
Can you provide a comparison of your proposed approach with my proposal for the above so that the WG can develop an informed opinion about the proposals?

> For a first version (which should replace http://www.w3.org/TR/file-upload/ , with a more meaningful name like File API), I think we should address use cases around reads. Ian Fette has given us plenty of other use cases for consideration later on [3]. While my editor's draft strove to address the use cases for file access with different asynchronous data accessors, it was clear that it couldn't gracefully account for progress events. Moreover, general feedback favored a model that used events with a separate reader object that allowed for progress events, and Jonas' alternative proposal does this and also resembles XHR [4]. While I'm reluctant to sacrifice simplicity, I think moving in the direction of the Alternative File API [4] reconciles use cases such as progress events with calls for a reader/event model.
>
> FWIW, I disagree that resemblance to XHR should be seen as unwanted baggage [5]. I think it's desirable to resemble an API that has such widespread usage!

This is arguable at best, since it seems to be an opinion not shared by everyone, especially not the editor of XMLHttpRequest [1]. In fact, there is no similarity to XHR in the current editor's draft, and I wonder why those benefits were considered unimportant when drafting previously.

> While the web is inconsistent, event models are widely used, and similarity between XHR and File API, which will be used in conjunction anyway, is probably a good thing.

Can you explain, in light of the objections I raised in [2], why the Alternative File API is the right approach? I haven't seen any replies to my points.

> I'd say the following might be next steps:
>
> 1. Work on the File API but revise the editor's draft to reflect [4].
> 2. Collect use cases for further iterations of the File API, and determine which WG should carry them forward.
> I'm in favor of continuing in this WG as opposed to the Device API WG, but that is a separate question.
> 3. In conjunction with 2., discuss security and platform issues around proposals such as [6].
>
> I'm interested in airing more concrete proposals on this listserv.
>
> -- A*
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0757.html
> [2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0729.html
> [3] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0900.html
> [4] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0565.html
> [5] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0749.html
> [6] http://dev.w3.org/2006/webapi/fileio/fileIO.htm

Nikunj
http://o-micron.blogspot.com

[1] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0537.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/0748.html
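The first requirement raised in this message, incremental reading, comes down to choosing offset and length pairs from the file's size, whichever proposal ends up consuming them. A minimal sketch of that arithmetic follows; chunkRanges is a hypothetical helper name and the object shape it returns is an assumption made for illustration.

```javascript
// Hypothetical helper: split a file of `size` bytes into (offset, length)
// pairs of at most `chunkSize` bytes each, for use in incremental reads.
function chunkRanges(size, chunkSize) {
  if (!(size >= 0) || !(chunkSize > 0)) {
    throw new RangeError("size must be >= 0 and chunkSize must be > 0");
  }
  var ranges = [];
  for (var offset = 0; offset < size; offset += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    ranges.push({ offset: offset, length: Math.min(chunkSize, size - offset) });
  }
  return ranges;
}
```

Each pair could then be passed to a read(handler, offset, length) call in the stream proposal, or to slice(offset, length) in the alternative one.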
Re: File API to separate reading from files
On Wed, Aug 19, 2009 at 11:47 AM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:

> Here's an alternative, more easily extensible, proposal for reading files. It provides applications a way to read small amounts of data at a time. It also allows applications to concurrently read the same file.

I agree.

[snip]

[snip example]

> Secondly, a list of files can be obtained using some UI.
>
>     typedef sequence<File> FileList;

Agree.

> Thirdly, an abstract interface is an input stream that is not limited to files. It works at the level of bytes that files are made of. The read() operation can specify the extent that is required. If an application wishes to read small increments, it can thus specify those increments. Of course, the File interface identifies its size, so the application can suitably choose increments. Processing of blocks read from the file occurs in callbacks. XHR could also consider taking an InputStream parameter during the send() operation.

Would it be possible to have a reader handle creating the input stream and making the decision based on what type of Reader it is, passing byte offset lengths to the input stream -- essentially hiding those details?

[snip example]

> Fifthly, a file can be used for reading an input stream by specifying the name of a file when constructing the stream
>
>     [Constructor(in File toOpen)]
>     interface FileInputStream : InputStream { }
>
> Sixthly, one can create various kinds of derived readers such as text reader, binary string reader, and data URL reader. By inheriting from InputStream, the basic mechanisms such as abort and onerror are inherited. Moreover, the base read behavior is altered by the subclass although it behaves in a similar manner, except that the data seen outside is different.
>
>     [Constructor(in InputStream base)]
>     interface BinaryStringInputStream : InputStream {
>         read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> The callback is provided a DOMString. The String's length is expected to match the increment requested.
>
>     [CallBack=FunctionOnly]
>     interface StringDataHandler {
>         handle(in DOMString data);
>     }
>
> For text reading, encoding is optionally specified.
>
>     [Constructor(in InputStream base, [optional in] DOMString encoding)]
>     interface TextInputStream : InputStream {
>         read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> A file can be alternatively read as a dataURL using a similar kind of handler as above.
>
>     [Constructor(in InputStream base)]
>     interface FileDataURL : InputStream {
>         read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> This API has the advantage that it can cleanly be extended to deal with both writing use cases and binary data. Furthermore, it can also support extensions that perform cryptographic, compression, or coding on top of the basic interfaces.
>
> To compare with the editor's draft, here's a typical programming case in JavaScript:
>
>     var fileList = ...
>     // There is a mistake in the example provided in Section 3 where it does fileList.files[0]
>     var myFile = fileList[0];

That's odd.

>     // *According to my proposal*
>     var stream = new TextInputStream(new FileInputStream(myFile), "UTF-16");
>     stream.read(handleDataAsText);

// don't you need to add the onerror before read()?

>     stream.onerror = errorHandler;
>     function handleDataAsText(fileContent) { }
>     function errorHandler(error) { }
>
> Note the two differences:
>
> 1. Error handling is separated from file reading

Right. Method handleDataAsText does one thing only, as does the error handler.

You seem to have misplaced the onerror. Shouldn't that, as commented, be assigned before read() is called? Could read() raise an exception immediately?

Why put the callback as an argument? What is wrong with having a success callback? A generic read method puts the type of reading on the stream, as you would have it. Read just sends a message: read, but does not specify the details.

> 2. Two extra objects are needed to read text data out of the file. However, the composability of input streams enables a far richer library to operate.

I don't see why this is important. For the purposes of this specification, is the complexity justified? I had the Reader idea and that was deemed too complex, but what I see you proposing sounds, well, flexible, but more involved. There's more busywork just to read a file.

> This API matches more closely the Java API for IO.

That is not necessarily ideal. Design decisions a decade ago in a different language, for different contexts, might not be the best decisions for this context.

I feel a bit odd about giving an API critique to someone who seems to be a lot more knowledgeable and experienced. But anyway, this proposal is extensible. It does not paint itself into a corner like the other. Is it possible to simplify the interface a little bit? I'm not married to the Reader idea, but it was a simpler
Re: File API to separate reading from files
I would like to make another plug for http://dev.w3.org/2006/webapi/fileio/fileIO.htm

This had the notion of writing files, file streams, directories, and being able to integrate into the host filesystem. All of these are important for reasons I outlined in http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/022388.html and subsequent replies. Quoting that email:

I would much rather have a well thought-out local filesystem proposal, than continued creep of the existing File and Local Storage proposal. These proposals are both designed from the perspective of "I want to take some existing data and either put it into the cloud or make it available offline". They don't really handle the use case of "I want to create new data and save it to the local filesystem", or "I want to modify existing data on the filesystem", or "I want to maintain a virtual filesystem for my application, and potentially map in the existing filesystem" (e.g. if I'm flickr and I want to be able to read the user's My Photos folder, send those up, but also make thumbnails that I want to save locally and don't care if they get uploaded, maintain an index file with image metadata / thumbnails / locally, save off some intermediate files, ...).

For this, I would really like to see us take another look at http://dev.w3.org/2006/webapi/fileio/fileIO.htm (I don't think this spec is exactly what we need, but I like the general approach of "origins get a virtual filesystem tucked away that they can use, they can fread/fwrite/fseek, and optionally if they want to interact with the host FS they can request that and then get some sub-set of that (e.g. my documents or my photos) mapped in").

2009/8/31 Garrett Smith <dhtmlkitc...@gmail.com>:

> [snip]
Re: File API to separate reading from files
Ian Fette (イアンフェッティ) wrote:

> I would like to make another plug for http://dev.w3.org/2006/webapi/fileio/fileIO.htm This had the notion of writing files, file streams, directories, and being able to integrate into the host filesystem. All of these are important for reasons I outlined in http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-August/022388.html and subsequent replies.

Ian: Nothing in my draft precludes those from coming later, in a v2 (and subsequent iterations), modulo good proposals about platform differences and security issues. Writing out data to the filesystem is a very different set of issues, with different security issues. There are numerous issues that need resolution here.

It turns out the File API is a controversial one, even to get right basic read features, and I'm keen to finish a v1 draft soon. It's clear that even within the spectrum of read, we may not quite satisfy all use cases (such as fseek) in v1, but I'd like something that we can iterate on later. *Even* addressing the use cases of reading data, we've encountered discussions about programming style (Java I/O vs. a model closer to XHR), progress events, etc. I'd like to resolve these in a v1 draft first.

I think v1 should address the most common use cases, which really do appear to be:

> I want to take some existing data and either put it into the cloud or make it available offline.

Right, because we can't even do this elegantly on the web today! Developers use Firefox's synchronous API for File access, or Gears, or Flash (at least to get data from the file system into web apps and then the cloud). Furthermore:

> They don't really handle the use case of I want to create new data and save it to the local filesystem, or I want to modify existing data on the filesystem, or I want to maintain a virtual filesystem for my application, and potentially map in the existing filesystem

Not *yet*, but I think that these features can be evolved over the course of time. The proposal you cite, http://dev.w3.org/2006/webapi/fileio/fileIO.htm , doesn't adequately address security issues, and deals with use cases that I'm not so sure are critical for a first version. I appreciate that Google wants these next, and so I'd like to see proposals that address the open issues, some of which have been mentioned on the whatwg thread you posted a link to.

I'm on vacation Sept. 1 - Sept. 4, so I'll respond to other email about this topic upon my return.

-- A*
File API to separate reading from files
Here's an alternative, more easily extensible, proposal for reading files. It provides applications a way to read small amounts of data at a time. It also allows applications to concurrently read the same file.

Firstly, there is a simple interface to access file metadata. This metadata is always accessed synchronously. A file object could be passed to XHR, in which case it can upload the file during the send() process.

    interface File {
        readonly attribute DOMString name;
        readonly attribute DOMString mediaType;
        readonly attribute DOMString url;
        readonly attribute unsigned long long size;
    }

Secondly, a list of files can be obtained using some UI.

    typedef sequence<File> FileList;

Thirdly, an abstract interface is an input stream that is not limited to files. It works at the level of bytes that files are made of. The read() operation can specify the extent that is required. If an application wishes to read small increments, it can thus specify those increments. Of course, the File interface identifies its size, so the application can suitably choose increments. Processing of blocks read from the file occurs in callbacks. XHR could also consider taking an InputStream parameter during the send() operation.

    interface InputStream {
        read(in DataHandler, [optional in] long long offset, [optional in] long long length);
        abort();
        attribute Function onerror;
    }

Fourthly, reading a block of bytes is supported through an interface that accepts an array of integers. This is similar to the Gears Blob interface.

    [CallBack=FunctionOnly]
    interface DataHandler {
        handle(in sequence<int> data);
    }

Fifthly, a file can be used for reading an input stream by specifying the name of a file when constructing the stream

    [Constructor(in File toOpen)]
    interface FileInputStream : InputStream { }

Sixthly, one can create various kinds of derived readers such as text reader, binary string reader, and data URL reader. By inheriting from InputStream, the basic mechanisms such as abort and onerror are inherited.
Moreover, the base read behavior is altered by the subclass although it behaves in a similar manner, except that the data seen outside is different.

    [Constructor(in InputStream base)]
    interface BinaryStringInputStream : InputStream {
        read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
    }

The callback is provided a DOMString. The String's length is expected to match the increment requested.

    [CallBack=FunctionOnly]
    interface StringDataHandler {
        handle(in DOMString data);
    }

For text reading, encoding is optionally specified.

    [Constructor(in InputStream base, [optional in] DOMString encoding)]
    interface TextInputStream : InputStream {
        read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
    }

A file can be alternatively read as a dataURL using a similar kind of handler as above.

    [Constructor(in InputStream base)]
    interface FileDataURL : InputStream {
        read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
    }

This API has the advantage that it can cleanly be extended to deal with both writing use cases and binary data. Furthermore, it can also support extensions that perform cryptographic, compression, or coding on top of the basic interfaces.

To compare with the editor's draft, here's a typical programming case in JavaScript:

    var fileList = ...
    // There is a mistake in the example provided in Section 3 where it does fileList.files[0]
    var myFile = fileList[0];

    // *According to editor's draft*
    myFile.getAsText(handleDataAsText);
    function handleDataAsText(fileContent, error) {
        if (error) { }
    }

    // *According to my proposal*
    var stream = new TextInputStream(new FileInputStream(myFile), "UTF-16");
    stream.read(handleDataAsText);
    stream.onerror = errorHandler;
    function handleDataAsText(fileContent) { }
    function errorHandler(error) { }

Note the two differences:

1. Error handling is separated from file reading
2. Two extra objects are needed to read text data out of the file.
However, the composability of input streams enables a far richer library to operate.

This API matches more closely the Java API for IO. It also benefits from the extensibility model used in Java, while retaining the asynchronous processing nature that is preferred in ECMAScript environments. It is also not too different from the editor's draft in that it does not introduce a completely different kind of data processing - we are still looking at callbacks. However, the improvement is in the composability of streams as well as supporting multiple concurrent file readers and processing blocks of data at a time.

Progress events can be built on top, but I welcome suggestions to build them into this proposal.

Nikunj
http://o-micron.blogspot.com
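To make the composition idea concrete, here is a rough, runnable sketch of one stream wrapping another. It is an illustration under stated assumptions and not an implementation of the proposal: ByteArrayInputStream is a hypothetical in-memory stand-in for FileInputStream, the callbacks are invoked synchronously for simplicity, and the error-forwarding behavior is a guess at what inheriting onerror might mean in practice.

```javascript
// Stand-in for FileInputStream: serves bytes from an in-memory array.
// (Hypothetical; the proposal's FileInputStream would wrap a File.)
function ByteArrayInputStream(bytes) {
  this.bytes = bytes;
  this.onerror = null;
}

// read(handler, offset, length): offset and length are optional, as in the IDL.
ByteArrayInputStream.prototype.read = function (handler, offset, length) {
  var start = offset === undefined ? 0 : offset;
  var count = length === undefined ? this.bytes.length - start : length;
  if (start < 0 || count < 0 || start + count > this.bytes.length) {
    // Errors go to onerror, never to the data handler.
    if (this.onerror) { this.onerror(new RangeError("read outside stream")); }
    return;
  }
  handler(this.bytes.slice(start, start + count));
};

// Derived reader: wraps any base stream and hands the callback a string.
function BinaryStringInputStream(base) {
  var self = this;
  this.base = base;
  this.onerror = null;
  // Forward errors from the wrapped stream to this stream's handler.
  base.onerror = function (error) {
    if (self.onerror) { self.onerror(error); }
  };
}

BinaryStringInputStream.prototype.read = function (handler, offset, length) {
  this.base.read(function (data) {
    handler(String.fromCharCode.apply(null, data));
  }, offset, length);
};

// Usage: onerror is assigned before read() so no error can be missed.
var stream = new BinaryStringInputStream(new ByteArrayInputStream([72, 105, 33]));
stream.onerror = function (error) { console.log("error: " + error.message); };
stream.read(function (text) { console.log(text); });       // whole stream
stream.read(function (text) { console.log(text); }, 1, 2); // two bytes from offset 1
```

The data handler sees only data and the error handler sees only errors, and a further wrapper (for example one that decodes text) could be layered on in the same way.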
Re: File API to separate reading from files
On Wed, Aug 19, 2009 at 11:47 AM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:

> Here's an alternative, more easily extensible, proposal for reading files. It provides applications a way to read small amounts of data at a time. It also allows applications to concurrently read the same file.
>
> Firstly, there is a simple interface to access file metadata. This metadata is always accessed synchronously. A file object could be passed to XHR, in which case it can upload the file during the send() process.
>
>     interface File {
>       readonly attribute DOMString name;
>       readonly attribute DOMString mediaType;
>       readonly attribute DOMString url;
>       readonly attribute unsigned long long size;
>     }
>
> Secondly, a list of files can be obtained using some UI.
>
>     typedef sequence<File> FileList;
>
> Thirdly, an abstract interface is an input stream that is not limited to files. It works at the level of bytes that files are made of. The read() operation can specify the extent that is required. If an application wishes to read small increments, it can thus specify those increments. Of course, the File interface identifies its size, so the application can suitably choose increments. Processing of blocks read from the file occurs in callbacks. XHR could also consider taking an InputStream parameter during the send() operation.
>
>     interface InputStream {
>       read(in DataHandler, [optional in] long long offset, [optional in] long long length);
>       abort();
>       attribute Function onerror;
>     }
>
> Fourthly, reading a block of bytes is supported through an interface that accepts an array of integers. This is similar to the Gears Blob interface.
>
>     [CallBack=FunctionOnly]
>     interface DataHandler {
>       handle(in sequence<int> data);
>     }
>
> Fifthly, a file can be used for reading an input stream by specifying the name of a file when constructing the stream.
>
>     [Constructor(in File toOpen)]
>     interface FileInputStream : InputStream {
>     }
>
> Sixthly, one can create various kinds of derived readers such as text reader, binary string reader, and data URL reader. By inheriting from InputStream, the basic mechanisms such as abort and onerror are inherited. Moreover, the base read behavior is altered by the subclass although it behaves in a similar manner, except that the data seen outside is different.
>
>     [Constructor(in InputStream base)]
>     interface BinaryStringInputStream : InputStream {
>       read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> The callback is provided a DOMString. The String's length is expected to match the increment requested.
>
>     [CallBack=FunctionOnly]
>     interface StringDataHandler {
>       handle(in DOMString data);
>     }
>
> For text reading, encoding is optionally specified.
>
>     [Constructor(in InputStream base, [optional in] DOMString encoding)]
>     interface TextInputStream : InputStream {
>       read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> A file can be alternatively read as a dataURL using a similar kind of handler as above.
>
>     [Constructor(in InputStream base)]
>     interface FileDataURL : InputStream {
>       read(in StringDataHandler, [optional in] long long offset, [optional in] long long length);
>     }
>
> This API has the advantage that it can cleanly be extended to deal with both writing use cases and binary data. Furthermore, it can also support extensions that perform cryptographic, compression, or coding on top of the basic interfaces.
>
> To compare with the editor's draft, here's a typical programming case in JavaScript:
>
>     var fileList = ...
>     // There is a mistake in the example provided in Section 3 where it does fileList.files[0]
>     var myFile = fileList[0];
>
>     // *According to editor's draft*
>     myFile.getAsText(handleDataAsText);
>     function handleDataAsText(fileContent, error) {
>       if (error) {
>       }
>     }
>
>     // *According to my proposal*
>     var stream = new TextInputStream(new FileInputStream(myFile), "UTF-16");
>     stream.read(handleDataAsText);
>     stream.onerror = errorHandler;
>     function handleDataAsText(fileContent) {
>     }
>     function errorHandler(error) {
>     }
>
> Note the two differences:
>
> 1. Error handling is separated from file reading.
> 2. Two extra objects are needed to read text data out of the file. However, the composability of input streams enables a far richer library to operate.
>
> This API matches more closely the Java API for IO. It also benefits from the extensibility model used in Java, while retaining the asynchronous processing nature that is preferred in ECMAScript environments. It is also not too different from the editor's draft in that it does not introduce a completely different kind of data processing - we are still looking at callbacks. However, the improvement is in the composability of streams as well as supporting multiple concurrent file readers and processing blocks of data at a time.
>
> Progress events can be built on top but I welcome suggestions to build them into this proposal.
>
> Nikunj
> http://o-micron.blogspot.com

A few comments on this: I do like the idea
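The stream composition described in the quoted proposal could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names follow the proposal's IDL, but the in-memory `bytes` array, the synchronous delivery of callbacks, the bounds check, and the use of `TextDecoder` are all stand-ins that the proposal does not specify. The sample uses UTF-8 rather than the UTF-16 of the original example so that the byte values stay readable.

```javascript
// Hypothetical in-memory file: `bytes` stands in for on-disk contents and is
// not part of the proposed File interface.
const myFile = {
  name: "notes.txt",
  mediaType: "text/plain",
  size: 11,
  bytes: [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100], // "hello world"
};

// Base stream over a file. Delivery is synchronous here for brevity; the
// proposal expects asynchronous callbacks.
class FileInputStream {
  constructor(file) {
    this.file = file;
    this.onerror = null;
  }
  read(handler, offset = 0, length = this.file.size - offset) {
    if (offset < 0 || length < 0 || offset + length > this.file.size) {
      if (this.onerror) this.onerror(new RangeError("read outside file bounds"));
      return;
    }
    handler(this.file.bytes.slice(offset, offset + length));
  }
}

// Derived stream: same read() shape, but the handler receives decoded text.
class TextInputStream {
  constructor(base, encoding = "utf-8") {
    this.base = base;
    this.decoder = new TextDecoder(encoding);
    this.onerror = null;
    // Forward errors from the wrapped stream to this stream's handler.
    base.onerror = (error) => {
      if (this.onerror) this.onerror(error);
    };
  }
  read(handler, offset, length) {
    this.base.read(
      (bytes) => handler(this.decoder.decode(Uint8Array.from(bytes))),
      offset,
      length
    );
  }
}

// Read the file in 4-byte increments, using File.size to pick the ranges.
const stream = new TextInputStream(new FileInputStream(myFile), "utf-8");
let lastError = null;
stream.onerror = (error) => { lastError = error; };

const chunks = [];
for (let offset = 0; offset < myFile.size; offset += 4) {
  const length = Math.min(4, myFile.size - offset);
  stream.read((text) => chunks.push(text), offset, length);
}

// An out-of-range request goes to onerror instead of the data handler.
stream.read((text) => chunks.push(text), 100, 5);
```

One caveat with this sketch: decoding each block independently only works when no multi-byte character straddles a block boundary, so a real text reader would need to carry decoder state between reads.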
Re: File API to separate reading from files
On Aug 19, 2009, at 12:21 PM, Jonas Sicking wrote:

> On Wed, Aug 19, 2009 at 11:47 AM, Nikunj R. Mehta <nikunj.me...@oracle.com> wrote:
>
>> Here's an alternative, more easily extensible, proposal for reading files. [...]
>
> A few
Re: File API to separate reading from files
On Wed, 19 Aug 2009 21:21:54 +0200, Jonas Sicking <jo...@sicking.cc> wrote:

> I do like the idea of having a stream primitive. I think we'll need that for other things in the future such as reading data from a camera, or reading data from a microphone.

I quite concur, and for the widget space, actually writing data is a real use-case.

> However I'm not convinced that we should force people to use streams to deal with the simple use case of reading data from a file. In 95% (if not more) of the cases the user simply wants to get the contents of the file, and so forcing them to do that using:

Does the presence of an IOStream primitive exclude getAs*?

--
Arve Bersvendsen
Opera Software ASA, http://www.opera.com/
Re: File API to separate reading from files
On Aug 19, 2009, at 3:07 PM, Arve Bersvendsen wrote:

> On Wed, 19 Aug 2009 21:21:54 +0200, Jonas Sicking <jo...@sicking.cc> wrote:
>
>> I do like the idea of having a stream primitive. I think we'll need that for other things in the future such as reading data from a camera, or reading data from a microphone.
>
> I quite concur, and for the widget space, actually writing data is a real use-case.
>
>> However I'm not convinced that we should force people to use streams to deal with the simple use case of reading data from a file. In 95% (if not more) of the cases the user simply wants to get the contents of the file, and so forcing them to do that using:
>
> Does the presence of an IOStream primitive exclude getAs*?

They are independent proposals. Of course, it is still possible that the next WD combines the two. I wouldn't like that too much, but that's just me.

> --
> Arve Bersvendsen
> Opera Software ASA, http://www.opera.com/

Nikunj
http://o-micron.blogspot.com
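As a rough illustration of the question above, a getAs*-style convenience could in principle be layered on top of a stream rather than replaced by one. The sketch below is an assumption, not something either proposal specifies: `makeTextStream` is a hypothetical in-memory stand-in for a text stream, and `getAsText` here is a free function that only mimics the editor's draft callback shape of (fileContent, error). Callbacks are delivered synchronously to keep the example short.

```javascript
// Hypothetical text stream over an in-memory string; stands in for a
// TextInputStream wrapped around a file.
function makeTextStream(text) {
  return {
    onerror: null,
    read(handler, offset = 0, length = text.length - offset) {
      if (offset < 0 || offset > text.length) {
        if (this.onerror) this.onerror(new RangeError("offset outside stream"));
        return;
      }
      handler(text.slice(offset, offset + length));
    },
  };
}

// Convenience wrapper with the editor's draft callback shape: one call,
// whole contents, error reported through the same callback.
function getAsText(stream, callback) {
  stream.onerror = (error) => callback(null, error);
  stream.read((content) => callback(content, null));
}

// Simple case: get everything in one call.
let received = null;
getAsText(makeTextStream("file contents"), (fileContent, error) => {
  if (error) return;
  received = fileContent;
});

// The same kind of stream still supports incremental reads directly.
let firstBlock = null;
makeTextStream("file contents").read((block) => { firstBlock = block; }, 0, 4);
```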
Re: File API to separate reading from files
I want to clarify that it is indeed possible to adjust my proposal to sidestep the issue of dealing with bytes directly. Here's how. Remove the read() method in the InputStream interface. Remove the DataHandler interface.

Notice that composability doesn't depend on the availability of raw bytes to the JavaScript interface. In fact, the implementation should be able to dissect the bytes corresponding to the input stream being passed so that it can either:

1. produce a data URL
2. produce text blocks
3. produce binary string blocks

I hope that eliminates any expectation of a dependency on representing bytes as integers from my proposal. I just threw it in since I felt Gears thought it was OK. Perhaps a real byte handler can be added when ECMAScript is ready with byte arrays, and the File I/O will serve as a rallying cause for it.

Nikunj

On Aug 19, 2009, at 11:47 AM, Nikunj R. Mehta wrote:

> Here's an alternative, more easily extensible, proposal for reading files. [...]
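The byte-free adjustment described in this message could look roughly like the sketch below: the base stream exposes no read(), and only the derived streams hand data to script, as a binary string or a data URL. Everything beyond the interface names is an assumption made for illustration: the `WeakMap` stands in for storage that a browser would keep in native code, the in-memory `bytes` array is hypothetical, and callbacks are delivered synchronously for brevity.

```javascript
// Implementation-side storage. In a browser this would live in native code;
// the WeakMap is an assumption used to keep bytes off the script-visible API.
const fileOf = new WeakMap();

// Base stream: after the adjustment it has abort() and onerror, but no read().
class FileInputStream {
  constructor(file) {
    fileOf.set(this, file);
    this.onerror = null;
  }
  abort() {}
}

// Turn a byte range into a binary string (one character per byte).
function toBinaryString(bytes) {
  return String.fromCharCode(...bytes);
}

// Derived stream that delivers binary string blocks.
class BinaryStringInputStream {
  constructor(base) {
    this.base = base;
    this.onerror = null;
  }
  read(handler, offset = 0, length) {
    const { bytes } = fileOf.get(this.base);
    const end = length === undefined ? bytes.length : offset + length;
    handler(toBinaryString(bytes.slice(offset, end)));
  }
  abort() {}
}

// Derived stream that delivers the whole file as a data URL.
class FileDataURL {
  constructor(base) {
    this.base = base;
    this.onerror = null;
  }
  read(handler) {
    const { bytes, mediaType } = fileOf.get(this.base);
    handler("data:" + mediaType + ";base64," + btoa(toBinaryString(bytes)));
  }
  abort() {}
}

// Two derived readers share one base stream; script never sees raw bytes.
const base = new FileInputStream({
  name: "hi.txt",
  mediaType: "text/plain",
  size: 2,
  bytes: [72, 105], // "Hi"
});
let binaryString = null;
let dataUrl = null;
new BinaryStringInputStream(base).read((s) => { binaryString = s; });
new FileDataURL(base).read((s) => { dataUrl = s; });
```

Spreading a whole byte array into `String.fromCharCode` is fine for a small sample like this, but a large file would need to be converted in slices.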