On May 11, 2008, at 6:01 PM, Aaron Boodman wrote:

On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:
Well, that depends on how good the OS buffer cache is at prefetching. But in
general, there would be some disk access.

It seems better if the read API is just async for this case to prevent
the problem.

It can't entirely prevent the problem. If you read a big enough chunk, it will cause swapping which hits the disk just as much as file reads. Possibly more, because real file access will trigger OS prefetch heuristics for linear access.

I see what you mean for canvas, but not so much for XHR. It seems like a valid use case to want to be able to use XHR to download very large files. In that case, the thing you get back seems like it should have
an async API for reading.

Hmm? If you get the data over the network, it goes into RAM. Why would you want an async API for in-memory data? Or are you suggesting XHR should be changed to spool its data to disk? I do not think that is practical for all requests, so this would have to be a special API mode for responses that are expected to be too big to fit in memory.

Whether XHR spools to disk is an implementation detail, right? Right
now XHR is not practical to use for downloading large files because
the only way to access the result is as a string. Also because of
this, XHR implementations don't bother spooling to disk. But if this
API were added, then XHR implementations could be modified to start
spooling to disk if the response got large. If the caller requests
responseText, then the implementation does the best it can to
read the whole thing into a string and return it. But if the caller
uses responseBlob (or whatever we call it), then it becomes practical
to, for example, download movie files, modify them, and re-upload them.
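To make the shape of that concrete, here is a minimal sketch of the kind of async-read-only response object being discussed. All names here (ResponseBlob, read) are invented for illustration, not part of XHR or any shipped API; the in-memory buffer stands in for data an implementation might actually spool to disk.

```javascript
// Hypothetical sketch: a response object that exposes only an
// asynchronous read, so an implementation is free to keep large
// bodies on disk rather than in a string.
class ResponseBlob {
  constructor(bytes) {
    this.bytes = bytes;          // stand-in for a possibly disk-backed store
  }
  get size() { return this.bytes.length; }
  // Deliver a copy of a byte range via callback, mirroring the
  // "async API for reading" suggested above.
  read(offset, length, callback) {
    const chunk = Buffer.from(this.bytes.subarray(offset, offset + length));
    setTimeout(() => callback(chunk), 0);  // simulate non-blocking I/O
  }
}

// Usage: peek at the first 4 bytes of a "downloaded" body without
// ever materializing the whole thing as a string.
const blob = new ResponseBlob(Buffer.from("RIFFmovie-data"));
blob.read(0, 4, (chunk) => {
  console.log(chunk.toString());  // "RIFF"
});
```

The point of the callback shape is that the caller never gets a synchronous handle on the whole body, so a disk-backed implementation can't be forced to block.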

That sounds reasonable for very large files like movies. However, audio and image files are similar in size to the kinds of text or XML resources that are currently processed synchronously. In such cases they are likely to remain in memory.

In general it is sounding like it might be desirable to have at least two kinds of objects for representing binary data:

1) An in-memory, mutable representation with synchronous access. There should also be a copy operation, possibly implemented as copy-on-write of the backing store.

2) A possibly disk-backed representation that offers only asynchronous read (possibly in the form of representation #1).

Both representations could be used with APIs that accept binary data; most such APIs currently take only strings. The name of representation #2 should probably tie it to being a file, since for anything already in memory you'd want representation #1. Perhaps they could be called ByteArray and File respectively. Open question: can a File be stored in a SQL database? If so, does the database store the data itself or just a reference (such as a path or Mac OS X Alias)?
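A rough sketch of how the two representations could relate, with the ByteArray and File names floated above. Everything here is hypothetical API shape, not an existing interface: copy() marks the backing store shared and defers the actual copy until the first write, and File offers only an async read that hands back a ByteArray.

```javascript
// Representation #1 (hypothetical): in-memory, mutable, synchronous,
// with a copy-on-write copy().
class ByteArray {
  constructor(store) {
    this.store = store;          // backing store, possibly shared
    this.shared = false;
  }
  static fromString(s) { return new ByteArray(Buffer.from(s)); }
  get(i) { return this.store[i]; }
  set(i, v) {
    if (this.shared) {           // first write breaks the sharing
      this.store = Buffer.from(this.store);
      this.shared = false;
    }
    this.store[i] = v;
  }
  copy() {
    this.shared = true;          // both objects now treat the store as shared
    const c = new ByteArray(this.store);
    c.shared = true;
    return c;
  }
}

// Representation #2 (hypothetical): possibly disk-backed, read is
// asynchronous only, delivering a ByteArray for the requested range.
class File {
  constructor(path, bytes) {
    this.path = path;            // what a database might store by reference
    this.bytes = bytes;          // stand-in for on-disk data
  }
  read(offset, length, callback) {
    const chunk = Buffer.from(this.bytes.subarray(offset, offset + length));
    setTimeout(() => callback(new ByteArray(chunk)), 0);
  }
}

// Usage: synchronous mutation with copy-on-write sharing.
const orig = ByteArray.fromString("abc");
const cow = orig.copy();         // no bytes copied yet
cow.set(0, 0x78);                // 'x': cow diverges, orig stays "abc"
```

Note the asymmetry this encodes: File.read() yields a ByteArray, so anything small enough to work on lands in representation #1, while representation #2 never promises synchronous access.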

Regards,
Maciej


