On May 11, 2008, at 6:01 PM, Aaron Boodman wrote:
> On Sun, May 11, 2008 at 5:46 PM, Maciej Stachowiak <[EMAIL PROTECTED]> wrote:
>> Well, that depends on how good the OS buffer cache is at prefetching.
>> But in general, there would be some disk access.
> It seems better if the read API is just async for this case to prevent
> the problem.
It can't entirely prevent the problem. If you read a big enough chunk,
it will cause swapping, which hits the disk just as much as file reads.
Possibly more, because real file access will trigger OS prefetch
heuristics for linear access.
>>> I see what you mean for canvas, but not so much for XHR. It seems
>>> like a valid use case to want to be able to use XHR to download very
>>> large files. In that case, the thing you get back seems like it
>>> should have an async API for reading.
>> Hmm? If you get the data over the network it goes into RAM. Why would
>> you want an async API to in-memory data? Or are you suggesting XHR
>> should be changed to spool its data to disk? I do not think that is
>> practical to do for all requests, so this would have to be a special
>> API mode for responses that are expected to be too big to fit in
>> memory.
> Whether XHR spools to disk is an implementation detail, right? Right
> now XHR is not practical to use for downloading large files because
> the only way to access the result is as a string. Also because of
> this, XHR implementations don't bother spooling to disk. But if this
> API were added, then XHR implementations could be modified to start
> spooling to disk if the response got large. If the caller requests
> responseText, then the implementation just does the best it can to
> read the whole thing into a string and reply. But if the caller uses
> responseBlob (or whatever we call it) then it becomes practical to,
> for example, download movie files, modify them, then re-upload them.
That sounds reasonable for very large files like movies. However,
audio and image files are similar in size to the kinds of text or XML
resources that are currently processed synchronously. In such cases
they are likely to remain in memory.
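As an aside, the spooling behavior described above could be sketched roughly as follows: buffer the response body in memory, and once it crosses a threshold, spill it to a temporary file. This is only an illustration under assumed names (SpoolingBuffer, SPOOL_THRESHOLD); it is not how any real XHR implementation works.

```typescript
// Illustrative sketch only: buffer in memory, spill to disk past a
// threshold. Names are invented, not from any real implementation.
import { appendFileSync, readFileSync, unlinkSync, writeFileSync } from "fs";
import { tmpdir } from "os";
import { join } from "path";

const SPOOL_THRESHOLD = 1024; // bytes; a real limit would be far larger

class SpoolingBuffer {
  private chunks: Buffer[] = [];
  private size = 0;
  private spoolPath: string | null = null;

  append(chunk: Buffer): void {
    if (this.spoolPath !== null) {
      appendFileSync(this.spoolPath, chunk); // already on disk; keep appending
      return;
    }
    this.chunks.push(chunk);
    this.size += chunk.length;
    if (this.size > SPOOL_THRESHOLD) this.spill();
  }

  // Move everything accumulated so far out to a temporary file.
  private spill(): void {
    this.spoolPath = join(tmpdir(), `xhr-spool-${process.pid}-${Date.now()}`);
    writeFileSync(this.spoolPath, Buffer.concat(this.chunks));
    this.chunks = [];
  }

  // responseText-style access: do the best we can to materialize the
  // whole body as one in-memory string, even if it was spooled.
  text(): string {
    const all = this.spoolPath !== null
      ? readFileSync(this.spoolPath)
      : Buffer.concat(this.chunks);
    return all.toString("utf8");
  }

  isSpooled(): boolean {
    return this.spoolPath !== null;
  }

  dispose(): void {
    if (this.spoolPath !== null) unlinkSync(this.spoolPath);
  }
}
```

A responseBlob-style accessor would then hand back a handle to the spooled file rather than materializing the whole body as a string.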
In general it is sounding like it might be desirable to have at least
two kinds of objects for representing binary data:
1) An in-memory, mutable representation with synchronous access. There
should also be a copying API, possibly copy-on-write for the backing
store.
2) A possibly disk-backed representation that offers only asynchronous
read (possibly in the form of representation #1).
Both representations could be used with APIs that can accept binary
data; currently, most such APIs take only strings. The name of
representation #2 should probably tie it to being a file, since for
anything already in memory you'd want representation #1. Perhaps they
could be called ByteArray and File respectively. Open question: can a
File be stored in a SQL database? If so, does the database store the
data or a reference (such as a path or Mac OS X Alias)?
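To make the distinction concrete, here is a rough sketch of the two representations. ByteArray is the name suggested above; the file-backed one is called FileHandle here only to avoid clashing with built-in File types, and every method name and signature is invented for illustration.

```typescript
// Representation #1: in-memory, mutable, synchronous access. copy()
// is a deep copy here; a real engine might make it copy-on-write.
class ByteArray {
  constructor(private bytes: number[]) {}
  get length(): number { return this.bytes.length; }
  get(i: number): number { return this.bytes[i]; }        // synchronous read
  set(i: number, v: number): void { this.bytes[i] = v; }  // in-place mutation
  copy(): ByteArray { return new ByteArray(this.bytes.slice()); }
}

// Representation #2: possibly disk-backed; the only access is an
// asynchronous read that delivers a chunk as representation #1.
class FileHandle {
  // Simulated in memory for this sketch; a real implementation could
  // hold a path to data on disk instead.
  constructor(private data: number[]) {}
  read(offset: number, length: number,
       callback: (chunk: ByteArray) => void): void {
    // Deliver on a later event-loop turn, as real disk I/O would.
    setTimeout(() => {
      callback(new ByteArray(this.data.slice(offset, offset + length)));
    }, 0);
  }
}
```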
Regards,
Maciej