I've been cleaning up our OLE code and have started to look at the existing POIFS API.
http://jakarta.apache.org/poi/poifs/how-to.html#Event-Driven+Reading says this:

"The event-driven API for reading documents is a little more complicated and requires that your application know, in advance, which files it wants to read. The benefit of using this API is that each document is in memory just long enough for your application to read it, and documents that you never read at all are not in memory at all. When you're finished reading the documents you wanted, the file system has no data structures associated with it at all and can be discarded."

I think this is a little misleading, especially the part "documents that you never read at all are not in memory at all". Due to the nature of OLE, the table-of-contents structures could very well be at the end of the file. When reading from an InputStream, this means you need to buffer the entire contents, since you can't tell which data to discard as you read it in.

Looking at the code bears out that this is what actually happens (not surprising), but it does raise the question of how useful an event-driven API really is. You are not reducing the peak memory required to read a file, just releasing the parts you do not want a little sooner. The same effect could be achieved with a simple addition to the "conventional" API.

In any case, the best choice for low-memory situations going forward will be to stream the data to disk and use a RandomAccessFile-based reader (mmap or otherwise). So I don't see any benefit in keeping the event-based API.

Chris
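
P.S. For what it's worth, here is a rough sketch of the stream-to-disk idea, assuming a plain temp file and java.io.RandomAccessFile. The class and method names are made up for illustration and are not part of POIFS. The header offsets (sector shift at 0x1E, first directory sector at 0x30, both little-endian) are as I understand the OLE2 header layout, so treat them as assumptions to check against the format documentation. The point is only that once the data is on disk, the directory can be found by seeking, wherever it sits in the file, without holding the whole file in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not POIFS API: spool a stream to disk, then seek.
final class OleSpoolSketch {
    // Assumed OLE2 header field offsets (little-endian values).
    private static final int SECTOR_SHIFT_OFFSET = 0x1E;     // uint16
    private static final int FIRST_DIR_SECTOR_OFFSET = 0x30; // int32

    /** Copies the stream to a temporary file so it can be read with random access. */
    public static Path spoolToDisk(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("ole-spool", ".bin");
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Returns the byte offset of the first directory sector. */
    public static long firstDirectoryOffset(RandomAccessFile file) throws IOException {
        file.seek(SECTOR_SHIFT_OFFSET);
        int sectorShift = readUnsignedShortLE(file);
        file.seek(FIRST_DIR_SECTOR_OFFSET);
        int firstDirSector = readIntLE(file);
        // Sector 0 begins one sector-size into the file, after the header.
        return ((long) firstDirSector + 1) << sectorShift;
    }

    private static int readUnsignedShortLE(RandomAccessFile f) throws IOException {
        int lo = f.readUnsignedByte();
        int hi = f.readUnsignedByte();
        return lo | (hi << 8);
    }

    private static int readIntLE(RandomAccessFile f) throws IOException {
        // RandomAccessFile.readInt() is big-endian, so swap the bytes.
        return Integer.reverseBytes(f.readInt());
    }
}
```

A caller would spool the incoming stream once, open the resulting file in "r" mode, and seek to whatever firstDirectoryOffset returns. Only the sectors actually read are ever brought into memory, which is the property the event-driven API was meant to provide.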
