On 23.10.2008 at 09:25, Craig Ringer wrote:

> Hi
>
> Before I start work on this, I just want to check to make sure I'm not
> missing anything obvious. There isn't currently any interface exposed to
> permit users to progressively read a filtered PDF stream or do random
> I/O in an unfiltered stream, is there?
>
> I'd like to provide a PdfInputStream-like interface for PdfStream, so
> that users can read huge streams in small segments. For streams that
> don't have any filters applied (be they file or memory based) it could
> also do random I/O.
>
> This is different from GetFilteredCopy(PdfOutputStream*), in that
> there's no need for the caller to implement a custom PdfOutputStream to
> do whatever work they need to do, and for file streams it doesn't have
> to allocate a temporary copy of the whole stream in RAM in order to
> filter it. It'd also be an easier interface to use for most work,
> especially where you might not even want to decode all the stream.
>
> The main use I have for this is in PoDoFoBrowser, where we really
> shouldn't have to allocate a whole stream in memory and possibly
> allocate another decompressed copy of it if it's flate filtered or
> similar. The same principle will apply to other programs processing big
> PDF streams (say, huge images) though.
>
> I'd like to preserve the existing interfaces in PdfStream, but rewrite
> GetCopy and GetFilteredCopy to use the underlying progressive reading
> interfaces. PdfStream would no longer make any assumption that a stream
> has an "internal buffer" that may be accessed; instead, it'll request
> data from the stream in small chunks and feed those to the output or to
> any required filter. The chunk size can be big enough that the (minimal)
> overhead of the function calls etc for the progressive reading should be
> basically undetectable, and concrete stream implementations can override
> the methods if they have a simpler way to do it anyway.
>
> Once I've got the PdfStream interface adjustments done it should be
> possible to do something like extract and write a 100MB image from a PDF
> without using more than a few hundred kb of RAM.
>
> Sound good? If so, the next thing I'll want to do is write a variant on
> PdfFileStream that uses an external temp file instead of a view into the
> original PDF, so it's possible to edit a stream without having to load
> the whole thing into RAM at once. Again, I'm sure you can see uses
> outside the obvious ones in PoDoFoBrowser.
Sounds good, but I wouldn't bother with file buffers in the KB range. The buffer should be at least 1 MB, unless you are targeting embedded devices. The best buffers are the ones that only get filled once!

I could imagine making that a compile-time parameter, with 1-16 MB for regular desktop builds, 4-128 KB for embedded devices, and 16-128 MB for servers (just a gut feeling for those sizes).

/Andreas
