Hi

Before I start work on this, I'd like to check that I'm not missing
anything obvious. There isn't currently any interface that lets users
progressively read a filtered PDF stream, or do random I/O on an
unfiltered stream, is there?

I'd like to provide a PdfInputStream-like interface for PdfStream, so
that users can read huge streams in small segments. For streams that
don't have any filters applied (whether file- or memory-based) it could
also do random I/O.
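To make the proposed shape concrete, here is a minimal sketch assuming a
pull-style Read() that returns 0 at end of data, plus an optional Seek()
that only unfiltered streams support. ProgressiveReader and
MemoryStreamReader are hypothetical names for illustration, not existing
PoDoFo classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface sketch; NOT existing PoDoFo API.
class ProgressiveReader {
public:
    virtual ~ProgressiveReader() = default;
    // Copy up to len bytes into buf; returns bytes copied, 0 at end of data.
    virtual std::size_t Read(char* buf, std::size_t len) = 0;
    // Random access is only offered by unfiltered streams.
    virtual bool CanSeek() const { return false; }
    virtual void Seek(std::size_t /*offset*/) {
        throw std::logic_error("stream does not support random I/O");
    }
};

// Hypothetical memory-backed, unfiltered stream: sequential reads plus
// random I/O via Seek().
class MemoryStreamReader : public ProgressiveReader {
public:
    explicit MemoryStreamReader(std::string data) : data_(std::move(data)) {}

    std::size_t Read(char* buf, std::size_t len) override {
        std::size_t n = std::min(len, data_.size() - pos_);
        if (n > 0) std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

    bool CanSeek() const override { return true; }

    void Seek(std::size_t offset) override {
        assert(offset <= data_.size());  // seeking past the end is a caller bug
        pos_ = offset;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};
```

A filtered stream would implement only Read() and keep the default
CanSeek()/Seek(), so callers can check CanSeek() before attempting
random access.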

This is different from GetFilteredCopy(PdfOutputStream*) in that the
caller doesn't need to implement a custom PdfOutputStream to do whatever
work they need, and for file streams it doesn't have to allocate a
temporary copy of the whole stream in RAM in order to filter it. It'd
also be an easier interface to use for most work, especially where you
might not even want to decode the whole stream.
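To illustrate the "might not decode the whole stream" case: a pull-based
decoder only does work when the caller asks for bytes. This sketch uses
ASCIIHexDecode because it's a real PDF filter simple enough to write
without zlib; a FlateDecode reader could follow the same shape on top of
zlib's streaming inflate. AsciiHexReader is a hypothetical name, and
keeping the encoded input in a std::string is a simplification.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical pull-based ASCIIHexDecode reader (not PoDoFo API).
// Encoded input is consumed only as far as needed to satisfy each Read().
class AsciiHexReader {
public:
    explicit AsciiHexReader(std::string encoded) : src_(std::move(encoded)) {}

    // Decode up to len bytes into buf; returns bytes produced (0 at EOD).
    std::size_t Read(char* buf, std::size_t len) {
        assert(buf != nullptr || len == 0);
        std::size_t out = 0;
        while (out < len && !eod_) {
            int hi = NextDigit();
            if (hi < 0) break;   // end of data
            int lo = NextDigit();
            if (lo < 0) lo = 0;  // odd digit count: final digit padded with 0
            buf[out++] = static_cast<char>(hi * 16 + lo);
        }
        return out;
    }

    // Number of encoded characters consumed so far.
    std::size_t Consumed() const { return pos_; }

private:
    // Next hex digit value, or -1 at '>' (EOD marker) or end of input.
    int NextDigit() {
        while (pos_ < src_.size()) {
            char c = src_[pos_++];
            if (c == '>') { eod_ = true; return -1; }
            if (std::isspace(static_cast<unsigned char>(c))) continue;
            if (c >= '0' && c <= '9') return c - '0';
            if (c >= 'a' && c <= 'f') return c - 'a' + 10;
            if (c >= 'A' && c <= 'F') return c - 'A' + 10;
            throw std::runtime_error("invalid character in ASCIIHexDecode data");
        }
        eod_ = true;
        return -1;
    }

    std::string src_;
    std::size_t pos_ = 0;
    bool eod_ = false;
};
```

Reading the first two bytes of "48 65 6C ..." consumes only five encoded
characters; the rest of the stream is never decoded.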

The main use I have for this is in PoDoFoBrowser, where we really
shouldn't have to allocate a whole stream in memory and possibly
allocate another decompressed copy of it if it's Flate-filtered or
similar. The same principle applies to other programs processing big
PDF streams (say, huge images), though.

I'd like to preserve the existing interfaces in PdfStream, but rewrite
GetCopy and GetFilteredCopy to use the underlying progressive reading
interfaces. PdfStream would no longer assume that a stream has an
"internal buffer" that may be accessed; instead, it'll request data from
the stream in small chunks and feed those to the output or to any
required filter. The chunk size can be big enough that the (minimal)
overhead of the extra function calls for progressive reading should be
basically undetectable, and concrete stream implementations can override
the methods if they have a simpler way to do it anyway.
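A rough sketch of the chunked loop that GetCopy and GetFilteredCopy could
share. ChunkSource, ChunkFilter, Sink, CopyInChunks, StringSource and
RunLengthFilter are all hypothetical names, and the 64 KiB default chunk
size is an arbitrary assumption. RunLengthDecode is a real PDF filter; it
is written here to keep its state across chunk boundaries, which is what
any filter fed in small chunks has to do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interfaces for illustration; not PoDoFo's actual classes.
struct ChunkSource {
    virtual ~ChunkSource() = default;
    virtual std::size_t Read(char* buf, std::size_t len) = 0;  // 0 == end
};

using Sink = std::function<void(const char*, std::size_t)>;

struct ChunkFilter {
    virtual ~ChunkFilter() = default;
    virtual void Feed(const char* data, std::size_t len, const Sink& out) = 0;
    virtual void Finish(const Sink& /*out*/) {}
};

// Shared loop: "GetCopy" when filter is null, "GetFilteredCopy" otherwise.
// No internal buffer of the whole stream is assumed, only fixed-size chunks.
void CopyInChunks(ChunkSource& src, const Sink& out,
                  ChunkFilter* filter = nullptr,
                  std::size_t chunkSize = 64 * 1024) {
    assert(chunkSize > 0);
    std::vector<char> chunk(chunkSize);
    std::size_t n;
    while ((n = src.Read(chunk.data(), chunk.size())) != 0) {
        if (filter) filter->Feed(chunk.data(), n, out);
        else out(chunk.data(), n);
    }
    if (filter) filter->Finish(out);
}

// Simple in-memory source for the example.
class StringSource : public ChunkSource {
public:
    explicit StringSource(std::string s) : s_(std::move(s)) {}
    std::size_t Read(char* buf, std::size_t len) override {
        std::size_t n = std::min(len, s_.size() - pos_);
        std::copy_n(s_.data() + pos_, n, buf);
        pos_ += n;
        return n;
    }
private:
    std::string s_;
    std::size_t pos_ = 0;
};

// RunLengthDecode with state carried between Feed() calls.
class RunLengthFilter : public ChunkFilter {
public:
    void Feed(const char* data, std::size_t len, const Sink& out) override {
        for (std::size_t i = 0; i < len && !eod_; ++i) {
            unsigned char b = static_cast<unsigned char>(data[i]);
            if (literalLeft_ > 0) {          // inside a literal run
                out(&data[i], 1);
                --literalLeft_;
            } else if (repeatCount_ > 0) {   // this is the byte to repeat
                std::string run(repeatCount_, static_cast<char>(b));
                out(run.data(), run.size());
                repeatCount_ = 0;
            } else if (b < 128) {
                literalLeft_ = b + 1u;       // copy the next b+1 bytes
            } else if (b > 128) {
                repeatCount_ = 257u - b;     // repeat next byte 257-b times
            } else {
                eod_ = true;                 // 128 marks end of data
            }
        }
    }
private:
    std::size_t literalLeft_ = 0;
    std::size_t repeatCount_ = 0;
    bool eod_ = false;
};
```

A concrete stream that really does hold its data in one buffer could
still override the copy methods and hand the buffer over directly, as
suggested above.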

Once I've got the PdfStream interface adjustments done, it should be
possible to do something like extract and write a 100MB image from a PDF
without using more than a few hundred KB of RAM.
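As a back-of-the-envelope check on the memory claim: with a fixed 64 KiB
buffer (an assumed chunk size), a 100 MiB stream takes 1600 reads and the
copy loop never holds more than one chunk. PatternSource, CopyStats and
StreamThrough are hypothetical names; the source generates bytes on
demand so it stands in for a huge stream without occupying RAM itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a huge stream: bytes are generated on demand.
class PatternSource {
public:
    explicit PatternSource(std::uint64_t total) : total_(total) {}
    std::size_t Read(char* buf, std::size_t len) {
        std::uint64_t left = total_ - pos_;
        std::size_t n = static_cast<std::size_t>(
            std::min<std::uint64_t>(len, left));
        for (std::size_t i = 0; i < n; ++i)
            buf[i] = static_cast<char>((pos_ + i) & 0xFF);
        pos_ += n;
        return n;
    }
private:
    std::uint64_t total_;
    std::uint64_t pos_ = 0;
};

struct CopyStats {
    std::uint64_t bytes = 0;      // total bytes delivered to the sink
    std::uint64_t reads = 0;      // number of chunk reads performed
    std::size_t bufferBytes = 0;  // size of the only buffer the copy allocates
};

// Copy the whole source through one fixed-size buffer; peak working memory
// is chunkSize no matter how long the stream is.
template <typename Source, typename SinkFn>
CopyStats StreamThrough(Source& src, SinkFn&& sink, std::size_t chunkSize) {
    assert(chunkSize > 0);
    std::vector<char> buf(chunkSize);
    CopyStats st;
    st.bufferBytes = buf.size();
    std::size_t n;
    while ((n = src.Read(buf.data(), buf.size())) != 0) {
        sink(buf.data(), n);  // e.g. write to an output file
        st.bytes += n;
        ++st.reads;
    }
    return st;
}
```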

Sound good? If so, the next thing I'll want to do is write a variant of
PdfFileStream that uses an external temp file instead of a view into the
original PDF, so it's possible to edit a stream without having to load
the whole thing into RAM at once. Again, I'm sure you can see uses
outside the obvious ones in PoDoFoBrowser.
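One possible shape for the temp-file-backed variant, sketched with
std::tmpfile(), which creates an anonymous file that is removed
automatically when closed or at normal program termination.
TempFileStream and its methods are hypothetical; a real PoDoFo version
would also need to fit the existing PdfStream/PdfOutputStream interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stream whose data lives in a temporary file instead of RAM
// or a view into the original PDF. Not PoDoFo API.
class TempFileStream {
public:
    TempFileStream() : file_(std::tmpfile()) {
        if (!file_) throw std::runtime_error("cannot create temporary file");
    }

    // Append bytes at the end, e.g. encoder output written chunk by chunk.
    void Append(const char* data, std::size_t len) {
        // Seeking before each write also satisfies the C rule that a read
        // must not be followed directly by a write on an update stream.
        if (std::fseek(file_.get(), 0, SEEK_END) != 0 ||
            std::fwrite(data, 1, len, file_.get()) != len)
            throw std::runtime_error("temporary file write failed");
        size_ += len;
    }

    // Random access for reading: position the read cursor.
    void Seek(long offset) {
        assert(offset >= 0 && static_cast<std::size_t>(offset) <= size_);
        readPos_ = offset;
    }

    // Read up to len bytes from the read cursor; returns 0 at end of data.
    std::size_t Read(char* buf, std::size_t len) {
        if (std::fseek(file_.get(), readPos_, SEEK_SET) != 0)
            throw std::runtime_error("temporary file seek failed");
        std::size_t n = std::fread(buf, 1, len, file_.get());
        readPos_ += static_cast<long>(n);
        return n;
    }

    std::size_t Size() const { return size_; }

private:
    struct FileCloser {
        void operator()(std::FILE* f) const { std::fclose(f); }
    };
    std::unique_ptr<std::FILE, FileCloser> file_;
    std::size_t size_ = 0;
    long readPos_ = 0;
};
```

Keeping a separate read cursor and seeking before every operation keeps
the mixed read/write use well-defined without tracking the FILE's own
position.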

--
Craig Ringer

_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
