Re: [Podofo-users] PDF streams - progressive and random reads

Andreas Vox Thu, 23 Oct 2008 13:38:05 -0700

On 23.10.2008 at 09:25, Craig Ringer wrote:

> Hi
>
> Before I start work on this, I just want to check to make sure I'm not
> missing anything obvious. There isn't currently any interface exposed to
> permit users to progressively read a filtered PDF stream or do random
> I/O in an unfiltered stream, is there?
>
> I'd like to provide a PdfInputStream-like interface for PdfStream, so
> that users can read huge streams in small segments. For streams that
> don't have any filters applied (be they file or memory based) it could
> also do random I/O.
>
> This is different from GetFilteredCopy(PdfOutputStream*), in that
> there's no need for the caller to implement a custom PdfOutputStream to
> do whatever work they need to do, and for file streams it doesn't have
> to allocate a temporary copy of the whole stream in RAM in order to
> filter it. It'd also be an easier interface to use for most work,
> especially where you might not even want to decode all the stream.
>
> The main use I have for this is in PoDoFoBrowser, where we really
> shouldn't have to allocate a whole stream in memory and possibly
> allocate another decompressed copy of it if it's flate filtered or
> similar. The same principle will apply to other programs processing big
> PDF streams (say, huge images) though.
>
> I'd like to preserve the existing interfaces in PdfStream, but rewrite
> GetCopy and GetFilteredCopy to use the underlying progressive reading
> interfaces. PdfStream would no longer make any assumption that a stream
> has an "internal buffer" that may be accessed; instead, it'll request
> data from the stream in small chunks and feed those to the output or to
> any required filter. The chunk size can be big enough that the (minimal)
> overhead of the function calls etc. for the progressive reading should be
> basically undetectable, and concrete stream implementations can override
> the methods if they have a simpler way to do it anyway.
>
> Once I've got the PdfStream interface adjustments done it should be
> possible to do something like extract and write a 100 MB image from a PDF
> without using more than a few hundred KB of RAM.
>
> Sound good? If so, the next thing I'll want to do is write a variant on
> PdfFileStream that uses an external temp file instead of a view into the
> original PDF, so it's possible to edit a stream without having to load
> the whole thing into RAM at once. Again, I'm sure you can see uses
> outside the obvious ones in PoDoFoBrowser.
>
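
The chunked reading proposed above could look roughly like the following
minimal sketch. ChunkSource, MemorySource and CopyInChunks are hypothetical
names chosen for illustration; they are not part of the PoDoFo API, and a
std::string stands in for the output stream or filter that would receive
the data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull-style interface (an assumption, not PoDoFo's API).
class ChunkSource {
public:
    virtual ~ChunkSource() = default;
    // Copies up to `len` bytes into `buf` and returns the number of bytes
    // copied; returns 0 once the data is exhausted.
    virtual std::size_t Read(char* buf, std::size_t len) = 0;
};

// In-memory source standing in for a memory- or file-backed stream.
class MemorySource : public ChunkSource {
public:
    explicit MemorySource(std::string data)
        : m_data(std::move(data)), m_pos(0) {}

    std::size_t Read(char* buf, std::size_t len) override {
        const std::size_t n = std::min(len, m_data.size() - m_pos);
        if (n != 0) {
            std::memcpy(buf, m_data.data() + m_pos, n);
            m_pos += n;
        }
        return n;
    }

private:
    std::string m_data;
    std::size_t m_pos;
};

// Drains `src` into `out` through one reusable buffer of `chunkSize` bytes,
// so the working memory is bounded by the chunk size rather than by the
// stream length. Returns the number of non-empty reads performed.
std::size_t CopyInChunks(ChunkSource& src, std::string& out,
                         std::size_t chunkSize) {
    std::vector<char> buf(chunkSize);
    std::size_t reads = 0;
    for (;;) {
        const std::size_t n = src.Read(buf.data(), buf.size());
        if (n == 0) break;
        out.append(buf.data(), n);
        ++reads;
    }
    return reads;
}
```

A concrete stream that already holds its data contiguously could override
the copy with a single write, which matches the point about implementations
overriding the methods when they have a simpler way to do it.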


Sounds good, but I wouldn't bother with file buffers in the KB range. The
buffer should be at least 1 MB, unless you are targeting embedded devices.
The best buffers are the ones that only get filled once!

I could imagine making that a compile-time parameter, with 1-16 MB for
regular desktop builds, 4-128 KB for embedded devices, and 16-128 MB for
servers (just a gut feeling for those sizes).
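
As a rough sketch of such a compile-time parameter: the macro name
STREAM_BUFFER_SIZE, the constant kStreamBufferSize and the helper
RefillsNeeded are hypothetical names for illustration, not existing PoDoFo
settings, and the 1 MiB default is just the lower end of the desktop range
above.

```cpp
#include <cstddef>

// Hypothetical macro (an assumption). An embedded build could override it
// on the compiler command line, e.g. -DSTREAM_BUFFER_SIZE=65536.
#ifndef STREAM_BUFFER_SIZE
#define STREAM_BUFFER_SIZE (1024u * 1024u)  // 1 MiB default for desktop builds
#endif

constexpr std::size_t kStreamBufferSize = STREAM_BUFFER_SIZE;

// Reject values below the smallest size suggested for embedded devices.
static_assert(kStreamBufferSize >= 4u * 1024u,
              "STREAM_BUFFER_SIZE is below the 4 KB lower bound");

// Number of buffer fills needed to read a stream of `streamLength` bytes
// (rounded up, so a partial final buffer counts as one fill).
constexpr std::size_t RefillsNeeded(std::size_t streamLength) {
    return (streamLength + kStreamBufferSize - 1) / kStreamBufferSize;
}
```

With the default, a 100 MB stream needs 100 fills, while any stream of up
to 1 MB is read in a single fill.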

/Andreas

