Re: [Podofo-users] Loading pdfs with large number of trailers / updates causes stack overflow

F. E. Wed, 31 Jan 2024 03:52:28 -0800

 @zyx <z...@gmx.us>
Increasing s_maxRecursionDepth would defeat the purpose of the
RecursionGuard. The guard is already aligned to the typical stack size of
256, so increasing the max would just open the door to the stack overflows
the guard is meant to prevent. If anything, the guard's max value may
actually be too big, since it does not leave "wiggle room" for client code
using podofo.
You can increase the stack size, that's true, but that would just be a
band-aid fix. As soon as an even bigger pdf file comes along, it might
crash again.
The parsing code should not use a recursive approach; that's the design
flaw, imo. The recursion guard is meant to mitigate that, but the whole
process seems to have its own flaw somewhere.


@Christopher
At least in this place, the caught exceptions do get rethrown, so there is
no new exception with every catch:

try {
    ReadNextTrailer();
} catch( PdfError & e ) {
    if( e != ePdfError_NoTrailer )
    {
        e.AddToCallstack( __FILE__, __LINE__ );
        throw e;
    }
}

But with every catch, an additional line is added to the internal call
stack (a string collection) of the error object. Maybe that's the issue
here: the error object getting larger with every catch, and eventually
busting the stack frame? I don't know, but that's the first thing I'm going
to try out: commenting out these AddToCallstack calls and checking if it
still crashes. If it does, I'll remove the try-catch clauses altogether (in
the recursively called functions) and let the try-catch clauses in
ReadDocumentStructure handle it. If that STILL causes a stack overflow
during exception unwinding for my file, I'm dead lost.
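
One caveat that may be worth checking while testing this: in standard C++,
"throw e;" copy-initializes a new exception object from e, whereas a bare
"throw;" rethrows the object that is already in flight. The sketch below
only illustrates that language rule. CountingError, descend and countCopies
are hypothetical names made up for the demonstration, not podofo code, and
whether these copies contribute to the crash is an untested assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for an error type; it only counts copy constructions.
static int g_copies = 0;

struct CountingError {
    CountingError() {}
    CountingError(const CountingError&) { ++g_copies; }
    CountingError(CountingError&&) noexcept {}
};

// Recurse `depth` levels, throw at the bottom, and catch/rethrow at every
// level on the way back up, similar in shape to the nested trailer reads.
void descend(int depth, bool rethrowByName) {
    if (depth == 0)
        throw CountingError();
    try {
        descend(depth - 1, rethrowByName);
    } catch (CountingError& e) {
        if (rethrowByName)
            throw e;  // creates a new exception object copied from e
        throw;        // rethrows the existing exception object, no copy
    }
}

// Returns how many copy constructions happened while unwinding.
int countCopies(int depth, bool rethrowByName) {
    g_copies = 0;
    try {
        descend(depth, rethrowByName);
    } catch (CountingError&) {
        // swallowed: only the copy count matters here
    }
    return g_copies;
}
```

Calling countCopies(5, true) should report one copy per catch level, while
countCopies(5, false) should report none.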

@Michal
Yes, I saw that XRefStm takes part in the recursion as well, but I left it
out for simplicity. It's also not relevant in my use case; the pdf I'm
testing with does not have xref tables as streams.
But for a proper rework of the parsing, those streams need to be handled as
well, sure.
I also thought about using a stack / queue for xref offsets to check, but
I'm just not familiar enough with the whole parsing process to hack
something like that myself ^^
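
For what it's worth, the stack idea can be sketched independently of podofo.
This is only a rough model under stated assumptions: TrailerInfo,
TrailerTable and collectXRefOffsets are hypothetical names, the "file" is
just a map from xref offset to the Prev / XRefStm offsets found there, and
-1 means the key is absent. Real parsing of the sections is left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for what a parsed trailer would tell us.
// An offset of -1 means the key is not present.
struct TrailerInfo {
    long long prev;     // value of the "Prev" key
    long long xrefStm;  // value of the "XRefStm" key
};

// Hypothetical model of a file: xref offset -> trailer found at that offset.
typedef std::map<long long, TrailerInfo> TrailerTable;

// Visit every reachable xref section without recursion. The explicit
// `pending` stack lives on the heap, so the number of incremental updates
// does not affect call stack depth.
std::vector<long long> collectXRefOffsets(const TrailerTable& file,
                                          long long startXRef) {
    std::vector<long long> order;    // offsets in the order they were visited
    std::vector<long long> pending;  // offsets still to be checked
    std::set<long long> checked;     // offsets already seen (cycle guard)

    pending.push_back(startXRef);
    while (!pending.empty()) {
        long long offset = pending.back();
        pending.pop_back();

        // insert() reports whether the offset was new; skip repeats.
        if (!checked.insert(offset).second)
            continue;

        TrailerTable::const_iterator it = file.find(offset);
        if (it == file.end())
            continue;  // dangling offset; a real parser would report this

        order.push_back(offset);
        if (it->second.prev >= 0)
            pending.push_back(it->second.prev);
        if (it->second.xrefStm >= 0)
            pending.push_back(it->second.xrefStm);
    }
    return order;
}
```

With a chain 300 -> 200 -> 100 where the oldest trailer wrongly points back
to 300, the loop visits each offset once and then stops instead of
recursing forever.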

Greetings,
F.E.

On Tue, 30 Jan 2024 at 23:00, Michal Sudolsky <
sudols...@gmail.com> wrote:

>
>> right? When you set the s_maxRecursionDepth to large-enough value, will
>> PoDoFo be able to open the file? Possibly also making the stack larger,
>> to accommodate the recursion.
>>
>>
> I suppose it will work fine. Using recursion there is not a very good
> idea. Also there are two keys that need to be followed: "XRefStm" and
> "Prev". @Francesco Pretto <cez...@gmail.com> maybe you could fix that in
> the new podofo?  Something like this could work for example (pseudocode):
>
> xrefs - vector of xrefs (or stack);
> checked - set of already checked xrefs;
> xrefs.push_back(first xref - startxref);
> while(!xrefs.empty())
> {
>   XRefType xref = xrefs.back();
>   xrefs.pop_back();
>   if(checked.contains(xref))continue; // to avoid cycles
>   checked.insert(xref);
>   ...
>   // just check whether this is a non-stream xref with trailer -
>   // an xref stream contains just a single "previous xref"
>   xrefs.push_back(next "XRefStm" key from trailer);
>   xrefs.push_back(next "Prev" key from trailer);
> }

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
