Re: [Podofo-users] Loading pdfs with large number of trailers / updates causes stack overflow

F. E. Tue, 30 Jan 2024 03:39:39 -0800

Hello again,

I tested a little further and learned that the stack overflow actually
happens during the unwinding of an exception thrown by the
RecursionGuard.


So the recursion guard detects that something is wrong and throws its
exception, which is then repeatedly caught and rethrown in ReadXRefContents &
ReadNextTrailer (to add a line to the error's call stack), until the
OS intervenes. It seems that exception unwinding consumes stack space as
well, so the stack eventually runs out during unwinding.

I played around with the s_maxRecursionDepth of the RecursionGuard: it
works fine for values up to 97, but with 98 or higher it starts to fail. Of
course that number may depend on my test environment / case. Sadly, I cannot
share the file I'm testing against, since it contains confidential data.

But this raises the question of whether that repeated catch-throw is really
necessary. It's only done to add numerous lines to the PdfError call stack,
which IMO is not of much value in this context. There is no contextual
information about the individual calls (e.g. which object/trailer is
currently being parsed), so even if it works and does not crash, you only
see the same two functions being called again and again.

So I suggest getting rid of the catch-throw blocks in the recursively
called functions, in the hope that this prevents the stack overflow during
exception unwinding.
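
To illustrate what I mean, here is a minimal, self-contained sketch. It is
not PoDoFo code: ParseError, readWithRethrow, readWithoutRethrow and
framesRecorded are hypothetical names, and the depth check is only a rough
stand-in for the RecursionGuard. It does not reproduce the stack overflow
itself; it only shows that the per-level catch-throw adds one identical
call stack line per recursion level, while a single catch at the outermost
caller records one.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for PdfError: an exception carrying a call stack.
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
    std::vector<std::string> callStack;
};

// Variant A: every recursion level catches, annotates and rethrows,
// similar to what the message above describes.
void readWithRethrow(int depth, int updates, int maxDepth) {
    try {
        if (depth > maxDepth)
            throw ParseError("recursion limit exceeded");
        if (depth < updates)
            readWithRethrow(depth + 1, updates, maxDepth);
    } catch (ParseError& e) {
        e.callStack.push_back("readWithRethrow");  // same text at every level
        throw;                                     // rethrows the same object
    }
}

// Variant B: no try/catch inside the recursive function; the exception
// propagates straight to the outermost caller.
void readWithoutRethrow(int depth, int updates, int maxDepth) {
    if (depth > maxDepth)
        throw ParseError("recursion limit exceeded");
    if (depth < updates)
        readWithoutRethrow(depth + 1, updates, maxDepth);
}

// Outermost caller: annotates the error once and reports how many call
// stack entries were collected (0 if parsing succeeded).
std::size_t framesRecorded(void (*reader)(int, int, int),
                           int updates, int maxDepth) {
    assert(reader != nullptr);
    try {
        reader(0, updates, maxDepth);
    } catch (ParseError& e) {
        e.callStack.push_back("caller");
        return e.callStack.size();
    }
    return 0;
}
```

With 10 updates and a limit of 5, variant A collects 8 entries (7 identical
ones plus the caller's), variant B collects 1. Whether dropping the blocks
really avoids the overflow in PoDoFo would still need to be verified
against a real file.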

Regards,
F.E.



On Tue, 30 Jan 2024 at 10:43, F. E. <exler7...@gmail.com> wrote:

> Hello dear podofo-users,
>
> we currently have some issues with loading some pdf files using Podofo.
> When performing the load operation, the podofo code crashes with a stack
> overflow error.
>
> I took a closer look at the pdf file, stepping through the Podofo code. In
> doing so, I found out that this pdf file has 160 trailers / updates!
> This gets problematic, because parsing occurs recursively here:
>
> ReadDocumentStructure()
> -> ReadXRefContents()
> -> ReadNextTrailer()
> -> ReadXRefContents()
> -> ReadNextTrailer() ...
>
> Parsing begins with ReadDocumentStructure. At the end of the function, the
> (current) Xref table is read through ReadXRefContents. After the Xref
> table, there is a trailer, which is processed at the end of the
> ReadXRefContents function by calling the ReadNextTrailer function.
>
> In such a trailer, the /Prev attribute may be present, which refers to the
> previous version (offset to an Xref table). If such a thing is found,
> ReadXRefContents is called again with the new offset, and thus,
> ReadNextTrailer is called again later. If this also has a /Prev, the
> recursion continues until a trailer has no /Prev anymore.
>
> So, there are two functions calling each other repeatedly, without
> recursion unwinding in between. Consequently, the stack for the function
> calls is eventually exhausted, leading to a stack overflow error (in my
> test at update 127).
>
> We are using Podofo 0.9.8, but I checked the 0.10.x code as well, the
> parsing works essentially the same.
>
> So, in order to prevent stack overflows due to a large number of trailers,
> the code needs to be restructured from a recursive to an iterative
> approach. One solution could be to pass the /Prev offset from the trailer
> back to the caller, and then loop (e.g. with a do-while) until the
> passed-back offset is zero. But I'm not familiar enough with all the
> details of the parsing code.
>
> Is there someone who could have a look into this?
>
> Is there maybe another, easier approach for preventing the stack overflow?
>
> Greetings,
>
> F.E.
>
>
>
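
The iterative approach proposed in the quoted message could look roughly
like the sketch below. It is a simplified model, not PoDoFo code:
XRefSection, FakeFile and readAllXRefSections are hypothetical names, the
"file" is just a map from offset to section, and a /Prev value of 0 is
assumed to mean "no previous section", as in the quoted proposal. The check
for already visited offsets is an addition of mine, so that a cyclic /Prev
chain cannot loop forever.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model of one xref section plus its trailer.
struct XRefSection {
    std::uint64_t prevOffset;  // value of /Prev; 0 means "no /Prev" here
};

// Hypothetical stand-in for the PDF file: byte offset -> xref section.
typedef std::map<std::uint64_t, XRefSection> FakeFile;

// Follows the /Prev chain with a loop instead of mutual recursion.
// Returns the offsets in the order they were read (newest first).
std::vector<std::uint64_t> readAllXRefSections(const FakeFile& file,
                                               std::uint64_t startOffset) {
    std::vector<std::uint64_t> order;
    std::set<std::uint64_t> seen;
    std::uint64_t offset = startOffset;

    while (offset != 0) {
        // Assumption: a repeated offset means a corrupt, cyclic chain.
        if (!seen.insert(offset).second)
            throw std::runtime_error("cyclic /Prev chain");

        FakeFile::const_iterator it = file.find(offset);
        if (it == file.end())
            throw std::runtime_error("/Prev points to no xref section");

        order.push_back(offset);          // "read" this section
        offset = it->second.prevOffset;   // hand /Prev back to the loop
    }
    return order;
}
```

The stack depth stays constant no matter how many updates the file has, so
160 trailers are handled the same way as 2. In the real parser, the loop
body would have to do what ReadXRefContents and ReadNextTrailer do for a
single section and return the /Prev offset instead of recursing.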

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
