Hello again,

I tested a little further and learned that the stack overflow actually
happens during exception unwinding, for an exception thrown by the
RecursionGuard.

So the recursion guard detects that something is wrong and throws its
exception, which is then repeatedly caught and rethrown in
ReadXRefContents & ReadNextTrailer (to add the line to the error's call
stack), until the OS intervenes for some reason. It feels like exception
unwinding uses stack frames as well, so the stack eventually runs out
during unwinding.

I played around with the s_maxRecursionDepth of the RecursionGuard: it
works fine for values up to 97, but with 98 or higher it starts to fail.
Of course that number may be dependent on my test environment / test
case. Sadly, I cannot share the file I'm testing against, since it
contains confidential data.

But it raises the question of whether that repeated catch-throw is
really necessary. It's only done to add numerous lines to the PdfError
call stack, which imo is not of much value in this context. There is no
contextual information about the individual calls (e.g. which
object/trailer is currently being parsed), so even if it works and does
not crash, you only see the same two functions being called again and
again.

So I'm suggesting to get rid of the catch-throw blocks in the
recursively called functions, hoping that this prevents the stack
overflow during exception unwinding.

Regards,
F.E.

On Tue, 30 Jan 2024 at 10:43, F. E. <exler7...@gmail.com> wrote:

> Hello dear podofo-users,
>
> we currently have some issues with loading some pdf files using
> PoDoFo. When performing the load operation, the PoDoFo code crashes
> with a stack overflow error.
>
> I took a closer look at the pdf file, stepping through the PoDoFo
> code. In doing so, I found out that this pdf file has 160 trailers /
> updates! This gets problematic, because parsing occurs recursively
> here:
>
> ReadDocumentStructure()
>   -> ReadXRefContents()
>     -> ReadNextTrailer()
>       -> ReadXRefContents()
>         -> ReadNextTrailer() ...
>
> Parsing begins with ReadDocumentStructure. At the end of the function,
> the (current) Xref table is read through ReadXRefContents.
> After the Xref table, there is a trailer, which is processed at the
> end of the ReadXRefContents function by calling the ReadNextTrailer
> function.
>
> In such a trailer, the /Prev attribute may be present, which refers to
> the previous version (offset to an Xref table). If one is found,
> ReadXRefContents is called again with the new offset, and thus
> ReadNextTrailer is called again later. If that trailer also has a
> /Prev, the recursion continues until a trailer has no /Prev anymore.
>
> So, there are two functions calling each other repeatedly, without
> recursion unwinding in between. Consequently, the stack for the
> function calls is eventually exhausted, leading to a stack overflow
> error (in my test at update 127).
>
> We are using PoDoFo 0.9.8, but I checked the 0.10.x code as well; the
> parsing works essentially the same.
>
> So, in order to prevent stack overflows due to a large number of
> trailers, the code needs to be restructured from a recursive to an
> iterative approach. One solution could be to pass the /Prev offset
> from the trailer back to the caller, and then use a do-while loop
> until the passed-back offset is zero. But I'm not familiar enough with
> all the details of the parsing code.
>
> Is there someone who could have a look into this?
>
> Is there maybe another, easier approach for preventing the stack
> overflow?
>
> Greetings,
>
> F.E.
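
To make the do-while idea from the quoted message a bit more concrete,
here is a minimal sketch. It is not PoDoFo code: XRefSection, FakeFile,
ReadOneSection and ReadAllSections are hypothetical stand-ins I made up
for illustration, and the "file" is just a map from offset to section.
The sketch uses an empty std::optional instead of a zero offset to
signal "no /Prev", and it assumes that a repeated offset means a cyclic
/Prev chain that should be rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one xref section plus its trailer.
struct XRefSection {
    std::optional<std::uint64_t> prev; // value of /Prev, if present
};

// Hypothetical stand-in for the file: xref offset -> section found there.
using FakeFile = std::map<std::uint64_t, XRefSection>;

// Reads exactly one section and returns its /Prev offset to the caller
// instead of recursing into the previous section itself.
std::optional<std::uint64_t> ReadOneSection(const FakeFile& file,
                                            std::uint64_t offset,
                                            std::vector<std::uint64_t>& order)
{
    auto it = file.find(offset);
    if (it == file.end())
        throw std::runtime_error("no xref section at given offset");
    order.push_back(offset); // record the order in which sections are read
    return it->second.prev;
}

// Follows the /Prev chain in a loop, so stack usage stays constant no
// matter how many updates the file has. Errors are thrown once and are
// not caught and rethrown per section.
std::vector<std::uint64_t> ReadAllSections(const FakeFile& file,
                                           std::uint64_t startOffset)
{
    std::vector<std::uint64_t> order;
    std::set<std::uint64_t> seen; // offsets already read, to detect cycles
    std::optional<std::uint64_t> next = startOffset;
    do {
        if (!seen.insert(*next).second)
            throw std::runtime_error("cyclic /Prev chain");
        next = ReadOneSection(file, *next, order);
    } while (next.has_value());
    assert(!order.empty()); // the loop body ran at least once
    return order;
}
```

With this shape, a file with 160 updates costs 160 loop iterations
rather than 320 nested calls, and an exception raised in the loop
unwinds through only two frames.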