Hello again,

we recently had another issue with podofo crashing while parsing a pdf. The
crashing pdf had 130 updates, but not with trailers this time, but xref
streams.
Sadly I cannot share the file, bc it contains sensitive data of a third
party.

With the previous patch, try-catch clauses
Since PdfParser::ReadXRefStreamContents had it's try-catch-throw clause not
patched with the last patch, it triggered the stack overflow when
processing the RecursionGuard exception.
I created a patch for this in a similar fashion (for the current master),
removing the try-catch-throw clause:
--- podofo/src/podofo/private/PdfParser.cpp Mon Mar  3 16:15:00 2025
+++ podofo/src/podofo/private/PdfParser.cpp Mon Mar  3 16:18:00 2025
@@ -550,24 +550,10 @@

         if (!skipFollowPrevious)
         {
-            try
-            {
-
-                // PDFs that have been through multiple PDF tools may have
a mix of xref tables (ISO 32000-1 7.5.4)
-                // and XRefStm streams (ISO 32000-1 7.5.8.1) and in the
Prev chain,
-                // so call ReadXRefContents (which deals with both)
instead of ReadXRefStreamContents
-                ReadXRefContents(device, previousOffset, false);
-            }
-            catch (PdfError& e)
-            {
-                // Be forgiving, the error happens when an entry in XRef
-                // stream points to a wrong place (offset) in the PDF file.
-                if (e != PdfErrorCode::InvalidNumber)
-                {
-                    PODOFO_PUSH_FRAME(e);
-                    throw;
-                }
-            }
+            // PDFs that have been through multiple PDF tools may have a
mix of xref tables (ISO 32000-1 7.5.4)
+            // and XRefStm streams (ISO 32000-1 7.5.8.1) and in the Prev
chain,
+            // so call ReadXRefContents (which deals with both) instead of
ReadXRefStreamContents
+            ReadXRefContents(device, previousOffset, false);
         }
     }
 }

But as you can see, doing so changes the code slightly:
Before, InvalidNumber errors where ignored, but by removing the try-catch
they get treated as every other error.
I don't know if its important to keep ignoring InvalidNumber errors, but I
know that removing the try-catch clause prevents the stack overflow (tested
that out).
So the InvalidNumber errors likely need to be handled differently, I guess.

Greetings,
F.E.


Am Mo., 12. Aug. 2024 um 00:07 Uhr schrieb Francesco Pretto <
cez...@gmail.com>:

> On Tue, 13 Feb 2024 at 14:57, F. E. <exler7...@gmail.com> wrote:
>
>> today I had some time for testing whether reducing the try-catch clauses
>> solves the stack overflow, and it does:
>>
>
> Hello, I finally had the time to look at the issue in 1.0.0-dev and I came
> up with a similar fix:
>
> https://github.com/podofo/podofo/commit/31e0139113f5860cc35d59bbe5c2844a1a37ec6c
>
>
>> Minor side-note:
>> Why does the PdfTokenizer::RecursionGuard throw a "ePdfError_InvalidXRef"
>> error when the rmax recursion depth is reached?
>> It would make more sense to throw a special "MaxRecursionReached" error
>> here.
>>
>
> Your suggestion makes a lot of sense. I will have it in time for 1.0.0.
>
> Cheers,
> Francesco
>
--- podofo/src/podofo/private/PdfParser.cpp     Mon Mar  3 16:15:00 2025
+++ podofo/src/podofo/private/PdfParser.cpp     Mon Mar  3 16:18:00 2025
@@ -550,24 +550,10 @@
 
         if (!skipFollowPrevious)
         {
-            try
-            {
-
-                // PDFs that have been through multiple PDF tools may have a 
mix of xref tables (ISO 32000-1 7.5.4) 
-                // and XRefStm streams (ISO 32000-1 7.5.8.1) and in the Prev 
chain, 
-                // so call ReadXRefContents (which deals with both) instead of 
ReadXRefStreamContents 
-                ReadXRefContents(device, previousOffset, false);
-            }
-            catch (PdfError& e)
-            {
-                // Be forgiving, the error happens when an entry in XRef
-                // stream points to a wrong place (offset) in the PDF file.
-                if (e != PdfErrorCode::InvalidNumber)
-                {
-                    PODOFO_PUSH_FRAME(e);
-                    throw;
-                }
-            }
+            // PDFs that have been through multiple PDF tools may have a mix 
of xref tables (ISO 32000-1 7.5.4) 
+            // and XRefStm streams (ISO 32000-1 7.5.8.1) and in the Prev 
chain, 
+            // so call ReadXRefContents (which deals with both) instead of 
ReadXRefStreamContents 
+            ReadXRefContents(device, previousOffset, false);
         }
     }
 }
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users

Reply via email to