On Tue, 26 Apr 2022 at 22:52, Michal Sudolsky <sudols...@gmail.com> wrote:
>
> You have this here too (just that seems pdfmm searches backwards only for
> startxref):
>
> https://github.com/pdfmm/pdfmm/blob/master/src/pdfmm/base/PdfParser.cpp#L931-L932
>
Yes, correct. The PDF standard (ISO 32000-1:2008, 7.5.5 File Trailer) says
"Conforming readers should read a PDF file from its end", so the backward
search is correct, but it's better to limit it to "startxref".

> Seems you are searching for a trailer right after xref (if I read that part
> well).
>

Yes, correct, that was a cleaner solution: in my case it helped fix some
spurious warnings, as the commit message says. It also improved parsing
performance.

> So is there actually some reason that for "i == 0" it is internal logic? What
> if startxref is precisely PDF_XREF_BUF bytes before the last EOF offset
> (m_LastEOFOffset)?
>

I didn't modify that code, but I believe it was intended as a safeguard,
since the backward search is slow. If someone also puts a large amount of
garbage between "startxref" and "%%EOF" then yes, what you say is true. We
should test whether Adobe handles an arbitrary amount of garbage.

Going back to the reporter's issue: I don't know how to fix it in PoDoFo with
a patch of a few lines, but if you can't think of anything safe enough, a
better fix is to do what I did in pdfmm and not read "trailer" backward. Of
course, such a change won't need to be merged into pdfmm.

Cheers,
Francesco

_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
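
To make the bounded backward search discussed in this thread concrete, here
is a minimal sketch. It is not the PoDoFo or pdfmm code: the function name
FindTokenBackward and its parameters are hypothetical, and maxWindow only
plays a role similar to a buffer-size limit such as PDF_XREF_BUF. It looks
for the last "startxref" that ends at or before a given offset (for example
the last "%%EOF" offset) and inspects no more than maxWindow bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not a PoDoFo/pdfmm API).
// Returns the absolute offset of the last occurrence of `token` that lies
// entirely inside the window [endOffset - maxWindow, endOffset), or
// std::nullopt if the token does not occur in that window.
std::optional<std::size_t> FindTokenBackward(std::string_view data,
                                             std::string_view token,
                                             std::size_t endOffset,
                                             std::size_t maxWindow)
{
    // Clamp the end of the search to the size of the data.
    if (endOffset > data.size())
        endOffset = data.size();

    // The window never reaches further back than maxWindow bytes.
    const std::size_t start = endOffset > maxWindow ? endOffset - maxWindow : 0;
    const std::string_view window = data.substr(start, endOffset - start);

    // rfind yields the last match in the window; a match at index 0
    // (token starting exactly at the window boundary) is accepted.
    const std::size_t pos = window.rfind(token);
    if (pos == std::string_view::npos)
        return std::nullopt;

    return start + pos;
}
```

In this sketch a token that starts exactly at the beginning of the window is
still reported, which is the boundary case raised above; one byte less of
window and it is missed. How much garbage to tolerate between "startxref"
and "%%EOF" is then purely a question of how large maxWindow is chosen.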