On Tue, 26 Apr 2022 at 22:52, Michal Sudolsky <sudols...@gmail.com> wrote:
>
> You have this here too (just that seems pdfmm searches backwards only for 
> startxref):
>
> https://github.com/pdfmm/pdfmm/blob/master/src/pdfmm/base/PdfParser.cpp#L931-L932
>

Yes, correct. The PDF standard says:

ISO 32000-1:2008, 7.5.5 File Trailer: "Conforming readers should read a
PDF file from its end"

so the backward search is correct, but it's better to limit it to "startxref".
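
A minimal sketch of that idea, assuming the tail of the file is already
in memory. This is not the actual PoDoFo or pdfmm code: the function
name FindLastStartXRef and the "window" parameter are made up for
illustration. It looks only for the "startxref" keyword, and only in the
last "window" bytes of the buffer:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not PoDoFo/pdfmm API).
// Returns the offset of the last "startxref" keyword that lies entirely
// within the final `window` bytes of `data`, or std::nullopt if none.
std::optional<std::size_t> FindLastStartXRef(std::string_view data,
                                             std::size_t window)
{
    constexpr std::string_view token = "startxref";

    // First byte of the search window, clamped to the buffer start.
    const std::size_t begin = data.size() > window ? data.size() - window : 0;
    const std::string_view tail = data.substr(begin);

    // rfind returns the last occurrence, i.e. the one closest to the end.
    const std::size_t pos = tail.rfind(token);
    if (pos == std::string_view::npos)
        return std::nullopt;

    // Convert the window-relative position back to a buffer offset.
    return begin + pos;
}
```

In this sketch a keyword that starts exactly at the first byte of the
window (pos == 0) is still reported as a match.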

> Seems you are searching for a trailer right after xref (if I read that part 
> well).
>

Yes, correct, that was a cleaner solution: in my case it was useful to
fix some spurious warnings as the commit message says. It also
improved parsing performance.

> So is there actually some reason that for "i == 0" it is internal logic? What 
> if startxref is precisely PDF_XREF_BUF bytes before the last EOF offset 
> (m_LastEOFOffset)?
>

I didn't modify that code, but I believe this was a kind of intended
safeguard, since the backward search is slow. If someone also put a
large amount of garbage between "startxref" and "%%EOF" then yes, what
you say is true. We should test whether Adobe handles an arbitrary
amount of garbage there.

Going back to the reporter's issue: I don't know how to fix it in PoDoFo
with a patch of a few lines, but if you can't find anything safe enough,
a better fix is to do as I did in pdfmm and not search for "trailer"
backward. Of course, such a change won't need to be merged into pdfmm.

Cheers,
Francesco


