[ https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056686#comment-13056686 ]
Adam Nichols commented on PDFBOX-1037:
--------------------------------------

If you do not use the "force" option and it does not throw an exception, then it probably parsed everything correctly, but there's no way to know for sure. PDFBOX-911 is a similar issue, and Andreas and I agreed that "we need a conforming parser" to really solve the issue properly.

There was another very recent thread (PDFBOX-1016) related to the way the xref reads in objects. A PDF can have two objects with the exact same object number and revision (when there are incremental updates). Which one is actually used is dictated by the XRef tables, and the thread was about how the current code does not parse the XRef tables in the correct order. I think the patch for it may resolve the issue that you are facing.

The code that Thomas referenced is in the resolveConflicts() method, which is the current way of dealing with multiple objects with the same object number and revision.

So, the short answer is "no, not with 100% accuracy with the current codebase, but try 1.6.0 when it comes out in a few hours and see if the patch for PDFBOX-1016 helps."

> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
>                 Key: PDFBOX-1037
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1037
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.5.0
>         Environment: Windows XP - Java SE 1.6
>            Reporter: Abraham Farris
>         Attachments: blankpageproblemmod.pdf, blankpageproblemmod.png
>
>
> Any type of page counts (getDocumentCatalog().getPages().getCount()) only
> return int 1. Doing a simple .load and .save will strip out all pages after
> the first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
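
The comment above notes that, with incremental updates, two objects can share the same object number and revision, and that the XRef tables decide which one is used. The following is a minimal, self-contained sketch of that idea only. It is not PDFBox code and not the resolveConflicts() method; the class XrefMergeSketch, its Entry type, and the sample offsets are all hypothetical. It assumes the xref sections are supplied oldest first, so an entry from a later update replaces an earlier entry with the same object number and revision.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from PDFBox.
public class XrefMergeSketch {

    /** One xref entry: object number, revision (generation), byte offset. */
    static final class Entry {
        final int objectNumber;
        final int generation;
        final long offset;

        Entry(int objectNumber, int generation, long offset) {
            this.objectNumber = objectNumber;
            this.generation = generation;
            this.offset = offset;
        }
    }

    /**
     * Merges xref sections that are given oldest first. A later section
     * overwrites an earlier entry with the same "objectNumber generation" key,
     * so the most recent incremental update wins.
     */
    static Map<String, Long> resolve(List<List<Entry>> sectionsOldestFirst) {
        Map<String, Long> winners = new LinkedHashMap<>();
        for (List<Entry> section : sectionsOldestFirst) {
            for (Entry e : section) {
                winners.put(e.objectNumber + " " + e.generation, e.offset);
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        // Original file body: objects 1 and 3, written before the first %%EOF.
        List<Entry> original = new ArrayList<>();
        original.add(new Entry(1, 0, 15L));
        original.add(new Entry(3, 0, 120L));

        // Incremental update: object 3 rewritten at a later offset (made-up value).
        List<Entry> update = new ArrayList<>();
        update.add(new Entry(3, 0, 5400L));

        List<List<Entry>> sections = new ArrayList<>();
        sections.add(original);
        sections.add(update);

        System.out.println(resolve(sections));
    }
}
```

If the sections were merged in the opposite order, the stale copy of object 3 at offset 120 would win, which is the kind of ordering problem the comment describes for PDFBOX-1016.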