On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
> > > I don't know of a good way to validate the page count. Even
> > > going through the page tree might be hard to do right without
> > > leading to an infinite loop, in addition to being slow.
> >
> > Catalog::cachePageTree goes over the tree, but i agree doing that
> > to calculate the num of pages can be meh.
>
> If the number of pages is huge, the PDF might be intentionally
> corrupted to provoke a bug in a particular PDF viewer, and other
> data structures could be subtly corrupted as well. Any scan would
> have to proceed very cautiously.
>
> If there is a minimum number of objects required for a page, and if
> the total number of objects is easy to find, could poppler
> immediately reject files with (total num objects) / (min objects per
> page) < page count?

The document at
https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
I'm thinking of. It has 5 objects and a single page that is listed in
the /Kids array 10 times. Duplicating the page just means adding it to
the array again and incrementing /Count.

If we want this document to work, then there's really no minimum number
of objects required for a page. Otherwise, each page would require at
least a /Page object.

FWIW, Adobe Reader shows an error on the document after the first
duplicated page. Other viewers show it just fine.
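
To make the trade-off concrete, here is a rough sketch in Python (not poppler code) of the two checks discussed in this thread. The PDF is modelled as a plain dict from object number to a dict of entries, which is an assumption made purely for illustration; the names count_pages, too_few_objects, PageTreeError and max_steps are hypothetical as well. The walk rejects a /Pages node that appears among its own ancestors, so it cannot loop forever, but it still accepts the same leaf page listed several times in /Kids. The step budget is there because a tree that reuses intermediate nodes could otherwise grow exponentially.

```python
class PageTreeError(Exception):
    """Raised when the (modelled) page tree is malformed or too large."""


def count_pages(objects, root, max_steps=100_000):
    """Count leaf /Page entries reachable from the /Pages node `root`.

    `objects` maps object numbers to dicts; this is an illustrative
    model, not how poppler stores objects.
    """
    count = 0
    steps = 0
    # Each entry: (object number, frozenset of ancestor /Pages nodes).
    stack = [(root, frozenset())]
    while stack:
        num, ancestors = stack.pop()
        steps += 1
        if steps > max_steps:
            raise PageTreeError("step budget exceeded")
        node = objects.get(num)
        if node is None:
            raise PageTreeError(f"missing object {num}")
        if node.get("Type") == "Page":
            # A leaf may legitimately be reached more than once.
            count += 1
            continue
        if num in ancestors:
            # An intermediate node that is its own ancestor is a cycle.
            raise PageTreeError(f"cycle through object {num}")
        path = ancestors | {num}
        for kid in node.get("Kids", []):
            stack.append((kid, path))
    return count


def too_few_objects(objects, claimed_count, min_objects_per_page=1):
    """The proposed quick reject: (total objects) / (min per page) < count."""
    return len(objects) // min_objects_per_page < claimed_count


# Model of the example: 5 objects, one page listed 10 times in /Kids.
# Which objects make up the 5 is a guess; only the shape matters here.
sample = {
    1: {"Type": "Catalog", "Pages": 2},
    2: {"Type": "Pages", "Kids": [3] * 10, "Count": 10},
    3: {"Type": "Page", "Parent": 2, "Contents": 4},
    4: {"Type": "Contents"},
    5: {"Type": "Font"},
}

if __name__ == "__main__":
    print(count_pages(sample, 2))
    print(too_few_objects(sample, sample[2]["Count"]))
```

On this model the walk agrees with /Count, while the quick reject would throw the file out, which is the conflict described above: the heuristic is only safe if duplicated /Kids entries are treated as invalid.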