Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
made it clear in 32000-2 that you can only have one copy of each page
in the Pages tree. So personally, I wouldn’t waste much time on this
particular file.

Leonard

On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain" wrote:

>On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
>> > > I don't know of a good way to validate the page count. Even
>> > > going through the page tree might be hard to do right without
>> > > leading to an infinite loop, in addition to being slow.
>> >
>> > Catalog::cachePageTree goes over the tree, but i agree doing that
>> > to calculate the num of pages can be meh.
>>
>> If the number of pages is huge, the PDF might be intentionally
>> corrupted to provoke a bug in a particular PDF viewer, and other
>> data structures could be subtly corrupted as well. Any scan would
>> have to proceed very cautiously.
>>
>> If there is a minimum number of objects required for a page, and if
>> the total number of objects is easy to find, could poppler
>> immediately reject files with (total num objects) / (min objects per
>> page) < page count?
>
>The document at
>https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
>I'm thinking of. It has 5 objects and a single page that is listed in
>the /Kids array 10 times. Duplicating the page just means adding it
>to the array again and incrementing /Count. If we want this document
>to work then there's really no minimum number of objects required for
>a page. Otherwise, each page would require at least a /Page object.
>
>FWIW Adobe Reader shows an error on the document after the first
>duplicated page. Other viewers show it just fine.
>___
>poppler mailing list
>poppler@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler
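The object-count check proposed in the quoted text can be sketched roughly as follows. This is only an illustration of the inequality, not poppler code: `pageCountIsImplausible` and its parameters are hypothetical names, and the default of one object per page is an assumption taken from the remark that each page would need at least a /Page object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of poppler's API): returns true when the
// declared /Count cannot be backed by that many distinct /Page objects.
bool pageCountIsImplausible(std::int64_t declaredPages,
                            std::int64_t totalObjects,
                            std::int64_t minObjectsPerPage = 1)
{
    assert(minObjectsPerPage > 0);
    if (declaredPages < 0 || totalObjects < 0) {
        return true; // negative counts are never valid
    }
    // Integer form of: (total num objects) / (min objects per page) < page count.
    // Flooring the quotient does not change the result for an integer page count.
    return totalObjects / minObjectsPerPage < declaredPages;
}
```

With the numbers from the sample document (5 objects, /Count of 10), this sketch reports the count as implausible, which matches the point that the check and duplicated pages cannot both be supported.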
Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
On 2015-09-17 08:57, Leonard Rosenthol wrote:
> While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
> made it clear in 32000-2 that you can only have one copy of each page
> in the Pages tree. So personally, I wouldn’t waste much time on this
> particular file.
>
> Leonard

OK, if it's not allowed by the spec, I have no real objection to the
object count check.

> On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain" wrote:
>
> >On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
> >> > > I don't know of a good way to validate the page count. Even
> >> > > going through the page tree might be hard to do right without
> >> > > leading to an infinite loop, in addition to being slow.
> >> >
> >> > Catalog::cachePageTree goes over the tree, but i agree doing that
> >> > to calculate the num of pages can be meh.
> >>
> >> If the number of pages is huge, the PDF might be intentionally
> >> corrupted to provoke a bug in a particular PDF viewer, and other
> >> data structures could be subtly corrupted as well. Any scan would
> >> have to proceed very cautiously.
> >>
> >> If there is a minimum number of objects required for a page, and if
> >> the total number of objects is easy to find, could poppler
> >> immediately reject files with (total num objects) / (min objects per
> >> page) < page count?
> >
> >The document at
> >https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
> >I'm thinking of. It has 5 objects and a single page that is listed in
> >the /Kids array 10 times. Duplicating the page just means adding it
> >to the array again and incrementing /Count. If we want this document
> >to work then there's really no minimum number of objects required for
> >a page. Otherwise, each page would require at least a /Page object.
> >
> >FWIW Adobe Reader shows an error on the document after the first
> >duplicated page. Other viewers show it just fine.
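For the concern about walking the page tree without looping forever, one possible approach is to remember every object number already visited. The sketch below uses a toy in-memory model rather than poppler's real classes: `Node`, `WalkResult`, and `countPages` are hypothetical names introduced only for illustration. It counts distinct /Page objects and flags any object referenced more than once, which covers both a cycle and a page repeated in /Kids.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Toy stand-in for a PDF page tree, keyed by object number (assumption:
// real code would resolve indirect references through the xref table).
struct Node {
    bool isPage;            // true for /Page, false for /Pages
    std::vector<int> kids;  // object numbers listed in /Kids
};

struct WalkResult {
    std::size_t pages = 0;  // distinct /Page objects reached
    bool repeated = false;  // some object was referenced more than once
};

WalkResult countPages(const std::map<int, Node>& objects, int root)
{
    WalkResult result;
    std::set<int> seen;              // object numbers already visited
    std::vector<int> pending{root};  // explicit stack, so no deep recursion
    while (!pending.empty()) {
        const int num = pending.back();
        pending.pop_back();
        if (!seen.insert(num).second) {
            result.repeated = true;  // cycle or duplicated /Kids entry
            continue;
        }
        const auto it = objects.find(num);
        if (it == objects.end()) {
            continue;                // dangling reference: skip it
        }
        if (it->second.isPage) {
            ++result.pages;
            continue;
        }
        for (int kid : it->second.kids) {
            pending.push_back(kid);
        }
    }
    return result;
}
```

Each object is expanded at most once, so the walk terminates even on a malformed tree, though it still costs time proportional to the number of /Kids entries, which is the slowness mentioned in the quoted text.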