Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-17 Thread Leonard Rosenthol
While it is unclear in ISO 32000-1 whether such a PDF is invalid, we made it 
clear in 32000-2 that you can only have one copy of each page in the Pages 
tree.  So personally, I wouldn’t waste much time on this particular file.

Leonard
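
To illustrate the one-copy-per-page rule and the infinite-loop concern raised in this thread, here is a minimal sketch of a page tree walk that refuses to visit any node twice. That catches both duplicated /Kids entries and cycles. The Node and PageTree types and the countDistinctPages function are hypothetical and are not poppler's Catalog code; the maxNodes cap is an assumed safety limit.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical in-memory model of a page tree; not poppler's data structures.
struct Node {
    bool isLeaf;           // true for a /Page object, false for a /Pages node
    std::vector<int> kids; // object numbers listed in /Kids (empty for leaves)
};

using PageTree = std::map<int, Node>;

// Walks the tree from root and stores the number of distinct /Page leaves
// in pages. Returns false if any node is reached twice (a duplicated /Kids
// entry or a cycle), if a reference is dangling, or if more than maxNodes
// nodes are visited (an assumed cap for hostile files).
bool countDistinctPages(const PageTree &tree, int root, std::size_t &pages,
                        std::size_t maxNodes = 1000000)
{
    std::set<int> seen;
    std::vector<int> stack;
    stack.push_back(root);
    pages = 0;

    while (!stack.empty()) {
        const int ref = stack.back();
        stack.pop_back();

        if (!seen.insert(ref).second) {
            return false; // node already visited: duplicate or cycle
        }
        if (seen.size() > maxNodes) {
            return false; // give up on suspiciously large trees
        }

        const PageTree::const_iterator it = tree.find(ref);
        if (it == tree.end()) {
            return false; // /Kids points at an object we do not have
        }

        if (it->second.isLeaf) {
            ++pages;
        } else {
            for (int kid : it->second.kids) {
                stack.push_back(kid);
            }
        }
    }
    return true;
}
```

The explicit stack avoids deep recursion on a maliciously nested tree, and the visited set bounds the work by the number of distinct objects rather than by the declared /Count.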



On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain" wrote:

>On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
>> > > I don't know of a good way to validate the page count. Even
>> > > going through the page tree might be hard to do right without
>> > > leading to an infinite loop, in addition to being slow.
>> >
>> > Catalog::cachePageTree goes over the tree, but I agree doing that
>> > to calculate the number of pages can be meh.
>> 
>> If the number of pages is huge, the PDF might be intentionally
>> corrupted to provoke a bug in a particular PDF viewer, and other
>> data structures could be subtly corrupted as well. Any scan would
>> have to proceed very cautiously.
>> 
>> If there is a minimum number of objects required for a page, and if
>> the total number of objects is easy to find, could poppler
>> immediately reject files with (total num objects) / (min objects per
>> page) < page count?
>
>The document at
>https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
>I'm thinking of.  It has 5 objects and a single page that is listed in
>the /Kids array 10 times.  Duplicating the page just means adding it
>to the array again and incrementing /Count.  If we want this document
>to work then there's really no minimum number of objects required for
>a page.  Otherwise, each page would require at least a /Page object.
>
>FWIW Adobe Reader shows an error on the document after the first
>duplicated page.  Other viewers show it just fine.
>___
>poppler mailing list
>poppler@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-17 Thread Jason Crain

On 2015-09-17 08:57, Leonard Rosenthol wrote:

> While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
> made it clear in 32000-2 that you can only have one copy of each page
> in the Pages tree.  So personally, I wouldn’t waste much time on this
> particular file.
>
> Leonard


OK, if it's not allowed by the spec, I have no real objection to the 
object count check.
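
For reference, the object count check can be sketched roughly as follows. pageCountIsPlausible is a hypothetical helper, not an existing poppler function. It assumes the caller already knows the declared /Count and the total number of objects in the file, and that minObjectsPerPage is at least 1 because every distinct page needs its own /Page object.

```cpp
#include <cstdint>

// Hypothetical helper, not part of poppler's API.
// Returns true if declaredCount pages could be backed by distinct objects,
// i.e. declaredCount <= totalObjects / minObjectsPerPage.
bool pageCountIsPlausible(std::int64_t declaredCount,
                          std::int64_t totalObjects,
                          std::int64_t minObjectsPerPage = 1)
{
    // Reject nonsensical inputs instead of dividing by zero or
    // comparing against negative counts.
    if (declaredCount < 0 || totalObjects < 0 || minObjectsPerPage <= 0) {
        return false;
    }
    // Integer division rounds down, so the bound never overestimates
    // how many pages the file can hold.
    return declaredCount <= totalObjects / minObjectsPerPage;
}
```

With the test document discussed here (5 objects, /Count of 10), pageCountIsPlausible(10, 5) returns false, so that file would be rejected, which is consistent with treating duplicated pages as invalid.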



