Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-20 Thread Albert Astals Cid
El Dijous, 17 de setembre de 2015, a les 11:49:27, Jason Crain va escriure:
> On 2015-09-17 08:57, Leonard Rosenthol wrote:
> > While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
> > made it clear in 32000-2 that you can only have one copy of each page
> > in the Pages tree.  So personally, I wouldn’t waste much time on this
> > particular file.
> > 
> > Leonard
> 
> OK, if it's not allowed by the spec, I have no real objection to the
> object count check.

Pushed.

Cheers,
  Albert

> 
> > On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain"
> >  > 
> > ja...@aquaticape.us> wrote:
> >> On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
> >>> > > I don't know of a good way to validate the page count. Even
> >>> > > going through the page tree might be hard to do right without
> >>> > > leading to an infinite loop, in addition to being slow.
> >>> > 
> >>> > Catalog::cachePageTree goes over the tree, but i agree doing that
> >>> > to calculate the num of pages can be meh.
> >>> 
> >>> If the number of pages is huge, the PDF might be intentionally
> >>> corrupted to provoke a bug in a particular PDF viewer, and other
> >>> data structures could be subtly corrupted as well. Any scan would
> >>> have to proceed very cautiously.
> >>> 
> >>> If there is a minimum number of objects required for a page, and if
> >>> the total number of objects is easy to find, could poppler
> >>> immediately reject files with (total num objects) / (min objects per
> >>> page) < page count?
> >> 
> >> The document at
> >> https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
> >> I'm thinking of.  It has 5 objects and a single page that is listed in
> >> the /Kids array 10 times.  Duplicating the page just means adding it
> >> to the array again and incrementing /Count.  If we want this document
> >> to work then there's really no minimum number of objects required for
> >> a page.  Otherwise, each page would require at least a /Page object.
> >> 
> >> FWIW Adobe Reader shows an error on the document after the first
> >> duplicated page.  Other viewers show it just fine.
> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-17 Thread Leonard Rosenthol
While it is unclear in ISO 32000-1 whether such a PDF is invalid, we made it 
clear in 32000-2 that you can only have one copy of each page in the Pages 
tree.  So personally, I wouldn’t waste much time on this particular file.

Leonard



On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain" 
 wrote:

>On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
>> > > I don't know of a good way to validate the page count. Even
>> > > going through the page tree might be hard to do right without
>> > > leading to an infinite loop, in addition to being slow.
>> >
>> > Catalog::cachePageTree goes over the tree, but i agree doing that
>> > to calculate the num of pages can be meh.
>> 
>> If the number of pages is huge, the PDF might be intentionally
>> corrupted to provoke a bug in a particular PDF viewer, and other
>> data structures could be subtly corrupted as well. Any scan would
>> have to proceed very cautiously.
>> 
>> If there is a minimum number of objects required for a page, and if
>> the total number of objects is easy to find, could poppler
>> immediately reject files with (total num objects) / (min objects per
>> page) < page count?
>
>The document at
>https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
>I'm thinking of.  It has 5 objects and a single page that is listed in
>the /Kids array 10 times.  Duplicating the page just means adding it
>to the array again and incrementing /Count.  If we want this document
>to work then there's really no minimum number of objects required for
>a page.  Otherwise, each page would require at least a /Page object.
>
>FWIW Adobe Reader shows an error on the document after the first
>duplicated page.  Other viewers show it just fine.
>___
>poppler mailing list
>poppler@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-17 Thread Jason Crain

On 2015-09-17 08:57, Leonard Rosenthol wrote:

While it is unclear in ISO 32000-1 whether such a PDF is invalid, we
made it clear in 32000-2 that you can only have one copy of each page
in the Pages tree.  So personally, I wouldn’t waste much time on this
particular file.

Leonard


OK, if it's not allowed by the spec, I have no real objection to the 
object count check.



On 9/17/15, 1:04 AM, "poppler on behalf of Jason Crain"
 wrote:


On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:

> > I don't know of a good way to validate the page count. Even
> > going through the page tree might be hard to do right without
> > leading to an infinite loop, in addition to being slow.
>
> Catalog::cachePageTree goes over the tree, but i agree doing that
> to calculate the num of pages can be meh.

If the number of pages is huge, the PDF might be intentionally
corrupted to provoke a bug in a particular PDF viewer, and other
data structures could be subtly corrupted as well. Any scan would
have to proceed very cautiously.

If there is a minimum number of objects required for a page, and if
the total number of objects is easy to find, could poppler
immediately reject files with (total num objects) / (min objects per
page) < page count?


The document at
https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
I'm thinking of.  It has 5 objects and a single page that is listed in
the /Kids array 10 times.  Duplicating the page just means adding it
to the array again and incrementing /Count.  If we want this document
to work then there's really no minimum number of objects required for
a page.  Otherwise, each page would require at least a /Page object.

FWIW Adobe Reader shows an error on the document after the first
duplicated page.  Other viewers show it just fine.

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-16 Thread William Bader
> > I don't know of a good way to validate the page> > count. Even going 
> > through the page tree might be hard to do right without> > leading to an 
> > infinite loop, in addition to being slow.> > Catalog::cachePageTree goes 
> > over the tree, but i agree doing that to calculate the num of pages can be 
> > meh.

If the number of pages is huge, the PDF might be intentionally corrupted to 
provoke a bug in a particular PDF viewer, and other data structures could be 
subtly corrupted as well. Any scan would have to proceed very cautiously.

If there is a minimum number of objects required for a page, and if the total 
number of objects is easy to find, could poppler immediately reject files with 
(total num objects) / (min objects per page) < page count?

William

  ___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-16 Thread Jason Crain
On Wed, Sep 16, 2015 at 09:05:58PM -0400, William Bader wrote:
> > > I don't know of a good way to validate the page count. Even
> > > going through the page tree might be hard to do right without
> > > leading to an infinite loop, in addition to being slow.
> >
> > Catalog::cachePageTree goes over the tree, but i agree doing that
> > to calculate the num of pages can be meh.
> 
> If the number of pages is huge, the PDF might be intentionally
> corrupted to provoke a bug in a particular PDF viewer, and other
> data structures could be subtly corrupted as well. Any scan would
> have to proceed very cautiously.
> 
> If there is a minimum number of objects required for a page, and if
> the total number of objects is easy to find, could poppler
> immediately reject files with (total num objects) / (min objects per
> page) < page count?

The document at
https://drive.google.com/open?id=0ByTyiZeyQ4p9cTVBUllNRmI3bmM is what
I'm thinking of.  It has 5 objects and a single page that is listed in
the /Kids array 10 times.  Duplicating the page just means adding it
to the array again and incrementing /Count.  If we want this document
to work then there's really no minimum number of objects required for
a page.  Otherwise, each page would require at least a /Page object.

FWIW Adobe Reader shows an error on the document after the first
duplicated page.  Other viewers show it just fine.
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-11 Thread Even Rouault
Hi,

here's a V2 implementing suggested changes.

Best regards,

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
From 6e9a9a45550a320b97197f9c0142e8d9cd2b1f24 Mon Sep 17 00:00:00 2001
From: Even Rouault 
Date: Fri, 11 Sep 2015 13:56:05 +0200
Subject: [PATCH 2/2] Catalog::cachePageTree(): recover from out of memory condition

---
 poppler/Catalog.cc |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/poppler/Catalog.cc b/poppler/Catalog.cc
index 2426bfd..cc579fc 100644
--- a/poppler/Catalog.cc
+++ b/poppler/Catalog.cc
@@ -171,8 +171,8 @@ Catalog::~Catalog() {
   }
 }
 gfree(pages);
-gfree(pageRefs);
   }
+  gfree(pageRefs);
   names.free();
   dests.free();
   delete destNameTree;
@@ -305,8 +305,14 @@ GBool Catalog::cachePageTree(int page)
 }
 
 pagesSize = getNumPages();
-pages = (Page **)gmallocn(pagesSize, sizeof(Page *));
-pageRefs = (Ref *)gmallocn(pagesSize, sizeof(Ref));
+pages = (Page **)gmallocn_checkoverflow(pagesSize, sizeof(Page *));
+pageRefs = (Ref *)gmallocn_checkoverflow(pagesSize, sizeof(Ref));
+if (pages == NULL || pageRefs == NULL ) {
+  error(errSyntaxError, -1, "Cannot allocate page cache");
+  pagesDict->decRef();
+  pagesSize = 0;
+  return gFalse;
+}
 for (int i = 0; i < pagesSize; ++i) {
   pages[i] = NULL;
   pageRefs[i].num = -1;
-- 
1.7.0.4

From cc882151563b50acdf827401add81c27bb4fbd23 Mon Sep 17 00:00:00 2001
From: Even Rouault 
Date: Fri, 11 Sep 2015 13:30:32 +0200
Subject: [PATCH 1/2] Catalog::getNumPages(): validate page count

---
 poppler/Catalog.cc |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/poppler/Catalog.cc b/poppler/Catalog.cc
index 04caa1c..2426bfd 100644
--- a/poppler/Catalog.cc
+++ b/poppler/Catalog.cc
@@ -856,6 +856,18 @@ int Catalog::getNumPages()
   }
 } else {
   numPages = (int)obj.getNum();
+  if (numPages <= 0 ) {
+error(errSyntaxError, -1,
+  "Invalid page count {0:d}", numPages);
+numPages = 0;
+  }
+  else if (numPages > xref->getNumObjects()) {
+error(errSyntaxError, -1,
+  "Page count ({0:d}) larger than number of objects ({1:d})",
+  numPages, xref->getNumObjects());
+numPages = 0;
+  }
+
 }
 
 catDict.free();
-- 
1.7.0.4

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-11 Thread Albert Astals Cid
El Divendres, 11 de setembre de 2015, a les 13:59:51, Even Rouault va 
escriure:
> Hi,
> 
> here's a V2 implementing suggested changes.

Do you have files that show the need of these patches?

Cheers,
  Albert

> 
> Best regards,
> 
> Even

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-11 Thread Jason Crain

On 2015-09-11 06:59, Even Rouault wrote:

Hi,

here's a V2 implementing suggested changes.

Best regards,

Even


From 0001-Catalog-getNumPages-validate-page-count.patch:

@@ -856,6 +856,18 @@ int Catalog::getNumPages()
   }
 } else {
   numPages = (int)obj.getNum();
+  if (numPages <= 0 ) {
+error(errSyntaxError, -1,
+  "Invalid page count {0:d}", numPages);
+numPages = 0;
+  }
+  else if (numPages > xref->getNumObjects()) {
+error(errSyntaxError, -1,
+  "Page count ({0:d}) larger than number of objects 
({1:d})",

+  numPages, xref->getNumObjects());
+numPages = 0;
+  }
+
 }

 catDict.free();


I know I said I liked the idea of comparing the number of pages to the 
number
of objects, but on second thought, I think I could make a PDF that was 
just
many copies of the same page so it would be a completely valid PDF that 
has
more pages than objects.  I don't know of a good way to validate the 
page
count.  Even going through the page tree might be hard to do right 
without

leading to an infinite loop, in addition to being slow.
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-11 Thread Even Rouault
Le vendredi 11 septembre 2015 14:57:54, Albert Astals Cid a écrit :
> El Divendres, 11 de setembre de 2015, a les 13:59:51, Even Rouault va
> 
> escriure:
> > Hi,
> > 
> > here's a V2 implementing suggested changes.
> 
> Do you have files that show the need of these patches?

Sure. Completely made on purpose admitedly. Feel free to use/distribute it as 
needed.

My aim was to test GDAL robustness (with poppler backend for PDF driver) 
against hostile files.

> 
> Cheers,
>   Albert
> 
> > Best regards,
> > 
> > Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com


one_billion_page.pdf
Description: Adobe PDF document
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Albert Astals Cid
El Dimarts, 8 de setembre de 2015, a les 15:33:30, Even Rouault va escriure:
> Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
> > On 08/09/15 21:06, Even Rouault wrote:
> > > Hi,
> > > 
> > > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
> > > to crash even if we call it with a low page number.
> > > 
> > > Even
> > >
> > >+  // to avoid too huge memory allocations layer and avoid crashes
> > >+  // This is the maximum number of indirect objects as per
> > >
> > > ISO-32000:2008 (Table C-1)
> > 
> > Table C-1 is a list of minimum limits for 32-bit readers.
> 
> Ah indeed. But they also state "Because Acrobat implementations are subject
> to these limits, applications producing PDF files are strongly advised to
> remain within them", so that might make sense to check that (even if
> Acrobat goes 64bit, which is perhaps the case, but anyway, does a 8 million
> page PDF make sense ?)
> 
> > >+  // We could probably decrease that number again. PDFium for
> > >example uses 1 Mi
> > >+  else if (numPages > 8 * 1024 * 1024) {
> > >+error(errSyntaxWarning, -1,
> > >+  "Page count ({0:d}) too big. Limiting number of
> > >
> > > reported pages to 8 Mi",
> > >
> > >+  numPages);
> > 
> > Instead of imposing an arbitrary limit we should just add a check for
> > gmallocn() returning NULL and print an error.
> 
> That would be another possibility. Just looked a bit more complicated to do
> it right and not leak memory for someone not familiar with the code base.
> > For broken PDFs that report an invalid size (see bug 85140) we could
> > check if the page count exceeds the number of objects in the XRef.
> 
> What would be the criterion to decide that a PDF is broken ? Or do you mean
> we should always check that the reported page count is no bigger than the
> number of objects in the XRef ? And in that case, should we limit the
> reported page count to the number of objects in the XRef, or just return 0
> with an error ?

I'd go for error.

Cheers,
  Albert

> > ___
> > poppler mailing list
> > poppler@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/poppler

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Even Rouault
Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
> On 08/09/15 21:06, Even Rouault wrote:
> > Hi,
> > 
> > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
> > to crash even if we call it with a low page number.
> > 
> > Even
> >
> >+  // to avoid too huge memory allocations layer and avoid crashes
> >+  // This is the maximum number of indirect objects as per
> >
> > ISO-32000:2008 (Table C-1)
> 
> Table C-1 is a list of minimum limits for 32-bit readers.

Ah indeed. But they also state "Because Acrobat implementations are subject to 
these limits, applications producing PDF files are strongly advised to remain 
within them", so that might make sense to check that (even if Acrobat goes 
64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
sense ?)

> 
> >+  // We could probably decrease that number again. PDFium for
> >example uses 1 Mi
> >+  else if (numPages > 8 * 1024 * 1024) {
> >+error(errSyntaxWarning, -1,
> >+  "Page count ({0:d}) too big. Limiting number of
> >
> > reported pages to 8 Mi",
> >
> >+  numPages);
> 
> Instead of imposing an arbitrary limit we should just add a check for
> gmallocn() returning NULL and print an error.

That would be another possibility. Just looked a bit more complicated to do it 
right and not leak memory for someone not familiar with the code base.

> 
> For broken PDFs that report an invalid size (see bug 85140) we could
> check if the page count exceeds the number of objects in the XRef.

What would be the criterion to decide that a PDF is broken ? Or do you mean we 
should always check that the reported page count is no bigger than the number 
of objects in the XRef ? And in that case, should we limit the reported page 
count to the number of objects in the XRef, or just return 0 with an error ?

> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Leonard Rosenthol
Acrobat/Reader have supported 64bit integers for close to a decade now, long 
before we actually delivered a 64bit app (which we just did on the Mac with 
Acrobat DC).

As for millions of pages in a PDF - I’ve never seen 8 million but I have seen a 
(real world!) 2 million pager.

Leonard




On 9/8/15, 9:33 AM, "poppler on behalf of Even Rouault" 
 
wrote:

>Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
>> On 08/09/15 21:06, Even Rouault wrote:
>> > Hi,
>> > 
>> > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
>> > to crash even if we call it with a low page number.
>> > 
>> > Even
>> >
>> >+  // to avoid too huge memory allocations layer and avoid crashes
>> >+  // This is the maximum number of indirect objects as per
>> >
>> > ISO-32000:2008 (Table C-1)
>> 
>> Table C-1 is a list of minimum limits for 32-bit readers.
>
>Ah indeed. But they also state "Because Acrobat implementations are subject to 
>these limits, applications producing PDF files are strongly advised to remain 
>within them", so that might make sense to check that (even if Acrobat goes 
>64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
>sense ?)
>
>> 
>> >+  // We could probably decrease that number again. PDFium for
>> >example uses 1 Mi
>> >+  else if (numPages > 8 * 1024 * 1024) {
>> >+error(errSyntaxWarning, -1,
>> >+  "Page count ({0:d}) too big. Limiting number of
>> >
>> > reported pages to 8 Mi",
>> >
>> >+  numPages);
>> 
>> Instead of imposing an arbitrary limit we should just add a check for
>> gmallocn() returning NULL and print an error.
>
>That would be another possibility. Just looked a bit more complicated to do it 
>right and not leak memory for someone not familiar with the code base.
>
>> 
>> For broken PDFs that report an invalid size (see bug 85140) we could
>> check if the page count exceeds the number of objects in the XRef.
>
>What would be the criterion to decide that a PDF is broken ? Or do you mean we 
>should always check that the reported page count is no bigger than the number 
>of objects in the XRef ? And in that case, should we limit the reported page 
>count to the number of objects in the XRef, or just return 0 with an error ?
>
>> 
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>-- 
>Spatialys - Geospatial professional services
>http://www.spatialys.com
>___
>poppler mailing list
>poppler@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Adrian Johnson
On 08/09/15 23:03, Even Rouault wrote:
> Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
>> On 08/09/15 21:06, Even Rouault wrote:
>>> Hi,
>>>
>>> A too huge number may cause the gmallocn() in Catalog::cachePageTree()
>>> to crash even if we call it with a low page number.
>>>
>>> Even
>>>
>>> +  // to avoid too huge memory allocations layer and avoid crashes
>>> +  // This is the maximum number of indirect objects as per
>>>
>>> ISO-32000:2008 (Table C-1)
>>
>> Table C-1 is a list of minimum limits for 32-bit readers.
> 
> Ah indeed. But they also state "Because Acrobat implementations are subject 
> to 
> these limits, applications producing PDF files are strongly advised to remain 
> within them", so that might make sense to check that (even if Acrobat goes 
> 64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
> sense ?)

A page count limit does not make sense. A limit that may be appropriate
for a 64-bit 16GB desktop would not be appropriate on a 32-bit embedded
system with limited memory. Better to just check for the out of memory
condition and report an error.

> 
>>
>>> +  // We could probably decrease that number again. PDFium for
>>> example uses 1 Mi
>>> +  else if (numPages > 8 * 1024 * 1024) {
>>> +error(errSyntaxWarning, -1,
>>> +  "Page count ({0:d}) too big. Limiting number of
>>>
>>> reported pages to 8 Mi",
>>>
>>> +  numPages);
>>
>> Instead of imposing an arbitrary limit we should just add a check for
>> gmallocn() returning NULL and print an error.
> 
> That would be another possibility. Just looked a bit more complicated to do 
> it 
> right and not leak memory for someone not familiar with the code base.
> 
>>
>> For broken PDFs that report an invalid size (see bug 85140) we could
>> check if the page count exceeds the number of objects in the XRef.
> 
> What would be the criterion to decide that a PDF is broken ? Or do you mean 
> we 
> should always check that the reported page count is no bigger than the number 
> of objects in the XRef ? And in that case, should we limit the reported page 
> count to the number of objects in the XRef, or just return 0 with an error ?

Since you did not provide a sample PDF that demonstrates the problem I
assumed that you have a broken PDF that claims to have a much higher
page count than the actual number of pages. If the PDF is not broken and
really does have more than 8 million pages it makes no sense to limit
the page count as this would prevent machines with sufficient memory
from being able to read the entire PDF.

> 
>>
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Jason Crain

On 2015-09-08 07:43, Adrian Johnson wrote:

On 08/09/15 21:06, Even Rouault wrote:

Hi,

A too huge number may cause the gmallocn() in Catalog::cachePageTree()
to crash even if we call it with a low page number.

Even




+  // to avoid too huge memory allocations layer and avoid crashes
+  // This is the maximum number of indirect objects as per
ISO-32000:2008 (Table C-1)


Table C-1 is a list of minimum limits for 32-bit readers.



+  // We could probably decrease that number again. PDFium for
example uses 1 Mi
+  else if (numPages > 8 * 1024 * 1024) {
+error(errSyntaxWarning, -1,
+  "Page count ({0:d}) too big. Limiting number of
reported pages to 8 Mi",
+  numPages);


Instead of imposing an arbitrary limit we should just add a check for
gmallocn() returning NULL and print an error.

For broken PDFs that report an invalid size (see bug 85140) we could
check if the page count exceeds the number of objects in the XRef.


By the way, bug 91353 is also about an invalid (fuzzed) PDF claiming 
millions of pages.  I thought about maybe walking the page tree to count 
pages if the count seems unreasonable but it didn't seem worth the 
effort.  I like your idea better.

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler