Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Albert Astals Cid
El Dimarts, 8 de setembre de 2015, a les 15:33:30, Even Rouault va escriure:
> Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
> > On 08/09/15 21:06, Even Rouault wrote:
> > > Hi,
> > > 
> > > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
> > > to crash even if we call it with a low page number.
> > > 
> > > Even
> > >
> > >+  // to avoid too huge memory allocations layer and avoid crashes
> > >+  // This is the maximum number of indirect objects as per
> > >
> > > ISO-32000:2008 (Table C-1)
> > 
> > Table C-1 is a list of minimum limits for 32-bit readers.
> 
> Ah indeed. But they also state "Because Acrobat implementations are subject
> to these limits, applications producing PDF files are strongly advised to
> remain within them", so that might make sense to check that (even if
> Acrobat goes 64bit, which is perhaps the case, but anyway, does a 8 million
> page PDF make sense ?)
> 
> > >+  // We could probably decrease that number again. PDFium for
> > >example uses 1 Mi
> > >+  else if (numPages > 8 * 1024 * 1024) {
> > >+error(errSyntaxWarning, -1,
> > >+  "Page count ({0:d}) too big. Limiting number of
> > >
> > > reported pages to 8 Mi",
> > >
> > >+  numPages);
> > 
> > Instead of imposing an arbitrary limit we should just add a check for
> > gmallocn() returning NULL and print an error.
> 
> That would be another possibility. Just looked a bit more complicated to do
> it right and not leak memory for someone not familiar with the code base.
> > For broken PDFs that report an invalid size (see bug 85140) we could
> > check if the page count exceeds the number of objects in the XRef.
> 
> What would be the criterion to decide that a PDF is broken ? Or do you mean
> we should always check that the reported page count is no bigger than the
> number of objects in the XRef ? And in that case, should we limit the
> reported page count to the number of objects in the XRef, or just return 0
> with an error ?

I'd go for error.

Cheers,
  Albert

> > ___
> > poppler mailing list
> > poppler@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/poppler

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Even Rouault
Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
> On 08/09/15 21:06, Even Rouault wrote:
> > Hi,
> > 
> > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
> > to crash even if we call it with a low page number.
> > 
> > Even
> >
> >+  // to avoid too huge memory allocations layer and avoid crashes
> >+  // This is the maximum number of indirect objects as per
> >
> > ISO-32000:2008 (Table C-1)
> 
> Table C-1 is a list of minimum limits for 32-bit readers.

Ah indeed. But they also state "Because Acrobat implementations are subject to 
these limits, applications producing PDF files are strongly advised to remain 
within them", so that might make sense to check that (even if Acrobat goes 
64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
sense ?)

> 
> >+  // We could probably decrease that number again. PDFium for
> >example uses 1 Mi
> >+  else if (numPages > 8 * 1024 * 1024) {
> >+error(errSyntaxWarning, -1,
> >+  "Page count ({0:d}) too big. Limiting number of
> >
> > reported pages to 8 Mi",
> >
> >+  numPages);
> 
> Instead of imposing an arbitrary limit we should just add a check for
> gmallocn() returning NULL and print an error.

That would be another possibility. Just looked a bit more complicated to do it 
right and not leak memory for someone not familiar with the code base.

> 
> For broken PDFs that report an invalid size (see bug 85140) we could
> check if the page count exceeds the number of objects in the XRef.

What would be the criterion to decide that a PDF is broken ? Or do you mean we 
should always check that the reported page count is no bigger than the number 
of objects in the XRef ? And in that case, should we limit the reported page 
count to the number of objects in the XRef, or just return 0 with an error ?

> 
> ___
> poppler mailing list
> poppler@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Leonard Rosenthol
Acrobat/Reader have supported 64bit integers for close to a decade now, long 
before we actually delivered a 64bit app (which we just did on the Mac with 
Acrobat DC).

As for millions of pages in a PDF - I’ve never seen 8 million but I have seen a 
(real world!) 2 million pager.

Leonard




On 9/8/15, 9:33 AM, "poppler on behalf of Even Rouault" 
 
wrote:

>Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
>> On 08/09/15 21:06, Even Rouault wrote:
>> > Hi,
>> > 
>> > A too huge number may cause the gmallocn() in Catalog::cachePageTree()
>> > to crash even if we call it with a low page number.
>> > 
>> > Even
>> >
>> >+  // to avoid too huge memory allocations layer and avoid crashes
>> >+  // This is the maximum number of indirect objects as per
>> >
>> > ISO-32000:2008 (Table C-1)
>> 
>> Table C-1 is a list of minimum limits for 32-bit readers.
>
>Ah indeed. But they also state "Because Acrobat implementations are subject to 
>these limits, applications producing PDF files are strongly advised to remain 
>within them", so that might make sense to check that (even if Acrobat goes 
>64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
>sense ?)
>
>> 
>> >+  // We could probably decrease that number again. PDFium for
>> >example uses 1 Mi
>> >+  else if (numPages > 8 * 1024 * 1024) {
>> >+error(errSyntaxWarning, -1,
>> >+  "Page count ({0:d}) too big. Limiting number of
>> >
>> > reported pages to 8 Mi",
>> >
>> >+  numPages);
>> 
>> Instead of imposing an arbitrary limit we should just add a check for
>> gmallocn() returning NULL and print an error.
>
>That would be another possibility. Just looked a bit more complicated to do it 
>right and not leak memory for someone not familiar with the code base.
>
>> 
>> For broken PDFs that report an invalid size (see bug 85140) we could
>> check if the page count exceeds the number of objects in the XRef.
>
>What would be the criterion to decide that a PDF is broken ? Or do you mean we 
>should always check that the reported page count is no bigger than the number 
>of objects in the XRef ? And in that case, should we limit the reported page 
>count to the number of objects in the XRef, or just return 0 with an error ?
>
>> 
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
>
>-- 
>Spatialys - Geospatial professional services
>http://www.spatialys.com
>___
>poppler mailing list
>poppler@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/poppler
___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Adrian Johnson
On 08/09/15 23:03, Even Rouault wrote:
> Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit :
>> On 08/09/15 21:06, Even Rouault wrote:
>>> Hi,
>>>
>>> A too huge number may cause the gmallocn() in Catalog::cachePageTree()
>>> to crash even if we call it with a low page number.
>>>
>>> Even
>>>
>>> +  // to avoid too huge memory allocations layer and avoid crashes
>>> +  // This is the maximum number of indirect objects as per
>>>
>>> ISO-32000:2008 (Table C-1)
>>
>> Table C-1 is a list of minimum limits for 32-bit readers.
> 
> Ah indeed. But they also state "Because Acrobat implementations are subject 
> to 
> these limits, applications producing PDF files are strongly advised to remain 
> within them", so that might make sense to check that (even if Acrobat goes 
> 64bit, which is perhaps the case, but anyway, does a 8 million page PDF make 
> sense ?)

A page count limit does not make sense. A limit that may be appropriate
for a 64-bit 16GB desktop would not be appropriate on a 32-bit embedded
system with limited memory. Better to just check for the out of memory
condition and report an error.

> 
>>
>>> +  // We could probably decrease that number again. PDFium for
>>> example uses 1 Mi
>>> +  else if (numPages > 8 * 1024 * 1024) {
>>> +error(errSyntaxWarning, -1,
>>> +  "Page count ({0:d}) too big. Limiting number of
>>>
>>> reported pages to 8 Mi",
>>>
>>> +  numPages);
>>
>> Instead of imposing an arbitrary limit we should just add a check for
>> gmallocn() returning NULL and print an error.
> 
> That would be another possibility. Just looked a bit more complicated to do 
> it 
> right and not leak memory for someone not familiar with the code base.
> 
>>
>> For broken PDFs that report an invalid size (see bug 85140) we could
>> check if the page count exceeds the number of objects in the XRef.
> 
> What would be the criterion to decide that a PDF is broken ? Or do you mean 
> we 
> should always check that the reported page count is no bigger than the number 
> of objects in the XRef ? And in that case, should we limit the reported page 
> count to the number of objects in the XRef, or just return 0 with an error ?

Since you did not provide a sample PDF that demonstrates the problem I
assumed that you have a broken PDF that claims to have a much higher
page count than the actual number of pages. If the PDF is not broken and
really does have more than 8 million pages it makes no sense to limit
the page count as this would prevent machines with sufficient memory
from being able to read the entire PDF.

> 
>>
>> ___
>> poppler mailing list
>> poppler@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler


[poppler] 4 commits - poppler/GfxState.cc poppler/TextOutputDev.cc poppler/TextOutputDev.h qt4/src qt5/src

2015-09-08 Thread Albert Astals Cid
 poppler/GfxState.cc   |   14 
 poppler/TextOutputDev.cc  |4 
 poppler/TextOutputDev.h   |6 
 qt4/src/poppler-annotation.cc |4 
 qt4/src/poppler-document.cc   |3 
 qt4/src/poppler-fontinfo.cc   |3 
 qt4/src/poppler-form.cc   |3 
 qt4/src/poppler-optcontent.cc |2 
 qt4/src/poppler-page.cc   |1 
 qt5/src/poppler-annotation.cc |  612 +-
 qt5/src/poppler-document.cc   |3 
 qt5/src/poppler-fontinfo.cc   |3 
 qt5/src/poppler-form.cc   |3 
 qt5/src/poppler-optcontent.cc |2 
 qt5/src/poppler-page.cc   |1 
 qt5/src/poppler-private.cc|   18 -
 16 files changed, 348 insertions(+), 334 deletions(-)

New commits:
commit fc69a5e7dab48636946282e5d4b7be77e650023c
Author: Albert Astals Cid 
Date:   Tue Sep 8 17:48:49 2015 +0200

Qt5: Minor optimizations

Call reserve on containers
Use QStringLiteral/QLatin1String

diff --git a/qt5/src/poppler-annotation.cc b/qt5/src/poppler-annotation.cc
index 1de4903..5457ef5 100644
--- a/qt5/src/poppler-annotation.cc
+++ b/qt5/src/poppler-annotation.cc
@@ -1,5 +1,5 @@
 /* poppler-annotation.cc: qt interface to poppler
- * Copyright (C) 2006, 2009, 2012-2014 Albert Astals Cid 
+ * Copyright (C) 2006, 2009, 2012-2015 Albert Astals Cid 
  * Copyright (C) 2006, 2008, 2010 Pino Toscano 
  * Copyright (C) 2012, Guillermo A. Amaral B. 
  * Copyright (C) 2012-2014 Fabio D'Urso 
@@ -58,12 +58,12 @@ namespace Poppler {
 Annotation * AnnotationUtils::createAnnotation( const QDomElement & annElement 
)
 {
 // safety check on annotation element
-if ( !annElement.hasAttribute( "type" ) )
+if ( !annElement.hasAttribute( QStringLiteral("type") ) )
 return 0;
 
 // build annotation of given type
 Annotation * annotation = 0;
-int typeNumber = annElement.attribute( "type" ).toInt();
+int typeNumber = annElement.attribute( QStringLiteral("type") ).toInt();
 switch ( typeNumber )
 {
 case Annotation::AText:
@@ -97,7 +97,7 @@ void AnnotationUtils::storeAnnotation( const Annotation * 
ann, QDomElement & ann
 QDomDocument & document )
 {
 // save annotation's type as element's attribute
-annElement.setAttribute( "type", (uint)ann->subType() );
+annElement.setAttribute( QStringLiteral("type"), (uint)ann->subType() );
 
 // append all annotation data as children of this node
 ann->store( annElement, document );
@@ -1073,7 +1073,7 @@ Annotation::Annotation( AnnotationPrivate , const 
QDomNode  )
 Q_D( Annotation );
 
 // get the [base] element of the annotation node
-QDomElement e = AnnotationUtils::findChildElement( annNode, "base" );
+QDomElement e = AnnotationUtils::findChildElement( annNode, 
QStringLiteral("base") );
 if ( e.isNull() )
 return;
 
@@ -1081,24 +1081,24 @@ Annotation::Annotation( AnnotationPrivate , const 
QDomNode  )
 Popup w;
 
 // parse -contents- attributes
-if ( e.hasAttribute( "author" ) )
-setAuthor(e.attribute( "author" ));
-if ( e.hasAttribute( "contents" ) )
-setContents(e.attribute( "contents" ));
-if ( e.hasAttribute( "uniqueName" ) )
-setUniqueName(e.attribute( "uniqueName" ));
-if ( e.hasAttribute( "modifyDate" ) )
-setModificationDate(QDateTime::fromString( e.attribute( "modifyDate" ) 
));
-if ( e.hasAttribute( "creationDate" ) )
-setCreationDate(QDateTime::fromString( e.attribute( "creationDate" ) 
));
+if ( e.hasAttribute( QStringLiteral("author") ) )
+setAuthor(e.attribute( QStringLiteral("author") ));
+if ( e.hasAttribute( QStringLiteral("contents") ) )
+setContents(e.attribute( QStringLiteral("contents") ));
+if ( e.hasAttribute( QStringLiteral("uniqueName") ) )
+setUniqueName(e.attribute( QStringLiteral("uniqueName") ));
+if ( e.hasAttribute( QStringLiteral("modifyDate") ) )
+setModificationDate(QDateTime::fromString( e.attribute( 
QStringLiteral("modifyDate") ) ));
+if ( e.hasAttribute( QStringLiteral("creationDate") ) )
+setCreationDate(QDateTime::fromString( e.attribute( 
QStringLiteral("creationDate") ) ));
 
 // parse -other- attributes
-if ( e.hasAttribute( "flags" ) )
-setFlags(e.attribute( "flags" ).toInt());
-if ( e.hasAttribute( "color" ) )
-s.setColor(QColor( e.attribute( "color" ) ));
-if ( e.hasAttribute( "opacity" ) )
-s.setOpacity(e.attribute( "opacity" ).toDouble());
+if ( e.hasAttribute( QStringLiteral("flags") ) )
+setFlags(e.attribute( QStringLiteral("flags") ).toInt());
+if ( e.hasAttribute( QStringLiteral("color") ) )
+s.setColor(QColor( e.attribute( QStringLiteral("color") ) ));
+if ( e.hasAttribute( QStringLiteral("opacity") ) )
+s.setOpacity(e.attribute( QStringLiteral("opacity") ).toDouble());
 
 // 

Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count

2015-09-08 Thread Jason Crain

On 2015-09-08 07:43, Adrian Johnson wrote:

On 08/09/15 21:06, Even Rouault wrote:

Hi,

A too huge number may cause the gmallocn() in Catalog::cachePageTree()
to crash even if we call it with a low page number.

Even




+  // to avoid too huge memory allocations layer and avoid crashes
+  // This is the maximum number of indirect objects as per
ISO-32000:2008 (Table C-1)


Table C-1 is a list of minimum limits for 32-bit readers.



+  // We could probably decrease that number again. PDFium for
example uses 1 Mi
+  else if (numPages > 8 * 1024 * 1024) {
+error(errSyntaxWarning, -1,
+  "Page count ({0:d}) too big. Limiting number of
reported pages to 8 Mi",
+  numPages);


Instead of imposing an arbitrary limit we should just add a check for
gmallocn() returning NULL and print an error.

For broken PDFs that report an invalid size (see bug 85140) we could
check if the page count exceeds the number of objects in the XRef.


By the way, bug 91353 is also about an invalid (fuzzed) PDF claiming 
millions of pages.  I thought about maybe walking the page tree to count 
pages if the count seems unreasonable but it didn't seem worth the 
effort.  I like your idea better.

___
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler