Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
El Dimarts, 8 de setembre de 2015, a les 15:33:30, Even Rouault va escriure: > Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit : > > On 08/09/15 21:06, Even Rouault wrote: > > > Hi, > > > > > > A too huge number may cause the gmallocn() in Catalog::cachePageTree() > > > to crash even if we call it with a low page number. > > > > > > Even > > > > > >+ // to avoid too huge memory allocations layer and avoid crashes > > >+ // This is the maximum number of indirect objects as per > > > > > > ISO-32000:2008 (Table C-1) > > > > Table C-1 is a list of minimum limits for 32-bit readers. > > Ah indeed. But they also state "Because Acrobat implementations are subject > to these limits, applications producing PDF files are strongly advised to > remain within them", so that might make sense to check that (even if > Acrobat goes 64bit, which is perhaps the case, but anyway, does a 8 million > page PDF make sense ?) > > > >+ // We could probably decrease that number again. PDFium for > > >example uses 1 Mi > > >+ else if (numPages > 8 * 1024 * 1024) { > > >+error(errSyntaxWarning, -1, > > >+ "Page count ({0:d}) too big. Limiting number of > > > > > > reported pages to 8 Mi", > > > > > >+ numPages); > > > > Instead of imposing an arbitrary limit we should just add a check for > > gmallocn() returning NULL and print an error. > > That would be another possibility. Just looked a bit more complicated to do > it right and not leak memory for someone not familiar with the code base. > > For broken PDFs that report an invalid size (see bug 85140) we could > > check if the page count exceeds the number of objects in the XRef. > > What would be the criterion to decide that a PDF is broken ? Or do you mean > we should always check that the reported page count is no bigger than the > number of objects in the XRef ? And in that case, should we limit the > reported page count to the number of objects in the XRef, or just return 0 > with an error ? I'd go for error. Cheers, Albert > > ___ > > poppler mailing list > > poppler@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/poppler ___ poppler mailing list poppler@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit : > On 08/09/15 21:06, Even Rouault wrote: > > Hi, > > > > A too huge number may cause the gmallocn() in Catalog::cachePageTree() > > to crash even if we call it with a low page number. > > > > Even > > > >+ // to avoid too huge memory allocations layer and avoid crashes > >+ // This is the maximum number of indirect objects as per > > > > ISO-32000:2008 (Table C-1) > > Table C-1 is a list of minimum limits for 32-bit readers. Ah indeed. But they also state "Because Acrobat implementations are subject to these limits, applications producing PDF files are strongly advised to remain within them", so that might make sense to check that (even if Acrobat goes 64bit, which is perhaps the case, but anyway, does a 8 million page PDF make sense ?) > > >+ // We could probably decrease that number again. PDFium for > >example uses 1 Mi > >+ else if (numPages > 8 * 1024 * 1024) { > >+error(errSyntaxWarning, -1, > >+ "Page count ({0:d}) too big. Limiting number of > > > > reported pages to 8 Mi", > > > >+ numPages); > > Instead of imposing an arbitrary limit we should just add a check for > gmallocn() returning NULL and print an error. That would be another possibility. Just looked a bit more complicated to do it right and not leak memory for someone not familiar with the code base. > > For broken PDFs that report an invalid size (see bug 85140) we could > check if the page count exceeds the number of objects in the XRef. What would be the criterion to decide that a PDF is broken ? Or do you mean we should always check that the reported page count is no bigger than the number of objects in the XRef ? And in that case, should we limit the reported page count to the number of objects in the XRef, or just return 0 with an error ? > > ___ > poppler mailing list > poppler@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/poppler -- Spatialys - Geospatial professional services http://www.spatialys.com ___ poppler mailing list poppler@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
Acrobat/Reader have supported 64bit integers for close to a decade now, long before we actually delivered a 64bit app (which we just did on the Mac with Acrobat DC). As for millions of pages in a PDF - I’ve never seen 8 million but I have seen a (real world!) 2 million pager. Leonard On 9/8/15, 9:33 AM, "poppler on behalf of Even Rouault"wrote: >Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit : >> On 08/09/15 21:06, Even Rouault wrote: >> > Hi, >> > >> > A too huge number may cause the gmallocn() in Catalog::cachePageTree() >> > to crash even if we call it with a low page number. >> > >> > Even >> > >> >+ // to avoid too huge memory allocations layer and avoid crashes >> >+ // This is the maximum number of indirect objects as per >> > >> > ISO-32000:2008 (Table C-1) >> >> Table C-1 is a list of minimum limits for 32-bit readers. > >Ah indeed. But they also state "Because Acrobat implementations are subject to >these limits, applications producing PDF files are strongly advised to remain >within them", so that might make sense to check that (even if Acrobat goes >64bit, which is perhaps the case, but anyway, does a 8 million page PDF make >sense ?) > >> >> >+ // We could probably decrease that number again. PDFium for >> >example uses 1 Mi >> >+ else if (numPages > 8 * 1024 * 1024) { >> >+error(errSyntaxWarning, -1, >> >+ "Page count ({0:d}) too big. Limiting number of >> > >> > reported pages to 8 Mi", >> > >> >+ numPages); >> >> Instead of imposing an arbitrary limit we should just add a check for >> gmallocn() returning NULL and print an error. > >That would be another possibility. Just looked a bit more complicated to do it >right and not leak memory for someone not familiar with the code base. > >> >> For broken PDFs that report an invalid size (see bug 85140) we could >> check if the page count exceeds the number of objects in the XRef. > >What would be the criterion to decide that a PDF is broken ? Or do you mean we >should always check that the reported page count is no bigger than the number >of objects in the XRef ? And in that case, should we limit the reported page >count to the number of objects in the XRef, or just return 0 with an error ? > >> >> ___ >> poppler mailing list >> poppler@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/poppler > >-- >Spatialys - Geospatial professional services >http://www.spatialys.com >___ >poppler mailing list >poppler@lists.freedesktop.org >http://lists.freedesktop.org/mailman/listinfo/poppler ___ poppler mailing list poppler@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
On 08/09/15 23:03, Even Rouault wrote: > Le mardi 08 septembre 2015 14:43:07, Adrian Johnson a écrit : >> On 08/09/15 21:06, Even Rouault wrote: >>> Hi, >>> >>> A too huge number may cause the gmallocn() in Catalog::cachePageTree() >>> to crash even if we call it with a low page number. >>> >>> Even >>> >>> + // to avoid too huge memory allocations layer and avoid crashes >>> + // This is the maximum number of indirect objects as per >>> >>> ISO-32000:2008 (Table C-1) >> >> Table C-1 is a list of minimum limits for 32-bit readers. > > Ah indeed. But they also state "Because Acrobat implementations are subject > to > these limits, applications producing PDF files are strongly advised to remain > within them", so that might make sense to check that (even if Acrobat goes > 64bit, which is perhaps the case, but anyway, does a 8 million page PDF make > sense ?) A page count limit does not make sense. A limit that may be appropriate for a 64-bit 16GB desktop would not be appropriate on a 32-bit embedded system with limited memory. Better to just check for the out of memory condition and report an error. > >> >>> + // We could probably decrease that number again. PDFium for >>> example uses 1 Mi >>> + else if (numPages > 8 * 1024 * 1024) { >>> +error(errSyntaxWarning, -1, >>> + "Page count ({0:d}) too big. Limiting number of >>> >>> reported pages to 8 Mi", >>> >>> + numPages); >> >> Instead of imposing an arbitrary limit we should just add a check for >> gmallocn() returning NULL and print an error. > > That would be another possibility. Just looked a bit more complicated to do > it > right and not leak memory for someone not familiar with the code base. > >> >> For broken PDFs that report an invalid size (see bug 85140) we could >> check if the page count exceeds the number of objects in the XRef. > > What would be the criterion to decide that a PDF is broken ? Or do you mean > we > should always check that the reported page count is no bigger than the number > of objects in the XRef ? And in that case, should we limit the reported page > count to the number of objects in the XRef, or just return 0 with an error ? Since you did not provide a sample PDF that demonstrates the problem I assumed that you have a broken PDF that claims to have a much higher page count than the actual number of pages. If the PDF is not broken and really does have more than 8 million pages it makes no sense to limit the page count as this would prevent machines with sufficient memory from being able to read the entire PDF. > >> >> ___ >> poppler mailing list >> poppler@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/poppler > ___ poppler mailing list poppler@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/poppler
[poppler] 4 commits - poppler/GfxState.cc poppler/TextOutputDev.cc poppler/TextOutputDev.h qt4/src qt5/src
poppler/GfxState.cc | 14 poppler/TextOutputDev.cc |4 poppler/TextOutputDev.h |6 qt4/src/poppler-annotation.cc |4 qt4/src/poppler-document.cc |3 qt4/src/poppler-fontinfo.cc |3 qt4/src/poppler-form.cc |3 qt4/src/poppler-optcontent.cc |2 qt4/src/poppler-page.cc |1 qt5/src/poppler-annotation.cc | 612 +- qt5/src/poppler-document.cc |3 qt5/src/poppler-fontinfo.cc |3 qt5/src/poppler-form.cc |3 qt5/src/poppler-optcontent.cc |2 qt5/src/poppler-page.cc |1 qt5/src/poppler-private.cc| 18 - 16 files changed, 348 insertions(+), 334 deletions(-) New commits: commit fc69a5e7dab48636946282e5d4b7be77e650023c Author: Albert Astals CidDate: Tue Sep 8 17:48:49 2015 +0200 Qt5: Minor optimizations Call reserve on containers Use QStringLiteral/QLatin1String diff --git a/qt5/src/poppler-annotation.cc b/qt5/src/poppler-annotation.cc index 1de4903..5457ef5 100644 --- a/qt5/src/poppler-annotation.cc +++ b/qt5/src/poppler-annotation.cc @@ -1,5 +1,5 @@ /* poppler-annotation.cc: qt interface to poppler - * Copyright (C) 2006, 2009, 2012-2014 Albert Astals Cid + * Copyright (C) 2006, 2009, 2012-2015 Albert Astals Cid * Copyright (C) 2006, 2008, 2010 Pino Toscano * Copyright (C) 2012, Guillermo A. Amaral B. * Copyright (C) 2012-2014 Fabio D'Urso @@ -58,12 +58,12 @@ namespace Poppler { Annotation * AnnotationUtils::createAnnotation( const QDomElement & annElement ) { // safety check on annotation element -if ( !annElement.hasAttribute( "type" ) ) +if ( !annElement.hasAttribute( QStringLiteral("type") ) ) return 0; // build annotation of given type Annotation * annotation = 0; -int typeNumber = annElement.attribute( "type" ).toInt(); +int typeNumber = annElement.attribute( QStringLiteral("type") ).toInt(); switch ( typeNumber ) { case Annotation::AText: @@ -97,7 +97,7 @@ void AnnotationUtils::storeAnnotation( const Annotation * ann, QDomElement & ann QDomDocument & document ) { // save annotation's type as element's attribute -annElement.setAttribute( "type", (uint)ann->subType() ); +annElement.setAttribute( QStringLiteral("type"), (uint)ann->subType() ); // append all annotation data as children of this node ann->store( annElement, document ); @@ -1073,7 +1073,7 @@ Annotation::Annotation( AnnotationPrivate , const QDomNode ) Q_D( Annotation ); // get the [base] element of the annotation node -QDomElement e = AnnotationUtils::findChildElement( annNode, "base" ); +QDomElement e = AnnotationUtils::findChildElement( annNode, QStringLiteral("base") ); if ( e.isNull() ) return; @@ -1081,24 +1081,24 @@ Annotation::Annotation( AnnotationPrivate , const QDomNode ) Popup w; // parse -contents- attributes -if ( e.hasAttribute( "author" ) ) -setAuthor(e.attribute( "author" )); -if ( e.hasAttribute( "contents" ) ) -setContents(e.attribute( "contents" )); -if ( e.hasAttribute( "uniqueName" ) ) -setUniqueName(e.attribute( "uniqueName" )); -if ( e.hasAttribute( "modifyDate" ) ) -setModificationDate(QDateTime::fromString( e.attribute( "modifyDate" ) )); -if ( e.hasAttribute( "creationDate" ) ) -setCreationDate(QDateTime::fromString( e.attribute( "creationDate" ) )); +if ( e.hasAttribute( QStringLiteral("author") ) ) +setAuthor(e.attribute( QStringLiteral("author") )); +if ( e.hasAttribute( QStringLiteral("contents") ) ) +setContents(e.attribute( QStringLiteral("contents") )); +if ( e.hasAttribute( QStringLiteral("uniqueName") ) ) +setUniqueName(e.attribute( QStringLiteral("uniqueName") )); +if ( e.hasAttribute( QStringLiteral("modifyDate") ) ) +setModificationDate(QDateTime::fromString( e.attribute( QStringLiteral("modifyDate") ) )); +if ( e.hasAttribute( QStringLiteral("creationDate") ) ) +setCreationDate(QDateTime::fromString( e.attribute( QStringLiteral("creationDate") ) )); // parse -other- attributes -if ( e.hasAttribute( "flags" ) ) -setFlags(e.attribute( "flags" ).toInt()); -if ( e.hasAttribute( "color" ) ) -s.setColor(QColor( e.attribute( "color" ) )); -if ( e.hasAttribute( "opacity" ) ) -s.setOpacity(e.attribute( "opacity" ).toDouble()); +if ( e.hasAttribute( QStringLiteral("flags") ) ) +setFlags(e.attribute( QStringLiteral("flags") ).toInt()); +if ( e.hasAttribute( QStringLiteral("color") ) ) +s.setColor(QColor( e.attribute( QStringLiteral("color") ) )); +if ( e.hasAttribute( QStringLiteral("opacity") ) ) +s.setOpacity(e.attribute( QStringLiteral("opacity") ).toDouble()); //
Re: [poppler] [PATCH] Catalog::getNumPages(): validate page count
On 2015-09-08 07:43, Adrian Johnson wrote: On 08/09/15 21:06, Even Rouault wrote: Hi, A too huge number may cause the gmallocn() in Catalog::cachePageTree() to crash even if we call it with a low page number. Even + // to avoid too huge memory allocations layer and avoid crashes + // This is the maximum number of indirect objects as per ISO-32000:2008 (Table C-1) Table C-1 is a list of minimum limits for 32-bit readers. + // We could probably decrease that number again. PDFium for example uses 1 Mi + else if (numPages > 8 * 1024 * 1024) { +error(errSyntaxWarning, -1, + "Page count ({0:d}) too big. Limiting number of reported pages to 8 Mi", + numPages); Instead of imposing an arbitrary limit we should just add a check for gmallocn() returning NULL and print an error. For broken PDFs that report an invalid size (see bug 85140) we could check if the page count exceeds the number of objects in the XRef. By the way, bug 91353 is also about an invalid (fuzzed) PDF claiming millions of pages. I thought about maybe walking the page tree to count pages if the count seems unreasonable but it didn't seem worth the effort. I like your idea better. ___ poppler mailing list poppler@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/poppler