Hi zyx, [re-sending this to the list, sent it as a direct reply by mistake]

You're right, sorry, I only checked my last round of changes with a
production build by mistake. Here's a fixed (and properly tested)
version.

I couldn't spot the typo in that comment, but I rewrote it after a
closer reading of section 7.5.5 anyway; it implies the ID can be
indirect when a PDF is unencrypted.

Thanks,

On Sat, Oct 27, 2018 at 8:10 AM, zyx <z...@gmx.us> wrote:

> On Thu, 2018-10-18 at 17:43 -0500, Clayton Wheeler wrote:
> > Does this seem sensible?
>
> Hi,
> the idea sounds good, but the patch doesn't compile. I guess you made
> some changes just before creating the patch, but you didn't compile the
> code, because otherwise you'd notice it.
>
> There's also a little typo in your comment, specifically here:
> +   // Per the PDF spec, section 7.5.5, the ID shall be an indirect
> object.
>
> I'd also remove debugging stuff from the test code, those comments like
> this one:
> +   //std::cerr << inBuf;
> also because once someone enables it the code will not compile (again).
>
> Would you mind to correct the patch, please?
> Thanks and bye,
> zyx
>
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
>

--
Clayton Wheeler
cwhee...@genomenon.com

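For readers who want the idea in isolation: a trailer /ID value may be either the two-string array itself or a reference to an object holding that array, so the consumer follows at most one level of indirection before reading the array. The sketch below uses toy types (`Ref`, `Value`, `ObjectTable`, `resolveTrailerId` are all hypothetical names for illustration, not PoDoFo's API) and, as an assumption about what a robust caller would want, returns a null pointer when the reference does not resolve to an array.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for a PDF library's types; hypothetical, not PoDoFo's API.
struct Ref { int num; int gen; };

struct Value {
    bool isRef;                    // true: 'ref' is meaningful; false: 'arr' is
    Ref ref;                       // object number and generation
    std::vector<std::string> arr;  // the two hex strings of the file identifier
};

typedef std::map<std::pair<int, int>, Value> ObjectTable;

// Return the value holding the ID array, following one level of
// indirection if the trailer entry is a reference. Returns nullptr when
// the reference is dangling or points at yet another reference.
const Value* resolveTrailerId(const Value& id, const ObjectTable& objects) {
    if (!id.isRef) {
        return &id;  // direct array: nothing to resolve
    }
    ObjectTable::const_iterator it =
        objects.find(std::make_pair(id.ref.num, id.ref.gen));
    if (it == objects.end() || it->second.isRef) {
        return nullptr;
    }
    return &it->second;
}
```

A caller would check the result for null before reading `arr`, and treat a null result the same way as a missing /ID key.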
From 927c629c21e7eb63c8b15468c42263b8626b77de Mon Sep 17 00:00:00 2001
From: Clayton Wheeler <cwhee...@genomenon.com>
Date: Thu, 11 Oct 2018 17:49:49 -0500
Subject: [PATCH] Handle trailer ID being an indirect object

Some PDF writers (QuarkXPress and/or Quartz ca. Mac OS X 10.3.9,
evidently) can write a file identifier in the trailer as an indirect
object. This is implicitly allowed for unencrypted PDFs; this fix
allows PoDoFo to parse such files.
---
 src/base/PdfWriter.cpp   |  6 ++++
 test/unit/ParserTest.cpp | 64 ++++++++++++++++++++++++++++++++++++++++
 test/unit/ParserTest.h   |  3 ++
 3 files changed, 73 insertions(+)

diff --git a/src/base/PdfWriter.cpp b/src/base/PdfWriter.cpp
index 237e0ff..a314c7a 100644
--- a/src/base/PdfWriter.cpp
+++ b/src/base/PdfWriter.cpp
@@ -686,6 +686,12 @@ void PdfWriter::CreateFileIdentifier( PdfString & identifier, const PdfObject* p
     if( pOriginalIdentifier && pTrailer->GetDictionary().HasKey( "ID" ))
     {
         const PdfObject* idObj = pTrailer->GetDictionary().GetKey("ID");
+        // The PDF spec, section 7.5.5, implies that the ID may be
+        // indirect as long as the PDF is not encrypted. Handle that
+        // case.
+        if ( idObj->IsReference() ) {
+            idObj = m_vecObjects->GetObject( idObj->GetReference() );
+        }
 
         TCIVariantList it = idObj->GetArray().begin();
         if( it != idObj->GetArray().end() &&
diff --git a/test/unit/ParserTest.cpp b/test/unit/ParserTest.cpp
index d0014cd..2c7936d 100644
--- a/test/unit/ParserTest.cpp
+++ b/test/unit/ParserTest.cpp
@@ -1981,6 +1981,70 @@ void ParserTest::testIsPdfFile()
     }
 }
 
+void ParserTest::testRoundTripIndirectTrailerID()
+{
+    std::ostringstream oss;
+    oss << "%PDF-1.1\n";
+    int nCurObj = 0;
+    int objPos[20];
+
+    // Pages
+
+    int nPagesObj = nCurObj;
+    objPos[nCurObj] = oss.tellp();
+    oss << nCurObj++ << " 0 obj\n";
+    oss << "<</Type /Pages /Count 0 /Kids []>>\n";
+    oss << "endobj\n";
+
+    // Root catalog
+
+    int rootObj = nCurObj;
+    objPos[nCurObj] = oss.tellp();
+    oss << nCurObj++ << " 0 obj\n";
+    oss << "<</Type /Catalog /Pages " << nPagesObj << " 0 R>>\n";
+    oss << "endobj\n";
+
+    // ID
+    int nIdObj = nCurObj;
+    objPos[nCurObj] = oss.tellp();
+    oss << nCurObj++ << " 0 obj\n";
+    oss << "[<F1E375363A6314E3766EDF396D614748> <F1E375363A6314E3766EDF396D614748>]\n";
+    oss << "endobj\n";
+
+    int nXrefPos = oss.tellp();
+    oss << "xref\n";
+    oss << "0 " << nCurObj << "\n";
+    char objRec[21];
+    for ( int i = 0; i < nCurObj; i++ ) {
+        snprintf( objRec, 21, "%010d 00000 n \n", objPos[i] );
+        oss << objRec;
+    }
+    oss << "trailer <<\n"
+        << "  /Size " << nCurObj << "\n"
+        << "  /Root " << rootObj << " 0 R\n"
+        << "  /ID " << nIdObj << " 0 R\n" // indirect ID
+        << ">>\n"
+        << "startxref\n"
+        << nXrefPos << "\n"
+        << "%%EOF\n";
+
+    std::string sInBuf = oss.str();
+    try {
+        PoDoFo::PdfMemDocument doc;
+        // load for update
+        doc.LoadFromBuffer( sInBuf.c_str(), sInBuf.size(), true );
+
+        PoDoFo::PdfRefCountedBuffer outBuf;
+        PoDoFo::PdfOutputDevice outDev( &outBuf );
+
+        doc.WriteUpdate( &outDev );
+        // should not throw
+        CPPUNIT_ASSERT( true );
+    } catch ( PoDoFo::PdfError& error ) {
+        CPPUNIT_FAIL( "Unexpected PdfError" );
+    }
+}
+
 std::string ParserTest::generateXRefEntries( size_t count )
 {
     std::string strXRefEntries;
diff --git a/test/unit/ParserTest.h b/test/unit/ParserTest.h
index b8f7ea9..cffcaaa 100644
--- a/test/unit/ParserTest.h
+++ b/test/unit/ParserTest.h
@@ -41,6 +41,7 @@ class ParserTest : public CppUnit::TestFixture
     CPPUNIT_TEST( testReadXRefStreamContents );
     CPPUNIT_TEST( testReadObjects );
     CPPUNIT_TEST( testIsPdfFile );
+    CPPUNIT_TEST( testRoundTripIndirectTrailerID );
     CPPUNIT_TEST_SUITE_END();
 
 public:
@@ -77,6 +78,8 @@ public:
     //void testReadNextTrailer();
     //void testCheckEOFMarker();
 
+    void testRoundTripIndirectTrailerID();
+
 private:
     std::string generateXRefEntries( size_t count );
     bool canOutOfMemoryKillUnitTests();
-- 
2.19.1
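
The test in the patch writes its cross-reference table by hand, relying on the fixed entry layout of a classic xref table: a 10-digit byte offset, a space, a 5-digit generation number, a space, `n` (in use) or `f` (free), and a two-character end-of-line, for exactly 20 bytes per entry. A minimal standalone sketch of that layout follows; `formatXRefEntry` is a hypothetical helper name used only for illustration and is not part of PoDoFo.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a PoDoFo function): format one entry of a
// classic cross-reference table. Each entry is exactly 20 bytes:
// "oooooooooo ggggg n" followed by a space and a newline.
// Assumes 0 <= offset <= 9999999999 and 0 <= generation <= 99999;
// larger values would be truncated by the fixed-size buffer.
std::string formatXRefEntry(long long offset, int generation, bool inUse) {
    char buf[21];  // 20 bytes of entry plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%010lld %05d %c \n",
                  offset, generation, inUse ? 'n' : 'f');
    return std::string(buf);
}
```

For example, `formatXRefEntry(9, 0, true)` yields `"0000000009 00000 n \n"`. Note that in files written by conforming producers, object 0 is the head of the free list (`0000000000 65535 f`); the synthetic file in the test numbers its objects from 0 and marks all of them in use.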