[Podofo-users] Patch for PdfDocument::FillXObjectFromPage()
Hi all,

Please find enclosed a patch against SVN rev 2033 which fixes a bug in PdfDocument::FillXObjectFromPage().

When pContents->IsArray() is true, the streams from the array must be joined with a delimiter. The patch adds this delimiter. Without it, the last operator of the first stream (e.g. "Q") is incorrectly "glued" to the first operand of the following stream (e.g. "1.000"), creating a syntax error (e.g. "Q1.000").

Best regards,
Amin

Attachment: patch_FillXObjectFromPage.diff

___
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users
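The gluing problem described in this message can be illustrated without PoDoFo. The following is only a minimal sketch: JoinContentStreams is a hypothetical helper, not a PoDoFo function, and the choice of a single space as delimiter is an assumption (the attached patch may use a different whitespace character).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not PoDoFo API): concatenates the decoded content
// streams of a page. A single space is inserted between two streams so that
// the last token of one stream cannot fuse with the first token of the next.
std::string JoinContentStreams(const std::vector<std::string>& streams)
{
    std::string joined;
    for (std::size_t i = 0; i < streams.size(); ++i)
    {
        if (i > 0)
            joined += ' ';   // assumed delimiter; any PDF whitespace would do
        joined += streams[i];
    }
    return joined;
}
```

With the inputs "q Q" and "1.000 0 0 1.000 0 0 cm", the joined stream contains "Q 1.000" rather than the invalid token "Q1.000".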
Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()
Anyone? This patch includes important fixes, e.g. for CVE-2017-8054 (not really fixed up to r1937!).

Greetings,
Amin

> On 28.08.2018 at 08:55, A. Massad wrote:
>
> [...]
Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()
Hi all,

Please find enclosed a patch against SVN rev 1937 which fixes three important issues with PdfPagesTree::GetPageNode().

To demonstrate the issues, the unit test PagesTreeTest has been extended by three new tests, which all fail for r1937 and pass with this patch.

The patch includes:
1) A real fix of CVE-2017-8054 (not really fixed up to r1937!) for the handling of cyclic trees, see testCyclicTree()
2) A fix for the handling of subtrees with "/Kids []" and "/Count 0", which are completely valid according to the PDF spec, see testEmptyKidsTree()
3) A changed behavior for trees with nested kids arrays, which are not valid according to the PDF spec and now yield a NULL pointer, see testNestedArrayTree()

Please note that this patch supersedes my former patch named "patch_getpagenode_cyclic_trees.diff" against r1935, which only covered issue 1.

I am looking forward to your feedback!

Best regards,
Amin

Attachment: patch_getpagenode_rev1937.diff

> On 22.08.2018 at 16:32, a.mas...@gmx.de wrote:
>
> [...]
[Podofo-users] PdfPagesTree::GetPageNode() - Why does it handle nested arrays?
Hi,

The current implementation of PdfPagesTree::GetPageNode() has a questionable branch for nested kids arrays:

    // We have to traverse the tree
    while( it != rKidsArray.end() )
    {
        if( (*it).IsArray() )
        {
            // Fixes PDFs broken by having trees with arrays nested once
            ...
        }

Does anyone know what the relevance of nested kids arrays is? Where do such broken PDFs occur, and why should they be handled by PoDoFo? This is not in accordance with the PDF spec, and I have not yet found a single PDF tool (including Adobe products) which handles such broken PDFs.

I think this case is meant to handle PDFs containing /Pages nodes of this form:

    3 0 obj<> endobj

However, if there is no really good reason for it, this branch should be completely removed from GetPageNode() to open the way for further improvements of the current code.

Best regards,
Amin
Re: [Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()
Hello again,

I haven't received any feedback on this issue yet, so I started to "dive" into the code of PdfPagesTree::GetPageNode(). Now I am even more concerned that, for the sake of correctness and security, this function needs a rewrite, especially with the removal of GetPageNodeFromArray().

Please find enclosed a small patch against SVN rev 1935 for another problem of GetPageNode(): it fixes a DoS vulnerability similar to CVE-2017-8054 which may cause infinite recursion on cyclic trees. For clarity, I have also extended the unit test.

I am looking forward to your feedback!

Best regards,
Amin

Attachment: patch_getpagenode_cyclic_trees.diff

> On 20.08.2018 at 16:29, A. Massad wrote:
>
> [...]
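The infinite recursion on cyclic trees mentioned in this message can be avoided by remembering which nodes are on the current traversal path. The following is a minimal sketch on a toy data structure, not the code of the attached patch: the Node type and CountPages function are hypothetical and do not exist in PoDoFo.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a page tree node; not a PoDoFo type.
struct Node
{
    bool isPage = false;
    std::vector<const Node*> kids;
};

// Counts the pages reachable from `node`. `path` holds the nodes between the
// root and the current node. Returns false as soon as a node is reached that
// is already on the path, i.e. the "tree" contains a cycle.
bool CountPages(const Node* node, std::set<const Node*>& path, std::size_t& count)
{
    assert(node != nullptr);
    if (!path.insert(node).second)
        return false;                 // node is its own ancestor: cycle

    bool ok = true;
    if (node->isPage)
    {
        ++count;
    }
    else
    {
        for (const Node* kid : node->kids)
        {
            if (!CountPages(kid, path, count))
            {
                ok = false;
                break;
            }
        }
    }
    path.erase(node);                 // leave the path when unwinding
    return ok;
}
```

The recursion depth is bounded by the number of distinct nodes, because a node can be entered only once per path. A real implementation would work on object references instead of raw pointers.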
[Podofo-users] Bug in PdfPagesTree::GetPageNode() / PdfPagesTree::GetPageNodeFromArray()
Hi Everyone,

There is a problem with PdfPagesTree::GetPageNode(), which yields NULL for valid PDFs.

E.g. GetPageNode() for nPageNum=1 fails for this 3-page PDF:
https://eur-lex.europa.eu/legal-content/DE/TXT/PDF/?uri=CELEX:52018XC0810(05)&from=DE

This PDF is an example of a strange but valid page tree containing "/Pages" nodes with "/Count 0" and "/Kids [ ]". According to the PDF spec, "Section 7.7.3 Page Tree / 7.7.3.1 General", this tree should be handled:

> Conforming products shall be prepared to handle any form of tree structure
> built of such nodes.

In fact, Adobe products have no problems with the PDF, and Preflight checks show no problem either. However, PoDoFo cannot handle this tree:

> 372 0 obj
> <<
> /Type /Pages
> /Count 3
> /Kids [ 373 0 R 374 0 R 375 0 R ]
> >>
> endobj
> 373 0 obj
> <<
> /Type /Pages
> /Count 3
> /Kids [ 380 0 R 1 0 R 6 0 R ]
> /Parent 372 0 R
> >>
> endobj
> 374 0 obj
> <<
> /Type /Pages
> /Count 0
> /Kids [ ]
> /Parent 372 0 R
> >>
> endobj
> 375 0 obj
> <<
> /Type /Pages
> /Count 0
> /Kids [ ]
> /Parent 372 0 R
> >>
> endobj
> ...
> 379 0 obj
> <<
> /Type /Catalog
> /Lang (de)
> /MarkInfo <<
> /Marked true
> >>
> /Metadata 21 0 R
> /OpenAction [ 380 0 R /XYZ null null null ]
> /OutputIntents [ 376 0 R ]
> /Pages 372 0 R
> /StructTreeRoot 39 0 R
> >>
> endobj

The problem stems from this part of GetPageNode(), where it calls GetPageNodeFromArray():

> if( numDirectKids == numKids && static_cast<size_t>(nPageNum) < numDirectKids )
> {
>     // This node has only page nodes as kids,
>     // so we can access the array directly
>     rLstParents.push_back( pParent );
>     return GetPageNodeFromArray( nPageNum, rKidsArray, rLstParents );
> }

The condition of the if-statement is true for this tree: the root node 372 has three direct kids and a page count of 3. However, GetPageNodeFromArray() cannot handle the tree layout in rKidsArray correctly, because the kids are "/Pages" nodes rather than pages.

Closer inspection of the code in GetPageNode() and GetPageNodeFromArray() shows that there is considerable code duplication and a lot of special cases, even for malformed PDFs. In fact, I would like to propose the complete removal of GetPageNodeFromArray(): it is not needed, the condition for calling it is currently wrong and not easy to correct, and it introduces unclean code. There is another call to GetPageNodeFromArray() which is also unsure about its result and at least tries to correct this by checking the result for NULL.

Instead, the full tree traversal in GetPageNode() would be sufficient and correct for all cases. This clearly needs further inspection by a PoDoFo expert.

Best regards,
Amin
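The full tree traversal proposed in this message can be sketched on a toy structure. This is only an illustration under stated assumptions: PageNode and FindPage are hypothetical names, not PoDoFo types or functions, the sketch ignores the /Count entries entirely and simply walks the leaves in order, and it does no cycle detection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a node of the page tree; not a PoDoFo type.
struct PageNode
{
    bool isPage;                        // true for /Type /Page, false for /Type /Pages
    std::vector<const PageNode*> kids;  // empty for pages and for "/Kids [ ]"
};

// Depth-first search for the page with zero-based index `index`.
// `index` is decremented for every page that is skipped, so intermediate
// nodes without any pages ("/Count 0", "/Kids [ ]") are passed over naturally.
const PageNode* FindPage(const PageNode* node, std::size_t& index)
{
    assert(node != nullptr);
    if (node->isPage)
    {
        if (index == 0)
            return node;
        --index;
        return nullptr;
    }
    for (const PageNode* kid : node->kids)
    {
        if (const PageNode* hit = FindPage(kid, index))
            return hit;
    }
    return nullptr;                     // no such page below this node
}
```

For a tree shaped like the one quoted above (one intermediate node with three pages followed by two empty intermediate nodes), index 1 yields the second page and index 3 yields a null pointer. A production version would use /Count to skip whole subtrees.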
[Podofo-users] PATCH for PdfXRefStreamParserObject.cpp (Repost)
Hello PoDoFo developers,

I think that I've fixed a bug in PdfXRefStreamParserObject which occurs if the first entry of the "W" array is zero: the current SVN version r1665 treats this case as "type 0" (free object), whereas according to the PDF spec the type should default to "type 1".

Please see the spec "PDF 32000-1:2008", Section "7.5.8.2 Cross-Reference Stream Dictionary", "Table 17 – Additional entries specific to a cross-reference stream dictionary". The description of the key "W" states:

> A value of zero for an element in the W array indicates that the
> corresponding field shall not be present in the stream, and the default value
> shall be used, if there is one. If the first element is zero, the type field
> shall not be present, and shall default to type 1.

I have managed to fix this issue with the following patch to PdfXRefStreamParserObject.cpp. The if-statement added just before the switch-statement fixes the bug:

> Index: podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp
> ===================================================================
> --- podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp (revision 7630)
> +++ podofo-src-r1665/src/base/PdfXRefStreamParserObject.cpp (working copy)
> @@ -228,6 +228,7 @@
>
>      //printf("OBJ=%i nData = [ %i %i %i ]\n", nObjNo, static_cast<int>(nData[0]), static_cast<int>(nData[1]), static_cast<int>(nData[2]) );
>      (*m_pOffsets)[nObjNo].bParsed = true;
> +    if (lW[0]==0) nData[0]=1; // If the first element is zero, the type field shall not be present, and shall default to type 1.
>      switch( nData[0] ) // nData[0] contains the type information of this entry
>      {
>          case 0:

Without this patch, I could not create a PdfMemDocument for a certain PDF sample file which contains an XRef stream with a W key of the form [0 3 0]. The symptom looked like this:

> PoDoFo encounter an error. Error: 15 ePdfError_NoObject
> Error Description: A object was expected but not found.
> Callstack:
> #0 Error Source: /Users/amin/podofo/podofo-svn/podofo-src/src/doc/PdfMemDocument.cpp:182
> Information: Catalog object not found!

The error occurred in the function PdfMemDocument::InitFromParser( PdfParser* pParser ) at the call

    PdfObject* pCatalog = pTrailer->GetIndirectKey( "Root" );

which failed because the indirect object in "Root" could not be dereferenced (this->GetObjects().GetSize() yielded zero, because all PdfObjects had been parsed as type "free").

With the patch applied, the problematic PDF file can be parsed without problems.

Could you please check this patch and add it to the SVN version? Thank you very much!

Best regards,
Amin
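The defaulting rule quoted from Table 17 can be illustrated with a small stand-alone decoder. This is a sketch under stated assumptions, not PoDoFo code: DecodeXRefEntry is a hypothetical function, fields are read as big-endian integers, and the value 0 used for an absent second field is merely a placeholder (only the defaults for the type field and for the generation number of type 1 entries are taken from the spec).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical decoder (not PoDoFo API): reads one cross-reference stream
// entry. w[f] is the width in bytes of field f, as given by the /W array.
std::array<std::uint64_t, 3> DecodeXRefEntry(const unsigned char* bytes,
                                             const std::array<int, 3>& w)
{
    // Defaults used when a width is zero: type 1 for the first field,
    // generation 0 for the third; 0 for the second is a placeholder.
    std::array<std::uint64_t, 3> fields = {{1, 0, 0}};
    for (int f = 0; f < 3; ++f)
    {
        assert(w[f] >= 0 && w[f] <= 8);
        if (w[f] == 0)
            continue;                 // field not present: keep the default
        std::uint64_t value = 0;
        for (int i = 0; i < w[f]; ++i)
            value = (value << 8) | *bytes++;   // big-endian
        fields[f] = value;
    }
    return fields;
}
```

For W = [0 3 0] and the entry bytes 00 01 02, the result is type 1 with offset 258 and generation 0, i.e. an in-use object rather than a free one.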
[Podofo-users] Suggested patch for PdfParser.cpp and Mingw-64
Hello,

The current PoDoFo SVN version 1558 does not work under Mingw-64: PdfParser.cpp uses "%I" (upper-case "i", not to be confused with a lower-case "L") in the scanf format string, which seems to be undefined with GCC and completely breaks parsing of PDF files. For example, podofouncompress does not work anymore and yields an exception "ePdfError_NoObject".

The following patch seems to fix everything by using the default scanf format string in the #else branch of the #ifdef:

> svn diff
> Index: PdfParser.cpp
> ===================================================================
> --- PdfParser.cpp (revision 1558)
> +++ PdfParser.cpp (working copy)
> @@ -757,7 +757,7 @@
>              if( !m_offsets[objID].bParsed )
>              {
>                  m_offsets[objID].bParsed = true;
> -#ifdef _WIN64
> +#if defined(_WIN64) && defined(_MSC_VER)
>                  sscanf( m_buffer.GetBuffer(), "%10I64d %5ld %c%c%c",
>                          &(m_offsets[objID].lOffset),
>                          &(m_offsets[objID].lGeneration),
>                          &(m_offsets[objID].cUsed), &empty1, &empty2 );

Best regards,
Amin
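As an aside, a compiler-specific length modifier such as "I64" can be avoided altogether with the format macros from <cinttypes>. The sketch below shows this alternative technique; it is not what the patch above does. ParseXRefEntry is a hypothetical function, and it assumes that the offset is stored in a std::int64_t, which may differ from the type PoDoFo uses for lOffset.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>

// Hypothetical helper (not PoDoFo API): parses one classic xref table entry
// such as "0000000017 00000 n". SCNd64 expands to the scanf conversion that
// matches std::int64_t for the C library in use.
bool ParseXRefEntry(const char* line, std::int64_t& offset, long& generation, char& used)
{
    assert(line != nullptr);
    return std::sscanf(line, "%10" SCNd64 " %5ld %c",
                       &offset, &generation, &used) == 3;
}
```

The return value of sscanf is checked so that a malformed line is reported instead of silently leaving the outputs unchanged.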
[Podofo-users] Bugfix for PdfObjectStreamParserObject::ReadObjectsFromStream()
Dear PoDoFo-Team,

I have found and (hopefully) fixed a bug in PdfObjectStreamParserObject::ReadObjectsFromStream() which yielded a parse error for some PDFs with object streams. Could you please check the proposed patch and add it to the recent SVN version of PoDoFo? Thanks a lot!

Here comes the patch against SVN rev 1507:

> Index: podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp
> ===================================================================
> --- podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp (revision 5315)
> +++ podofo-src-r1507/src/base/PdfObjectStreamParserObject.cpp (working copy)
> @@ -105,6 +105,7 @@
>          }
>
>          // move back to the position inside of the table of contents
> +        device.Device()->Clear();
>          device.Device()->Seek( pos );
>
>          ++i;

The background of this patch is as follows.

SYMPTOMS: I got a PDF created by Adobe Illustrator 10 which yielded a parse error in PoDoFo, e.g. calling podofopdfinfo yielded the following output:

> ./build/bin/podofopdfinfo sample.pdf
> Error: An error 5 ocurred during uncompressing the pdf file.
>
>
> PoDoFo encounter an error. Error: 5 ePdfError_UnexpectedEOF
> Error Description: End of file was reached unxexpectedly.
> Callstack:
> #0 Error Source: podofo-svn/podofo-src/src/base/PdfParser.cpp:213
> Information: Unable to load objects from file.
> #1 Error Source: podofo-svn/podofo-src/src/base/PdfTokenizer.cpp:340
> Information: Expected number

CAUSE: The PDF contains an object stream in which the last three objects are of type "number". The *END* of the object stream looks like this:

    18272 19199 11818

There is a problem in the function PdfObjectStreamParserObject::ReadObjectsFromStream() because it uses *TWO* tokenizers (line 75 and line 89) on *ONE* input device "device". The call of variantTokenizer.GetNextVariant() in line 90 for the third-last number, i.e. "18272" in the example above, triggers EOF on the device because GetNextToken() always reads three tokens. After that we want to "move back to the position inside of the table of contents" (line 107f) by calling device.Device()->Seek( pos ). This does not work anymore because EOF was already reached, and reading the next object number lObj in line 81 from the TOC of the object stream fails. Hence, we get an error before we can reach the second-last number "19199" of the object stream.

> 72 void PdfObjectStreamParserObject::ReadObjectsFromStream( char* pBuffer, pdf_long lBufferLen, long long lNum, long long lFirst, ObjectIdList const & list)
> 73 {
> 74     PdfRefCountedInputDevice device( pBuffer, lBufferLen );
> 75     PdfTokenizer tokenizer( device, m_buffer );
> 76     PdfVariant var;
> 77     int i = 0;
> 78
> 79     while( static_cast(i) < lNum )
> 80     {
> 81         const long long lObj = tokenizer.GetNextNumber();
> 82         const long long lOff = tokenizer.GetNextNumber();
> 83         const std::streamoff pos = device.Device()->Tell();
> 84
> 85         // move to the position of the object in the stream
> 86         device.Device()->Seek( static_cast(lFirst + lOff) );
> 87
> 88         // use a second tokenizer here so that anything that gets dequeued isn't left in the tokenizer that reads the offsets and lengths
> 89         PdfTokenizer variantTokenizer( device, m_buffer );
> 90         variantTokenizer.GetNextVariant( var, m_pEncrypt );
> 91         bool should_read = std::find(list.begin(), list.end(), lObj) != list.end();
> 92 #if defined(PODOFO_VERBOSE_DEBUG)
> 93         std::cerr << "ReadObjectsFromStream STREAM=" << m_pParser->Reference().ToString() <<
> 94             ", OBJ=" << lObj <<
> 95             ", " << (should_read ? "read" : "skipped") << std::endl;
> 96 #endif
> 97         if (should_read)
> 98         {
> 99             if(m_vecObjects->GetObject(PdfReference( static_cast(lObj), 0LL )))
> 100             {
> 101                 PdfError::LogMessage( eLogSeverity_Warning, "Object: %li 0 R will be deleted and loaded again.\n", lObj );
> 102                 delete m_vecObjects->RemoveObject(PdfReference( static_cast(lObj), 0LL ),false);
> 103             }
> 104             m_vecObjects->insert_sorted( new PdfObject( PdfReference( static_cast(lObj), 0LL ), var ) );
> 105         }
> 106
> 107         // move back to the position inside of the table of contents
> 108         device.Device()->Clear(); // ** NEWLY ADDED **
> 109         device.Device()->Seek( pos );
> 110
> 111         ++i;
> 112     }
> 113 }

SOLUTION: To clear EOF, I have added line 108 to so
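The effect of a sticky error state on a subsequent seek can be reproduced with the standard library alone. The sketch below uses std::istringstream as a stand-in for PoDoFo's input device, on the assumption that the device wraps a standard stream with the same state semantics; RewindAndReadFirst is a hypothetical function written only for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads numbers from `data` until extraction fails (which leaves eofbit and
// failbit set), then tries to rewind and read the first number again.
// Returns true and stores the number in `first` if the re-read succeeds.
bool RewindAndReadFirst(const std::string& data, bool clearBeforeSeek, long& first)
{
    assert(!data.empty());
    std::istringstream in(data);
    long value = 0;
    while (in >> value)
    {
        // drain the stream; the final, failing extraction sets failbit
    }
    if (clearBeforeSeek)
        in.clear();      // reset the error state, analogous to Clear() in the patch
    in.seekg(0);         // has no effect while failbit is still set
    return static_cast<bool>(in >> first);
}
```

Called with the data "18272 19199 11818", the function fails without the clear() call and succeeds with it, returning 18272.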
[Podofo-users] How to make annotations editable in AcroRead?
Hello,

The function PdfPage::CreateAnnotation() makes it possible to create PDF annotations. They are displayed in AcroRead and Acrobat (and in Preview.app on Mac OS X). However, you need Acrobat to edit annotations and change their state (e.g. to "accepted"). If you open the same PDF in AcroRead, none of the annotations are editable, and they have a small lock icon in the annotation list.

It is possible to "unlock" the annotations in Acrobat with the menu item "Comments > Enable for Commenting and Analysis in Adobe Reader".

Is it possible to accomplish the same unlocking of annotations for AcroRead by means of PoDoFo or any other command line tool? It seems to be a proprietary feature based on some signatures added by Acrobat to the dictionaries of /AcroForm + /Perms + /UR3...

Thank you for your help!

Best regards,
Amin
[Podofo-users] Improvements of PdfContentsTokenizer::ReadInlineImgData()
Hi, I have encountered two problems in PdfContentsTokenizer::ReadInlineImgData(): 1) Parsing expects a whitespace *before* the EI operator (end of image data) whereas it should expect a whitespace *after* the EI. 2) Buffer for image data has a fixed size of 4096 bytes. The patch (against svn rev. 1298) included in this E-Mail provides a solution for both issues. Some further details: To 1) Unfortunately, the PDF spec does not clearly define how the EI operator should be detected in the data following the ID operator. The size of the data is not specified, and there seems to be no "escaping" mechanisms if the sequence EI should occur in the image data. However, there is an "heuristic" approach by other PDF parsers which expect a whitespace *after* the EI operator. See, here for such a discussion: http://www.planetpdf.com/forumarchive/134376.asp > Topic: Re: parsing inline images (Via Email) > Conf: (P-PDF) Developers, Msg: 134376 > From: LeonardR > Date: 6/13/2005 10:58 PM > > At 06:38 PM 6/13/2005, p-pdf-developers Listmanager wrote: > >The image data contains "EI " where the > >white space is a space (0x20). > > The actual image data, or the encoded version of the data? Are > you decoding and then looking or grabbing the inline image data till you > find the "EI" and then decoding? > > > >our parser detects either a space or cr lf. > > I've looked at the sources to a few content stream parsers (my > own, Xpdf, Multivalent, etc.) and they all also support "EI" followed by at > least one whitespace character (specifically space, CR or LF). > Prior to the patch, PoDoFo expects to find a whitespace *before* the EI operator and fails to detect the end of image data for some PDFs created by a common PDF workflow software. To 2) The PDF spec states that inlined images *should* not be larger than 4K. However, it does not forbid images to be larger. Again, some common PDF outputs contained inlined images larger than 4K. 
In that case, PoDoFo should not fail but rather resize the buffer.

Hopefully, this patch will be helpful for other users, too. Many thanks to all developers for this great project!

Best regards,
Amin

> Index: podofo-src-r1298/src/PdfContentsTokenizer.cpp
> ===
> --- podofo-src-r1298/src/PdfContentsTokenizer.cpp (revision 1298)
> +++ podofo-src-r1298/src/PdfContentsTokenizer.cpp (working copy)
> @@ -202,40 +202,43 @@
> PODOFO_RAISE_ERROR( ePdfError_InvalidHandle );
> }
>
> -// cosume the only whitespace between ID and data
> +// consume the only whitespace between ID and data
> c = m_device.Device()->Look();
> if( PdfTokenizer::IsWhitespace( c ) )
> {
> c = m_device.Device()->GetChar();
> }
>
> -while( (c = m_device.Device()->Look()) != EOF
> - && counter < static_cast(m_buffer.GetSize()) )
> -{
> -if (PdfTokenizer::IsWhitespace(c))
> -{
> -// test if end-of-image-data is reached (hit EI keyword)
> -c = m_device.Device()->GetChar(); // skip the white space
> -char e = m_device.Device()->GetChar();
> -char i = m_device.Device()->GetChar();
> -m_device.Device()->Seek(-2, std::ios::cur);
> -if (e == 'E' && i == 'I')
> -{
> -m_buffer.GetBuffer()[counter] = '\0';
> -rVariant = PdfData(m_buffer.GetBuffer(), static_cast(counter));
> -reType = ePdfContentsType_ImageData;
> -m_readingInlineImgData = false;
> -return true;
> -}
> -m_buffer.GetBuffer()[counter] = c;
> -++counter;
> -}
> -else
> -{
> -c = m_device.Device()->GetChar();
> -m_buffer.GetBuffer()[counter] = c;
> -++counter;
> -}
> +while((c = m_device.Device()->Look()) != EOF) {
> + c = m_device.Device()->GetChar();
> + if (c=='E' && m_device.Device()->Look()=='I') {
> + char i = m_device.Device()->GetChar();
> + char w = m_device.Device()->Look();
> +if (w==EOF || PdfTokenizer::IsWhitespace(w)) {
> + // EI is followed by whitespace => stop
> + m_device.Device()->Seek(-2, std::ios::cur); // put back "EI"
> + m_buffer.GetBuffer()[counter] = '\0';
> + rVariant = PdfData(m_buffer.GetBuffer(), static_cast(counter));
> + reType = ePdfContentsType_ImageData;
> + m_readingInlineImgData = false;
> + return true;
> + }
> + else {
> + // no whitespace after EI => do not stop
> + m_device.Device()->Seek(-1, std::ios::cur); // put back "I"
> + m_buffer.GetBuffer()[counter] = c;
> + ++counter;
> + }
> + }
> + else {
> + m_buffer.GetBuffer()[counter] = c;
> + ++counter;
> + }
> +
> + if (counter == static_cast(m_buffer.GetSize())) {
> + // image
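The two ideas of the patch (stop only at an "EI" that is followed by whitespace or end of input, and let the buffer grow instead of capping it at 4096 bytes) can be sketched outside of PoDoFo roughly as follows. This is only an illustration under assumptions: the function names isPdfWhitespace, findInlineImageEnd and readInlineImageData are made up for this sketch and are not PoDoFo API, and the input is a plain std::string that starts right after the single whitespace following ID.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Whitespace characters of the PDF syntax: NUL, HT, LF, FF, CR, SP.
inline bool isPdfWhitespace(char c)
{
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// Returns the offset of the first "EI" that is followed by whitespace or by
// the end of the input, or std::string::npos if there is no such "EI".
// An "EI" followed by any other byte is treated as part of the image data.
inline std::size_t findInlineImageEnd(const std::string& data)
{
    for (std::size_t pos = 0; pos + 1 < data.size(); ++pos) {
        if (data[pos] == 'E' && data[pos + 1] == 'I') {
            const bool atEnd = (pos + 2 == data.size());
            if (atEnd || isPdfWhitespace(data[pos + 2]))
                return pos;
        }
    }
    return std::string::npos;
}

// Returns the image bytes in front of the terminating "EI". The result is a
// std::string, so there is no fixed 4096-byte limit. An empty string is
// returned when no terminating "EI" is found.
inline std::string readInlineImageData(const std::string& data)
{
    const std::size_t end = findInlineImageEnd(data);
    if (end == std::string::npos)
        return std::string();
    assert(end + 2 <= data.size()); // the "EI" itself lies inside the input
    return data.substr(0, end);
}
```

As in the patch, the bytes in front of "EI" are returned unchanged, including a whitespace that may directly precede the operator. The heuristic can still be fooled by image data that happens to contain "EI" followed by whitespace; the spec offers no escaping mechanism for that case.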
[Podofo-users] Compiling PoDoFo without libpng
Hello, Is there a flag to compile PoDoFo (svn-version) without libpng support? The library exists on my build system, i.e. it is found by cmake, but I don't want libpng to be used by libpodofo. I have found a work-around by commenting out the line ### FIND_PACKAGE(PNG) in CMakeLists.txt. However, maybe someone knows a better solution without editing the source. Thank you for your help! Best regards, Amin
Re: [Podofo-users] PdfFilterFactory::CreateFilterList() should allow references as FILTER name
Dominik Seichter wrote on 05/12/2010 at 19:10:
> Hi,
>
> Where did you find the information in the PDF reference that references are
> allowed in the filters array?
> From my understanding of Table 3.4 in the PDF Reference 1.7, it states clearly
> that the /Filter key is either of type name or array.

This part of the spec is quite "tricky". Indeed, Table 3.4 states that the type should be name. However, an indirect object (i.e. a reference) is allowed to replace *any* object. In that case it has to *point* to an object which is a name. I refer to section 7.3.10 of the spec:

> 7.3.10 Indirect Objects
> Any object in a PDF file may be labelled as an indirect object. This gives
> the object a unique object identifier by which other objects can refer to it
> (for example, as an element of an array or as the value of a dictionary
> entry).

The example given in this statement even matches our case: an array element that is replaced by a reference.

> I added a patch to SVN. Could you please check if this fixes your problem?

GREAT! It fixes the problem. Thank you for your excellent work!!!

Greetings,
Amin
[Podofo-users] PdfFilterFactory::CreateFilterList() should allow references as FILTER name
Hi,

I found a problem with podofo (svn r1231 and older versions) which I cannot fix by myself. It is located in PdfFilter.cpp (line 366), where the array elements of /Filter are required to be names:

> if (! (*it).IsName() )
> {
>     PODOFO_RAISE_ERROR_INFO( ePdfError_InvalidDataType, "Filter array contained unexpected non-name type" );
>     ...

However, the PDF spec does not forbid references in the filter array. For example, the following PDF code with the reference "487 0 R" is valid according to the spec (and according to the Acrobat preflight check for syntax errors):

> 485 0 obj
> <<
> /Range [0 1 0 1 0 1 0 1]
> /Filter [487 0 R]
> /Domain [0 1 0 1 0 1 0 1]
> /FunctionType 4
> /Length 28
> >>
> stream
> x<9c>«V0PP0U0T(ÊÏÉ!<8a>]<90>_<80><82>k^A
> endstream
> endobj
>
> 487 0 obj
> /FlateDecode
> endobj

Still, this PDF code is rejected by podofo! To fix it, PdfFilter.cpp:366 has to be extended by some kind of dereferencing in case of (*it).IsReference(). This is usually done by code fragments like this:

> const PdfObject* pElem = *it;
> while ( pElem && pElem->IsReference() )
> {
>     pElem = pParent->GetObject( pElem->GetReference() );
> }

However, how do I get the parent object pParent into the PdfFilter class? It is required in order to call GetObject() on it, isn't it?

Could you please help me to fix this?

Thank you,
Amin
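The dereferencing step described above can be illustrated independently of PoDoFo. The following is a minimal sketch under assumptions: Obj, ObjectTable, resolve and filterName are hypothetical stand-ins invented for this example (they are not PoDoFo classes), and the object table plays the role of whatever owner object provides GetObject() in the real library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a PDF object; not a PoDoFo type.
struct Obj {
    enum Kind { Name, Number, Reference };
    Kind kind;
    std::string name; // used when kind == Name
    int ref;          // object number, used when kind == Reference
};

typedef std::map<int, Obj> ObjectTable; // object number -> object

// Follows references until a non-reference object is reached.
// Returns nullptr for a dangling reference or a reference cycle.
inline const Obj* resolve(const Obj& start, const ObjectTable& objects)
{
    const Obj* cur = &start;
    std::size_t hops = 0;
    while (cur->kind == Obj::Reference) {
        // More hops than objects in the table can only happen in a cycle.
        if (hops++ > objects.size())
            return nullptr;
        ObjectTable::const_iterator it = objects.find(cur->ref);
        if (it == objects.end())
            return nullptr;
        cur = &it->second;
    }
    return cur;
}

// Returns the filter name of one /Filter array element, or an empty string
// if the element does not resolve to a name.
inline std::string filterName(const Obj& element, const ObjectTable& objects)
{
    const Obj* target = resolve(element, objects);
    assert(target == nullptr || target->kind != Obj::Reference);
    return (target && target->kind == Obj::Name) ? target->name : std::string();
}
```

With the example from this mail, an element "487 0 R" and a table that maps 487 to the name FlateDecode would resolve to "FlateDecode". The hop counter is an addition of this sketch; the loop quoted above has no protection against references that point at each other.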
Re: [Podofo-users] Printing PdfStrings with escape sequences
Hi Dominik,

your latest changes in the SVN (rev 1173) introduced a serious bug into the PoDoFo code. Your access to m_escMap with m_escMap[static_cast(*pBuf)] is wrong because this cast may yield negative indices (for chars > 0x7f). In that case, cEsc becomes nonzero at random, which leads to unexpected escape sequences such as "\1".

Could you please check my suggested diffs (against rev 1174) below and commit these changes to the SVN if they are correct?

Thank you!
Amin

PS: In my opinion, the additional sizeof(char) should be added to the malloc() call because it is used in the memset() call, too.

PPS: Please note that a cast unsigned(*pBuf) does not correct the problem: it yields very large numbers for "negative" chars. Use "unsigned(*pBuf)&0xff" to prevent this.

Index: src/PdfString.cpp
===
--- src/PdfString.cpp (revision 1174)
+++ src/PdfString.cpp (working copy)
@@ -48,7 +48,7 @@
 static const char* genEscMap()
 {
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 map['\n'] = 'n'; // Line feed (LF)
@@ -376,7 +376,7 @@
 while( lLen-- )
 {
-const char & cEsc = m_escMap[static_cast(*pBuf)];
+const char & cEsc = m_escMap[static_cast<unsigned char>(*pBuf)];
 if( cEsc != 0 )
 {
 pDevice->Write( "\\", 1 );
Index: src/PdfTokenizer.cpp
===
--- src/PdfTokenizer.cpp (revision 1174)
+++ src/PdfTokenizer.cpp (working copy)
@@ -53,7 +53,7 @@
 {
 int i;
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 for (i = 0; i < PoDoFo::s_nNumDelimiters; ++i)
 map[static_cast(PoDoFo::s_cDelimiters[i])] = 1;
@@ -67,7 +67,7 @@
 {
 int i;
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 for (i = 0; i < PoDoFo::s_nNumWhiteSpaces; ++i)
 map[static_cast(PoDoFo::s_cWhiteSpaces[i])] = 1;
@@ -78,7 +78,7 @@
 const char* genEscMap()
 {
 const long lAllocLen = 256;
-char* map = static_cast<char*>(malloc(lAllocLen));
+char* map = static_cast<char*>(malloc(sizeof(char) * lAllocLen));
 memset( map, 0, sizeof(char) * lAllocLen );
 map['n'] = '\n'; // Line feed (LF)
@@ -646,7 +646,7 @@
 else
 {
 // Handle plain escape sequences
-const char & code = m_escMap[m_device.Device()->GetChar()];
+const char & code = m_escMap[(unsigned char)(m_device.Device()->GetChar())];
 if( code )
 m_vecBuffer.push_back( code );

On 14.12.2009 at 15:55, Dominik Seichter wrote:
> Thanks for the clarifications. I committed a fix to SVN, could you please
> check that the behaviour is now correct for you?
>
> Best regards,
> Dom
>
> On Saturday, 12 December 2009, A. Massad wrote:
>> On 29.11.2009 at 19:21, Dominik Seichter wrote:
>>> Hi,
>>>
>>> I do not see how this is a problem. It is true that PoDoFo writes
>>> (Hello\nWorld) as (Hello
>>> World) into the PDF. But the PDF is read as sequence of bytes and the
>>> byte for the linebreak is still there. If I understand the PDF reference
>>> correctly, the behaviour of PoDoFo is correct. Escaping is optional and
>>> not required.
>>>
>>> Please correct me if I am wrong here!
>>
>> Sorry for the late reply, it took me some time to investigate this issue: I
>> came to the conclusion that your statement does not agree with the PDF
>> spec. The following is a quotation from section "7.3.4.2 Literal Strings":
>>
>> "An end-of-line marker appearing within a literal string without a
>> preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah),
>> irrespective of whether the end-of-line marker was a CARRIAGE RETURN
>> (0Dh), a LINE FEED (0Ah), or both."
>>
>> That means: If PoDoFo expands \n to a single code 0Ah and \r to 0Dh, they
>> lose the "REVERSE SOLIDUS" ("\") and become an end-of-line marker. Now,
>> if you read in such a PDF with the Adobe tools, they treat this
>> end-of-line marker as 0Ah. This is exactly the behaviour I have observed.
>>
>> I am pretty sure that the output of PoDoFo is
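The indexing problem can be reproduced without PoDoFo. The following sketch uses made-up names (EscapeMap, toIndex) and assumes nothing about the library beyond what the diff shows: a 256-entry table that is indexed by a byte read through a plain char, which is a signed type on many platforms.

```cpp
#include <cassert>
#include <cstddef>

// Converts a byte held in a plain char into a table index in 0..255.
// Going through unsigned char avoids negative indices for bytes > 0x7f.
inline std::size_t toIndex(char c)
{
    return static_cast<unsigned char>(c);
}

// 256-entry lookup table: table[byte] is the escape letter for that byte,
// or 0 if the byte needs no escaping.
struct EscapeMap {
    char table[256];

    EscapeMap()
    {
        for (std::size_t i = 0; i < 256; ++i)
            table[i] = 0;
        table[toIndex('\n')] = 'n'; // Line feed (LF)
        table[toIndex('\r')] = 'r'; // Carriage return (CR)
        table[toIndex('\t')] = 't'; // Horizontal tab (HT)
        table[toIndex('\b')] = 'b'; // Backspace (BS)
        table[toIndex('\f')] = 'f'; // Form feed (FF)
    }

    char lookup(char c) const
    {
        const std::size_t index = toIndex(c);
        assert(index < 256); // always true thanks to the unsigned char cast
        return table[index];
    }
};
```

On a platform where char is signed, the byte 0xE4 has the value -28 as a char. Casting it directly to unsigned gives a value close to the maximum of unsigned, which is the "very large number" mentioned in the PPS, whereas the unsigned char cast or the "&0xff" mask both give 228.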
Re: [Podofo-users] Printing PdfStrings with escape sequences
On 14.12.2009 at 15:55, Dominik Seichter wrote: > Thanks for the clarifications. I committed a fix to SVN, could you please > check > that the behaviour is now correct for you? Dominik, thank you for your fix! I have just tried rev. 1173 and it works fine for my examples :-) Now, "\n", "\r" and all similar escape sequences are preserved by PdfString. However, I am not quite sure what your code does with octal codes "\ddd", for example \015, which is equal to \r (Carriage Return, 0Dh). These should be preserved as well. Otherwise the same problem arises: if the escape sequence is expanded to a single byte, that byte will be interpreted as 0Ah. Greetings, Amin
Re: [Podofo-users] Printing PdfStrings with escape sequences
On 29.11.2009 at 19:21, Dominik Seichter wrote:
> Hi,
>
> I do not see how this is a problem. It is true that PoDoFo writes
> (Hello\nWorld) as (Hello
> World) into the PDF. But the PDF is read as sequence of bytes and the byte
> for the linebreak is still there. If I understand the PDF reference
> correctly, the behaviour of PoDoFo is correct. Escaping is optional and
> not required.
>
> Please correct me if I am wrong here!

Sorry for the late reply, it took me some time to investigate this issue: I came to the conclusion that your statement does not agree with the PDF spec. The following is a quotation from section "7.3.4.2 Literal Strings":

"An end-of-line marker appearing within a literal string without a preceding REVERSE SOLIDUS shall be treated as a byte value of (0Ah), irrespective of whether the end-of-line marker was a CARRIAGE RETURN (0Dh), a LINE FEED (0Ah), or both."

That means: If PoDoFo expands \n to a single code 0Ah and \r to 0Dh, they lose the "REVERSE SOLIDUS" ("\") and become an end-of-line marker. Now, if you read in such a PDF with the Adobe tools, they treat this end-of-line marker as 0Ah. This is exactly the behaviour I have observed.

I am pretty sure that the output of PoDoFo is wrong: Due to the expansion of the escape sequences \r and \n, the hex codes 0Ah and 0Dh become indistinguishable for PDF readers. This might be OK if they just represent end-of-lines. However, due to character encodings with "Differences" mappings, the hex codes 0Dh and 0Ah might be mapped to different printable characters. In that case, the output of PoDoFo leads to serious errors!

Best regards,
Amin

> On Monday, 23 November 2009, A. Massad wrote:
>> Hello,
>>
>> Maybe this is a bug in PoDoFo - or just wrong usage of the library
>> functions:
>>
>> Reading/parsing PDF-Files which contain strings with escape sequences, e.g.
>> (\r) or (\b), causes problem when writing these strings: the functions
>> PdfVariant::Write() and PdfString::Write() yield a strange output - that
>> is: (\r)
>> becomes
>> (
>> )
>> and
>> (\b)
>> becomes
>> )
>> respectively.
>>
>> That means, the escape sequences are resolved in the output to a CR or
>> BACKSPACE instead of maintaining the escaping with the Backslash "\".
>>
>> Due to this behavior, the written PDFs are corrupt (esp. due to malformed
>> syntax produced by the \b).
>>
>> Is there a special function or flag to find a work-around for this
>> behavior? I could not fix the problem with PoDoFo functions but rather had
>> use PdfVariant::ToString() and rewrite the std::string manually to hex
>> codes...
>>
>> Thank you for your help!
>>
>> Greetings,
>> Amin
>>
>> PS: The strange encodings with low code numbers occur in a PDF where a Font
>> Encoding remaps all present characters by a "Differences"-Mapping to codes
>> starting at 1 - i.e. the non-printable chars will be mapped to printable
>> characters by this encoding (cf. Pdf Spec Sect 9.6.6 Character Encoding).
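What "maintaining the escaping" means for a writer can be sketched as follows. This is not PoDoFo's PdfString::Write(); escapeLiteralString is a hypothetical helper written for this illustration. It emits the escape sequences that the PDF spec defines for literal strings (\n, \r, \t, \b, \f, \(, \), \\) instead of the raw bytes, so that a CR stays distinguishable from an LF when the file is read back.

```cpp
#include <cassert>
#include <string>

// Writes raw string bytes as a PDF literal string, keeping control
// characters, parentheses and the backslash in escaped form.
inline std::string escapeLiteralString(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size() + 2);
    out.push_back('(');
    for (char c : raw) {
        switch (c) {
        case '\n': out += "\\n";  break; // Line feed (0Ah)
        case '\r': out += "\\r";  break; // Carriage return (0Dh)
        case '\t': out += "\\t";  break; // Horizontal tab
        case '\b': out += "\\b";  break; // Backspace
        case '\f': out += "\\f";  break; // Form feed
        case '(':  out += "\\(";  break;
        case ')':  out += "\\)";  break;
        case '\\': out += "\\\\"; break;
        default:   out.push_back(c); break;
        }
    }
    out.push_back(')');
    assert(out.size() >= raw.size() + 2); // at least the two parentheses were added
    return out;
}
```

A string that consists of a single carriage return is written as the four characters (\r) rather than as a parenthesis pair around a raw 0Dh byte. Other non-printable bytes are passed through unchanged in this sketch; a real writer might prefer octal escapes or a hex string for them.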
[Podofo-users] Printing PdfStrings with escape sequences
Hello,

Maybe this is a bug in PoDoFo, or just wrong usage of the library functions:

Reading/parsing PDF files which contain strings with escape sequences, e.g. (\r) or (\b), causes problems when writing these strings: the functions PdfVariant::Write() and PdfString::Write() yield a strange output. That is:

(\r)
becomes
(
)
and
(\b)
becomes
)
respectively.

That means the escape sequences are resolved in the output to a CR or BACKSPACE instead of maintaining the escaping with the backslash "\".

Due to this behavior, the written PDFs are corrupt (esp. due to the malformed syntax produced by the \b).

Is there a special function or flag that provides a work-around for this behavior? I could not fix the problem with PoDoFo functions but rather had to use PdfVariant::ToString() and rewrite the std::string manually to hex codes...

Thank you for your help!

Greetings,
Amin

PS: The strange encodings with low code numbers occur in a PDF where a font encoding remaps all present characters by a "Differences" mapping to codes starting at 1, i.e. the non-printable chars will be mapped to printable characters by this encoding (cf. PDF spec sect. 9.6.6 Character Encoding).
Re: [Podofo-users] PdfContentsTokenizer position is reset with multiple streams
Hi Mike,

If you change the behavior of PdfContentsTokenizer::GetNextToken() to span across streams, could you please provide a flag to toggle this behavior?

For some users (like me) it might be important to be able to switch back to the "old" behavior which DOES NOT span across streams. I have an application which parses through streams and replaces the content of each single stream without changing the overall structure of the streams. I think that this would no longer be possible if PdfContentsTokenizer::GetNextToken() did not detect stream boundaries anymore.

Thanks in advance!

Greetings,
Amin

On 26.08.2009, at 17:17, Mike Slegeir wrote:
> I've discovered another related issue. PdfTokenizer is unable to
> reach into the next content stream in order to get a token. So any
> objects which are split across Contents have an UnexpectedEOF
> raised. My suggested solution to the problem is to either
> concatenate all the Content streams before doing any tokenization or
> to make PdfTokenizer::GetNextToken virtual and move the stream
> switching logic into PdfContentsTokenizer::GetNextToken such that it
> will try the parent's version, attempt to move to the next stream (if
> it exists) on failure, then retry. Attached is a very basic example
> of an array split between two streams.
>
> - Mike Slegeir
>
> Mike Slegeir wrote:
>> I've resolved this issue in an admittedly hacky way. This may be
>> sufficient for this problem though. Attached is a patch which
>> fixes the issue. I've only done limited testing, but it does at
>> least correct the issue.
>>
>> - Mike Slegeir
>>
>>> When using PdfContentsTokenizer with a PDF with an array for
>>> Contents rather than a single stream, the tokenizer will reset its
>>> position to the beginning of the first stream upon exhausting a
>>> stream. A Contents array with contents X Y Z will appear as
>>> X X Y X Y Z to a user of the PdfContentsTokenizer. Attached is a PDF
>>> which has a Contents array. I can provide example code and output
>>> if necessary.
>
> array.pdf
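The requested toggle can be pictured with a small stand-alone sketch. Nothing here is PoDoFo API: splitTokens and tokenizeContents are names invented for this illustration, the flag spanStreams is hypothetical, and splitting at whitespace is a strong simplification of real PDF tokenizing.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Tokens;

// Splits one content stream at whitespace (a simplification).
inline Tokens splitTokens(const std::string& stream)
{
    Tokens tokens;
    std::istringstream in(stream);
    std::string tok;
    while (in >> tok)
        tokens.push_back(tok);
    return tokens;
}

// spanStreams == true : all streams are read as one token sequence, so an
//                       array that starts in one stream may end in the next.
// spanStreams == false: one token sequence per stream; the stream
//                       boundaries stay visible to the caller.
inline std::vector<Tokens> tokenizeContents(const std::vector<std::string>& streams,
                                            bool spanStreams)
{
    std::vector<Tokens> result;
    if (spanStreams)
        result.push_back(Tokens());
    for (std::size_t i = 0; i < streams.size(); ++i) {
        const Tokens t = splitTokens(streams[i]);
        if (spanStreams)
            result.front().insert(result.front().end(), t.begin(), t.end());
        else
            result.push_back(t);
    }
    assert(!spanStreams || result.size() == 1);
    return result;
}
```

For the two streams "[ 1 2" and "3 ] TJ", the spanning mode yields one sequence of six tokens in which the array is complete, while the non-spanning mode yields two sequences of three tokens each, which is what an application that rewrites every stream separately would need.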
[Podofo-users] PdfContentsTokenizer stopped working
Hi,

PdfContentsTokenizer in recent PoDoFo versions (e.g. rev 1132) stopped working. I have created a minimal source which shows the problem: The ReadNext() function immediately returns FALSE and the while-loop creates no output at all. Up to rev. 1069 the output was as expected:

v v k
v v v v k
v k

Here is the source to reproduce the behavior:

#include "podofo.h"
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
    std::string str =
        "/Layer /MC0 BDC\n"     // variant variant keyword
        "0.543 0 0.937 0 k\n"   // variant variant variant variant keyword
        "/GS0 gs\n";            // variant keyword
    const char *buf = str.c_str();
    // for older lib-versions use pdf_long instead of PoDoFo::pdf_long
    const PoDoFo::pdf_long buflen = str.size();

    PoDoFo::PdfContentsTokenizer tokenizer(buf, buflen);
    PoDoFo::EPdfContentsType type;
    const char* keyword = NULL;
    PoDoFo::PdfVariant variant;

    // print "v" for every variant and "k" (plus a newline) for every keyword
    while (tokenizer.ReadNext(type, keyword, variant))
    {
        switch (type)
        {
        case PoDoFo::ePdfContentsType_Keyword:
            std::clog << "k" << std::endl;
            break;
        case PoDoFo::ePdfContentsType_Variant:
            std::clog << "v ";
            break;
        }
    }
    return(0);
}

What is going wrong? Unfortunately, I cannot find the exact svn revision where it stopped working:
- as mentioned before, it worked with 1069
- between rev. 1070 and 1110 the library does not compile on my system (Mac OS X)
- rev. 1120 builds fine and already produces no output

Thank you very much,
Amin

PS: Strangely, the use of the other constructor PdfContentsTokenizer( PdfCanvas* pCanvas ) seems to work: It is called by TextExtractor::ExtractText in tools/podofotxtextract. And this tool produces output on my system.
[Podofo-users] Podofobrowser treats Real as Integer
Hi,

I have the following problem with podofobrowser (SVN rev. 968): Real numbers are treated like integers (the fractional part is always .0). Still, the type in the tree is correctly shown as Real. Maybe the problem only occurs in conjunction with arrays.

Example: If you open a file input.pdf which contains this data...

7 0 obj
<<
/Type /Page
/ArtBox [ 27.673200 27.673200 566.256000 651.295000 ]
/BleedBox [ 13.50 13.50 580.429000 665.469000 ]
/Contents 54 0 R
/CropBox [ 0.00 0.00 593.929000 678.969000 ]
/MediaBox [ 0.00 0.00 593.929000 678.969000 ]
...

then saving this file (without editing) to output.pdf yields (the same data as displayed in the browser):

7 0 obj
<<
/Type /Page
/ArtBox [ 27.00 27.00 566.00 651.00 ]
/BleedBox [ 13.00 13.00 580.00 665.00 ]
/Contents 54 0 R
/CropBox [ 0.00 0.00 593.00 678.00 ]
/MediaBox [ 0.00 0.00 593.00 678.00 ]

Does anyone know how to fix this behaviour? Is it a problem of the podofobrowser itself or of the included PoDoFo version (in externals/required_libpodofo)?

Thank you very much in advance!

Regards,
Amin