Re: [Podofo-users] Parse error on Chromium generated PDFs
After some digging: Chromium / Skia do not store annotations as references, but directly as dictionaries under /Annots. Here is a simple patch that fixes it, although it is not correct: the allocated PdfObject is never deleted.

----- Original message -----
From: le...@free.fr
To: podofo-users@lists.sourceforge.net
Sent: Wednesday, 6 December 2017 10:01:58
Subject: [Podofo-users] Parse error on Chromium generated PDFs

Hello,

Thanks for this library. I am trying to use it to post-process Chromium-generated PDFs (adding outlines), but a parsing error occurs on them. Here is an example output of podofopdfinfo:

    Page Info
    Page Count: 1
    Page 0:
    ->Internal Number:1
    ->Object Number:5 0 R
    MediaBox: [ 0.00 0.00 842.00 1191.00 ]
    Rotation: 0
    # of Annotations: 1
    Error: An error 20 ocurred during uncompressing the pdf file.
    PoDoFo encountered an error.
    Error: 20 ePdfError_InvalidDataType
    Callstack:
    #0 Error Source: ../src/base/PdfVariant.h:883

Steps to reproduce:

1. Open Chrome or Chromium.
2. Go to http://example.com/
3. Print the page to a PDF.
4. Run podofopdfinfo on it.

Or, on the command line:

    $ chromium-browser 'data:text/html,<a href="http://example.com">Hello World</a>' --headless --print-to-pdf=test.pdf
    $ podofopdfinfo -P test.pdf

More specifically, the problem occurs when the page contains at least one link. I can reproduce it both with a custom build of trunk and with the 0.9.3 package shipped in Ubuntu 16.04.

Best regards,
Leizh

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org!
http://sdm.link/slashdot
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users

Index: src/doc/PdfPage.cpp
===================================================================
--- src/doc/PdfPage.cpp	(revision 1861)
+++ src/doc/PdfPage.cpp	(working copy)
@@ -369,21 +369,27 @@
         PODOFO_RAISE_ERROR( ePdfError_ValueOutOfRange );
     }
 
-    ref= pObj->GetArray()[index].GetReference();
-    pAnnot = m_mapAnnotations[ref];
-    if( !pAnnot )
-    {
-        pObj = this->GetObject()->GetOwner()->GetObject( ref );
-        if( !pObj )
-        {
-            PdfError::DebugMessage( "Error looking up object %i %i R\n", ref.ObjectNumber(), ref.GenerationNumber() );
-            PODOFO_RAISE_ERROR( ePdfError_NoObject );
-        }
-
-        pAnnot = new PdfAnnotation( pObj, this );
-        m_mapAnnotations[ref] = pAnnot;
-    }
+    PdfObject variant = pObj->GetArray()[index];
+    if (variant.IsDictionary()) {
+        pAnnot = new PdfAnnotation(new PdfObject(variant), this );
+    } else {
+        ref= variant.GetReference();
+        pAnnot = m_mapAnnotations[ref];
+        if( !pAnnot )
+        {
+            pObj = this->GetObject()->GetOwner()->GetObject( ref );
+            if( !pObj )
+            {
+                PdfError::DebugMessage( "Error looking up object %i %i R\n", ref.ObjectNumber(), ref.GenerationNumber() );
+                PODOFO_RAISE_ERROR( ePdfError_NoObject );
+            }
+
+            pAnnot = new PdfAnnotation( pObj, this );
+            m_mapAnnotations[ref] = pAnnot;
+        }
+    }
+
     return pAnnot;
 }
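The core of the patch is that each /Annots item may be either an indirect reference or an inline dictionary, and that whatever is allocated for it needs an owner. The standalone sketch below models only that logic; Reference, Dictionary, Annotation and Page are hypothetical stand-ins, not PoDoFo's classes, and this is not the code of revision 1867. Caching inline annotations by their array index is an assumption made for the illustration; every annotation is held in a std::unique_ptr owned by the page, so nothing leaks and repeated lookups return the same object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for PDF objects; these are not PoDoFo's classes.
struct Reference {
    int object = 0;
    int generation = 0;
    bool operator<(const Reference& other) const {
        return std::make_pair(object, generation) <
               std::make_pair(other.object, other.generation);
    }
};
using Dictionary = std::map<std::string, std::string>;  // simplified dictionary
using AnnotEntry = std::variant<Reference, Dictionary>;  // one /Annots item

struct Annotation {
    Dictionary dict;
};

class Page {
public:
    Page(std::vector<AnnotEntry> annots, std::map<Reference, Dictionary> objects)
        : annots_(std::move(annots)), objects_(std::move(objects)) {}

    // Returns a non-owning pointer; the Page owns every Annotation it creates.
    Annotation* GetAnnotation(std::size_t index) {
        if (index >= annots_.size())
            throw std::out_of_range("annotation index out of range");
        const AnnotEntry& entry = annots_[index];

        // Inline dictionary (what Chromium / Skia write): cache by array index.
        if (const auto* inlineDict = std::get_if<Dictionary>(&entry)) {
            std::unique_ptr<Annotation>& slot = byIndex_[index];
            if (!slot)
                slot = std::make_unique<Annotation>(Annotation{*inlineDict});
            return slot.get();
        }

        // Indirect reference: cache by object reference, as the original code did.
        const Reference& ref = std::get<Reference>(entry);
        auto cached = byRef_.find(ref);
        if (cached != byRef_.end())
            return cached->second.get();
        auto target = objects_.find(ref);
        if (target == objects_.end())
            throw std::runtime_error("referenced annotation object not found");
        std::unique_ptr<Annotation>& slot = byRef_[ref];
        slot = std::make_unique<Annotation>(Annotation{target->second});
        return slot.get();
    }

    std::size_t OwnedCount() const { return byIndex_.size() + byRef_.size(); }

private:
    std::vector<AnnotEntry> annots_;
    std::map<Reference, Dictionary> objects_;  // stand-in for the document's objects
    std::map<std::size_t, std::unique_ptr<Annotation>> byIndex_;
    std::map<Reference, std::unique_ptr<Annotation>> byRef_;
};
```

Tying each annotation's lifetime to the page is one way to avoid the leaks discussed in this thread; PoDoFo's own fix may distribute ownership differently.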
Re: [Podofo-users] Parse error on Chromium generated PDFs
On Wed, 2017-12-06 at 18:34 +0100, le...@free.fr wrote:
> After some digging: Chromium / Skia do not store annotations as
> references, but directly as dictionaries under /Annots.
> Here is a simple patch that fixes it, although it is not correct: the
> allocated PdfObject is never deleted.

Hi,
thanks for the report and the patch. You are right, it is not complete: pAnnot also leaks, and there were a couple more issues. I extended your change in revision 1867:
https://sourceforge.net/p/podofo/code/1867

Bye,
zyx
Re: [Podofo-users] Parse error on Chromium generated PDFs
Hello zyx, hello all,

> zyx has written on 14 January 2018 at 15:42:
>
> On Wed, 2017-12-06 at 18:34 +0100, le...@free.fr wrote:
> > After some digging: Chromium / Skia do not store annotations as
> > references, but directly as dictionaries under /Annots.
> > Here is a simple patch that fixes it, although it is not correct: the
> > allocated PdfObject is never deleted.
>
> Hi,
> thanks for the report and the patch. You are right, it is not complete:
> pAnnot also leaks, and there were a couple more issues.
> I extended your change in revision 1867:
> https://sourceforge.net/p/podofo/code/1867

In that revision, at src/doc/PdfPage.cpp:380 (i.e. line 380), there is a typo: it should be pItem instead of pObj. Line 384 uses pItem as the key when storing, so pItem should also be used when querying. The object pObj is the array itself; you didn't want that as a map key, right?

Another issue is that you replaced the construct (*iterator).member with iterator->member, which is AFAIK non-standard and therefore (at least possibly) non-portable. The reason is that iterator types don't have to be pointers and are not otherwise required to have an operator->; they are only required (see [1]) to have operator*() const for dereferencing. So please change it back, or give me a technical reason why that isn't practical for you.

> Bye,
> zyx

[1] http://en.cppreference.com/w/cpp/concept/Iterator
Example implementation in http://en.cppreference.com/w/cpp/iterator/iterator

Best regards,
mabri
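The point about keys can be illustrated without PoDoFo. In the minimal sketch below, Item, Annotation and AnnotCache are hypothetical stand-ins, and the cache is keyed by the address of the object an annotation was built from, loosely mirroring the pItem-keyed map described in the review. When the lookup key (the array) differs from the storage key (the item), the lookup can never hit, so every call constructs a new annotation; in code that allocates with raw new, as the patch does, that also means a new allocation on every call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-ins for an /Annots array item and the annotation built
// from it; these are not PoDoFo types.
struct Item {
    int id;
};
struct Annotation {
    const Item* source;
};

class AnnotCache {
public:
    // Consistent keys: look up and store under the same key (the item address).
    Annotation* Get(const Item* pItem) {
        auto it = cache_.find(pItem);
        if (it != cache_.end())
            return &it->second;  // cache hit: reuse the existing annotation
        ++constructed_;
        return &(cache_[pItem] = Annotation{pItem});  // miss: build and store
    }

    // Mismatched keys: look up under the array's address, store under the
    // item's. The lookup never hits, so every call constructs a new annotation.
    Annotation* GetMismatched(const void* pArray, const Item* pItem) {
        auto it = cache_.find(pArray);
        if (it != cache_.end())
            return &it->second;
        ++constructed_;
        return &(cache_[pItem] = Annotation{pItem});
    }

    std::size_t Constructed() const { return constructed_; }

private:
    std::map<const void*, Annotation> cache_;
    std::size_t constructed_ = 0;
};
```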
Re: [Podofo-users] Parse error on Chromium generated PDFs
On Wed, 2018-01-17 at 00:48 +0100, Matthew Brincke wrote:
> In that revision, at src/doc/PdfPage.cpp:380 (i.e. line 380), there is
> a typo: it should be pItem instead of pObj. Line 384 uses pItem as the
> key when storing, so pItem should also be used when querying.

Hi,
thanks for the review; that is a really silly typo. I'm sorry about that.

> Another issue is that you replaced the construct (*iterator).member
> with iterator->member, which is AFAIK non-standard and therefore (at
> least possibly) non-portable.

Well, I was told more than 20 years ago that one can use either "(*a)." or "a->" because they are equivalent notations, which [2] confirms for built-in types. As I understand it, even if the overloaded operator is not defined (the word "overloaded" is the important bit of information here), it will still produce the same result. It is more complicated when the iterator refers to a pointer rather than a structure, but that is beyond the scope of this thread.

Nonetheless, I do not care that much; it's just my habit, so I changed it to the "(*a)." notation. Both fixes are in revision 1868:
http://sourceforge.net/p/podofo/code/1868

Thanks again and bye,
zyx

[2] http://en.cppreference.com/w/cpp/language/operator_member_access#Built-in_member_access_operators
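For reference, the rules behind this exchange can be checked with a small standalone program. For built-in pointers, a->m is defined as (*a).m, as [2] says. For a class type, a->m compiles only if the class overloads operator->; the basic Iterator requirements in [1] do not demand that, but input iterators and stronger categories (std::map iterators are bidirectional) must support it->m with the same meaning as (*it).m. Point and DerefOnly are hypothetical types made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical types used only for this illustration.
struct Point {
    int x;
    int y;
};

// An iterator-like wrapper that overloads operator* but not operator->.
struct DerefOnly {
    Point* p;
    Point& operator*() const { return *p; }
};

// Built-in pointer: a->x is by definition the same as (*a).x.
int SumViaPointer(const Point* a) {
    return a->x + (*a).y;
}

// std::map iterators are bidirectional, so it->m must mean (*it).m.
std::string DoubleEachValue(const std::map<int, std::string>& m) {
    std::string out;
    for (auto it = m.begin(); it != m.end(); ++it)
        out += (*it).second + it->second;  // both spellings name the same member
    return out;
}

// For DerefOnly only the (*w).x spelling compiles; w->x would be an error
// because the class does not overload operator->.
int FirstViaDeref(const DerefOnly& w) {
    return (*w).x;
}
```

So for the standard container iterators a map-based cache would use, both spellings are valid; the "(*a)." form is the one that also works with a minimal iterator-like class that only defines operator*.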