Thank you for your answers.

@Dennis Jenkins: I use a Macintosh, and indeed, when I save the 1 GB PDF file with the native application on the Mac, it shrinks back to 10 MB, the "normal" size. I will install podofobrowser and study it.

@zyx: I will study what you explained to me.

As soon as I find an answer to my problem, I will share it with you.

Christophe

On 7 May 2014 at 07:59, podofo-users-requ...@lists.sourceforge.net wrote:

> Send Podofo-users mailing list submissions to
>     podofo-users@lists.sourceforge.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://lists.sourceforge.net/lists/listinfo/podofo-users
> or, via email, send a message with subject or body 'help' to
>     podofo-users-requ...@lists.sourceforge.net
>
> You can reach the person managing the list at
>     podofo-users-ow...@lists.sourceforge.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Podofo-users digest..."
>
>
> Today's Topics:
>
>    1. Re: How to reduce Pdf size (zyx)
>    2. PdfWriter as a base class (Ilan Zisser)
>    3. Re: How to reduce Pdf size (Leonard Rosenthol)
>    4. Re: How to reduce Pdf size (Leonard Rosenthol)
>    5. Splicing PDFs with AcroForms, NeedsAppearances, mysterious
>       file size shrinkage, Adobe Reader behavior (Dennis Jenkins)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 05 May 2014 09:07:19 +0200
> From: zyx <z...@litepdf.cz>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: podofo-users@lists.sourceforge.net
> Message-ID: <1399273639.1876.9.camel@zyxPad>
> Content-Type: text/plain; charset="UTF-8"
>
> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>> I am developing a simple software. As a first basis, I would like to
>> duplicate a pdf file that has been created by my printer (a scanned
>> document). It consists of a file of 40 pages. It weighs 10 MB and
>> each page of the pdf document is an image of a scanned document's
>> page.
>>
>> In my program, I am just copying each page from the original pdf (I
>> load it in a PdfMemDocument) and then insert it in another
>> PdfMemDocument with InsertPage.
>>
>> I just do a Write() at the end.
>>
>> The file created at the end weighs more than 500 MB!!
>
> Hi,
> check the documentation and comments around the functions you use for
> the page insertion. PoDoFo doesn't merge resources, thus whenever
> you add a single page to a new document it copies the whole document (or
> "only" all resources, I don't recall precisely) to the new file, thus
> nothing is missing when the inserted page is drawn.
>
> I suggest copying all pages into the destination document at once, converting
> each into an XObject, then deleting them all and reordering them as you wish,
> by drawing the XObject into the new page (you can even shrink it and so
> on). This way you won't duplicate the resources (by the way, inner
> images are also resources, which explains the size increase).
>     Bye,
>     zyx
>
> --
> http://www.litePDF.cz i...@litepdf.cz
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 05 May 2014 10:20:39 +0300
> From: Ilan Zisser <ilanzis...@gmail.com>
> Subject: [Podofo-users] PdfWriter as a base class
> To: podofo-users@lists.sourceforge.net
> Message-ID: <53673bc7.4070...@gmail.com>
> Content-Type: text/plain; charset="windows-1255"
>
> An HTML attachment was scrubbed...
> -------------- next part --------------
> Index: PdfWriter.h
> ===================================================================
> --- PdfWriter.h (revision 1598)
> +++ PdfWriter.h (working copy)
> @@ -100,7 +100,7 @@
>       *
>       *  \param pDevice write to the specified device
>       */
> -    void Write( PdfOutputDevice* pDevice );
> +    virtual void Write( PdfOutputDevice* pDevice );
>
>      /** Set the write mode to use when writing the PDF.
>       *  \param eWriteMode write mode
> @@ -192,7 +192,7 @@
>       *  \param bPrevEntry if true a prev entry is added to the trailer object with a value of 0
>       *  \param bOnlySizeKey write only the size key
>       */
> -    void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
> +    virtual void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
>
>   protected:
>      /**
> @@ -202,15 +202,16 @@
>
>      /** Writes the pdf header to the current file.
>       *  \param pDevice write to this output device
> -     */
> -    void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
> +     */
>
> +    virtual void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
> +
>      /** Write pdf objects to file
>       *  \param pDevice write to this output device
>       *  \param vecObjects write all objects in this vector to the file
>       *  \param pXref add all written objects to this XRefTable
> -     */
> -    void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
> +     */
> +    virtual void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
>
>      /** Creates a file identifier which is required in several
>       *  PDF workflows.
>
> ------------------------------
>
> Message: 3
> Date: Mon, 5 May 2014 11:32:34 +0000
> From: Leonard Rosenthol <lrose...@adobe.com>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: Dennis Jenkins <dennis.jenkins...@gmail.com>,
>     "podofo-users@lists.sourceforge.net"
>     <podofo-users@lists.sourceforge.net>
> Message-ID: <cf8cee9e.5aae6%lrose...@adobe.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> That message from Reader means that the file is damaged in some way so that
> Reader had to repair it when it opened it. Something you are doing in the
> editing/modification process is creating an invalid PDF. And yes, in that
> case, it does a (full) save.
>
> Leonard
>
> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
> Date: Monday, May 5, 2014 at 12:03 AM
> To: "podofo-users@lists.sourceforge.net" <podofo-users@lists.sourceforge.net>
> Subject: Re: [Podofo-users] How to reduce Pdf size
>
>
> On Sun, May 4, 2014 at 5:06 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
> Adobe Reader doesn't re-save PDFs - so perhaps you mean Adobe Acrobat??
>
> Leonard
>
>
> Hello Leonard,
>
> I do mean "Adobe Reader XI" on 32-bit Windows XP. I'm not editing the PDF,
> exactly. Let me provide an example.
>
> The IRS provides a four-page PDF for the "941" report, and a second report
> (addenda) called the "Schedule B". Only two pages of the 941 have actual
> "pdf form" data.
>
> My program will create a new (empty) PDF, open the 941, splice in two of
> the four pages, splice in the Schedule B (if needed), and then fill in the
> form fields with the proper data. I must also embed another font and create
> an appearance stream (you helped me with this logic a few years ago). The
> software will then save the PDF.
>
> If I open this PDF in Adobe Reader, it looks correct (form fields are
> filled in). However, if I attempt to exit/close Adobe Reader, it prompts me
> "Do you want to save changes to XXX.pdf before closing?" (even if I changed
> nothing while Adobe Reader was open). If I decline, then Adobe Reader exits
> and nothing special happens. If I elect to "save my changes", then the
> resulting PDF on disk is smaller than the original, a new top-level section
> called "/Metadata" is created, and the "/AcroForm" is altered. I have yet to
> determine what gets removed from the PDF that makes it smaller, but I suspect
> that it is the font that I had to add earlier.
> If I don't add that font,
> then the fields that I filled in are not visible in Adobe Reader unless the
> individual field is selected by the user (input focus).
>
> I can repeat the above with the other forms that my software will populate
> (Arizona A1-QRT and Arizona UC-018).
>
> (Federal 941 report, file size difference is not much)
> $ ls -l ./tmp/report*.pdf
> -rw-r--r-- 1 djenkins djenkins 654315 May 4 22:59 ./tmp/report.pdf
> -rw-r--r-- 1 djenkins djenkins 606551 May 4 23:00 ./tmp/report2.pdf
>
> (AZ UC-018 report, size difference is significant)
> $ ls -l ./tmp/report*.pdf
> -rw-r--r-- 1 djenkins djenkins 415754 May 4 23:01 ./tmp/report.pdf
> -rw-r--r-- 1 djenkins djenkins 206989 May 4 23:01 ./tmp/report2.pdf
>
> ------------------------------
>
> Message: 4
> Date: Mon, 5 May 2014 11:34:05 +0000
> From: Leonard Rosenthol <lrose...@adobe.com>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: zyx <z...@litepdf.cz>, "podofo-users@lists.sourceforge.net"
>     <podofo-users@lists.sourceforge.net>
> Message-ID: <cf8cef0d.5aaec%lrose...@adobe.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> It's been a long time since I looked at the page copying code in PoDoFo,
> but it should only be copying those resources referenced by the page in
> question. Of course, if those resources are shared across pages - and you
> copy multiple pages - you get multiple copies (since they are no longer
> shared when copied page by page).
>
> Even better than suggested below is to start with the larger document, add
> your smaller document to it, and then delete.
>
> Leonard
>
> On 5/5/14, 3:07 AM, "zyx" <z...@litepdf.cz> wrote:
>
>> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>>> I am developing a simple software. As a first basis, I would like to
>>> duplicate a pdf file that has been created by my printer (a scanned
>>> document). It consists of a file of 40 pages.
>>> It weighs 10 MB and
>>> each page of the pdf document is an image of a scanned document's
>>> page.
>>>
>>> In my program, I am just copying each page from the original pdf (I
>>> load it in a PdfMemDocument) and then insert it in another
>>> PdfMemDocument with InsertPage.
>>>
>>> I just do a Write() at the end.
>>>
>>> The file created at the end weighs more than 500 MB!!
>>
>> Hi,
>> check the documentation and comments around the functions you use for
>> the page insertion. PoDoFo doesn't merge resources, thus whenever
>> you add a single page to a new document it copies the whole document (or
>> "only" all resources, I don't recall precisely) to the new file, thus
>> nothing is missing when the inserted page is drawn.
>>
>> I suggest copying all pages into the destination document at once, converting
>> each into an XObject, then deleting them all and reordering them as you wish,
>> by drawing the XObject into the new page (you can even shrink it and so
>> on). This way you won't duplicate the resources (by the way, inner
>> images are also resources, which explains the size increase).
>>     Bye,
>>     zyx
>>
>> --
>> http://www.litePDF.cz i...@litepdf.cz
>>
>>
>> ------------------------------------------------------------------------------
>> Is your legacy SCM system holding you back?
>> Join Perforce May 7 to find out:
>> • 3 signs your SCM is hindering your productivity
>> • Requirements for releasing software faster
>> • Expert tips and advice for migrating your SCM now
>> http://p.sf.net/sfu/perforce
>> _______________________________________________
>> Podofo-users mailing list
>> Podofo-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/podofo-users
>
>
> ------------------------------
>
> Message: 5
> Date: Wed, 7 May 2014 00:59:13 -0500
> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
> Subject: [Podofo-users] Splicing PDFs with AcroForms,
>     NeedsAppearances, mysterious file size shrinkage, Adobe Reader
>     behavior
> To: "podofo-users@lists.sourceforge.net"
>     <podofo-users@lists.sourceforge.net>
> Message-ID:
>     <CAAEzAp9Rfd1=zQeaja7m8VNz68++cpwWicokHVwhRL=sclp...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello all (but mostly directed to Leonard),
>
> A few days ago I described [1] some odd behavior that I am having with
> Adobe Reader consuming PDFs generated by my project. To avoid hijacking
> Christophe's original thread, I am starting a new one.
>
> At a high level, my goal is to use PoDoFo to splice together pages from
> various PDFs which are US tax forms, fill in the data, save the resulting
> PDF and have the filled-in form fields "just work" in Adobe Reader (e.g., be
> visible and still editable) and have Adobe Reader NOT prompt the user to
> save the file when the user attempts to exit. Secondly, I noticed that if
> I allow Adobe Reader to save the PDF, it shrinks in half (sometimes). I
> want to know why, so that I can optimize the size of my PDFs without
> needing Adobe Reader (my code runs on Linux as part of a web service).
>
> Leonard suggested that my PDF is malformed and that Adobe Reader is
> offering to repair/save it in this case.
> After much experimentation and
> staring at "podofobrowser" and "podofopdfinfo diffs" of the pre- and post-
> PDFs, I am not 100% convinced that this is the case.
>
> In my code, I must set the "NeedAppearances" dictionary element of the
> "/AcroForm" to "true", or my fields will not be visible in Adobe Reader. I
> then need to populate the appearance stream, per section 12.7.3.3 of ISO
> 32000-1:2008 (herein referred to as "the spec"). When Adobe Reader saves my
> PDF, this dictionary key disappears, and every field element gains a key
> called "AP", with a child key of "N". This is discussed in 12.7.3.3 of the
> spec on page #435, first complete paragraph.
>
> If I omit adding the key for "NeedAppearances" to the AcroForm, Adobe
> Reader will no longer offer to save my PDF, but my field values are no
> longer visible. Therefore, I suspect that Adobe wants to save the PDF in
> order to apply/generate the per-field appearance stream.
>
> QUESTION 1: Is the above hypothesis valid?
>
> I generate my PDFs by creating an empty PDF in memory, and "inserting"
> pages from other PDFs. This results in a PDF with no "Fields" in the
> "/AcroForm/Fields" array. Adobe Reader populates the "Fields" array when
> it saves the PDF. However, the count of elements in the "Fields" array
> does not match the actual count of fields. For example, Adobe Reader
> places 176 elements into this array, but when I enumerate all fields on all
> pages using the PoDoFo API (with my patch to handle inherited fields), I
> count 212. I have not completed an exhaustive comparison of the "Fields"
> arrays yet to determine if the discrepancy is due to the inherited form
> fields (typically check boxes) or not. I wrote a routine to populate the
> "Fields" array myself (with all 212 items), but Adobe Reader rebuilds it
> with only 176 items. If I do not set the "NeedAppearances" flag, Adobe
> Reader never offers to save the PDF on exit, so this array is not rebuilt
> in this case.
>
> QUESTION 2: How does Adobe Reader determine which fields need to be in the
> "/AcroForm/Fields" array?
>
> Adobe Reader seems to not care that the "/AcroForm" is missing (its
> presence or absence does not affect when Adobe Reader offers to save the
> form). Yet section 12.7.2 of the spec states that the "/AcroForm" is
> required.
>
> QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's
> behavior? Which is "correct" (or did I misunderstand the ISO)?
>
> The content of the "Fields -> element -> AP -> N" key is an
> "/XObject". The data stream created by Adobe Reader for it looks
> complicated.
>
> QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
> suggestions on how I can compute the required XObject in code? I just want
> to check a checkbox or place simple text into a text field.
>
> When Adobe Reader does save the PDF, and depending on which source
> form(s) are in it, the resulting PDF might shrink in size considerably. A
> cursory look with podofobrowser shows that Adobe Reader has heavily
> modified "Pages -> Kids[page] -> Contents[]". In my current testing PDF,
> the original has one element in page #0 Contents, with a compressed length
> of 20443. Adobe Reader's version has 8 array elements, each with
> approximately 2K of compressed XObject data.
>
> QUESTION 5: Why does Adobe Reader tinker with this part of a PDF when
> saving it? Ok, that was rhetorical - I assume that it does so so that the
> file will be smaller, and it also sets the "linearized" flag. The question
> should be stated: What rules does Adobe Reader follow when deciding if/how
> to refactor the actual page layout?
>
> QUESTION 6: Why does refactoring the XObject components make the file so
> much smaller (200K vs 450K for example)?
>
> In some cases, the file size savings are significant. If I knew what
> rules Adobe Reader followed, I might attempt to write a routine to apply
> the same changes using PoDoFo (and share it with the community).
>
> Thank you for your time.
>
> [1] http://sourceforge.net/p/podofo/mailman/message/32302847/
>
> ------------------------------
>
> End of Podofo-users Digest, Vol 95, Issue 3
> *******************************************
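
PS: As a rough check on the explanation zyx gave in the quoted Message 1, here is a minimal sketch of the arithmetic. It is a toy model in C++, not PoDoFo code: the struct and function names are made up for illustration. The assumption that each single-page insert copies all resources of the source document comes from zyx's recollection, and Leonard's reply suggests only the resources referenced by the page are copied, so this may overestimate. It only shows why the output could grow by a factor close to the page count.

```cpp
#include <cstdint>

// Toy size model; NOT PoDoFo code. Every name below is hypothetical.
struct ScannedDoc {
    std::uint64_t pages;          // number of pages in the source PDF
    std::uint64_t bytesPerImage;  // one scanned image per page
};

// Total size of all image resources in the source document.
constexpr std::uint64_t allResources(const ScannedDoc& d) {
    return d.pages * d.bytesPerImage;
}

// Assumption (zyx's recollection): each single-page insert copies every
// resource of the source document, so the output grows quadratically
// with the page count.
constexpr std::uint64_t sizeInsertedPageByPage(const ScannedDoc& d) {
    return d.pages * allResources(d);
}

// Copying the whole document in one operation stores each image once.
constexpr std::uint64_t sizeCopiedAtOnce(const ScannedDoc& d) {
    return allResources(d);
}

// 40 pages x 250,000 bytes = 10 MB, roughly the scanned file in question.
constexpr ScannedDoc kScan{40, 250000};

// Checked at compile time.
static_assert(sizeCopiedAtOnce(kScan) == 10000000ULL, "10 MB when copied once");
static_assert(sizeInsertedPageByPage(kScan) == 400000000ULL, "400 MB page by page");
```

With 40 pages this model gives 400 MB instead of 10 MB. That is the same order of magnitude as the file I obtained, though not the exact figure, so the real behavior is probably not quite this simple.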