Thank you for your answers. 

@Dennis Jenkins: I use a Mac, but indeed, when I save the 1 GB PDF file
with the native application on the Mac, it shrinks back to 10 MB, its "normal"
size.
I will install podofobrowser and study it.

@zyx: I will study what you explained to me.

As soon as I find an answer to my problem, I will share it with you.

Christophe


On 7 May 2014 at 07:59, podofo-users-requ...@lists.sourceforge.net wrote:

> Send Podofo-users mailing list submissions to
>       podofo-users@lists.sourceforge.net
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>       https://lists.sourceforge.net/lists/listinfo/podofo-users
> or, via email, send a message with subject or body 'help' to
>       podofo-users-requ...@lists.sourceforge.net
> 
> You can reach the person managing the list at
>       podofo-users-ow...@lists.sourceforge.net
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Podofo-users digest..."
> 
> 
> Today's Topics:
> 
>  1. Re: How to reduce Pdf size (zyx)
>  2.  PdfWriter as a base class (Ilan Zisser)
>  3. Re: How to reduce Pdf size (Leonard Rosenthol)
>  4. Re: How to reduce Pdf size (Leonard Rosenthol)
>  5. Splicing PDFs with AcroForms, NeedsAppearances, mysterious
>     file size shrinkage, Adobe Reader behavior (Dennis Jenkins)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 05 May 2014 09:07:19 +0200
> From: zyx <z...@litepdf.cz>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: podofo-users@lists.sourceforge.net
> Message-ID: <1399273639.1876.9.camel@zyxPad>
> Content-Type: text/plain; charset="UTF-8"
> 
> On Sun, 2014-05-04 at 14:13 +0200, Christophe Meyer wrote:
>> I am developing a simple piece of software. As a first step, I would like to
>> duplicate a PDF file that was created by my printer (a scanned
>> document). It is a 40-page file. It weighs 10 MB and
>> each page of the PDF document is an image of a scanned document's
>> page. 
>> 
>> In my program, I am just copying each page from the original PDF (I
>> load it into a PdfMemDocument) and then inserting it into another
>> PdfMemDocument with InsertPage.
>> 
>> I just do a Write() at the end. 
>> 
>> The file created at the end weighs more than 500 MB!!
> 
>       Hi,
> check the documentation and comments around the functions you use for
> the page insertion. PoDoFo doesn't merge resources, so whenever
> you add a single page to a new document, it copies the whole document (or
> "only" all its resources, I don't recall precisely) into the new file, so
> that nothing is missing when the inserted page is drawn.
> 
> I suggest copying all pages into the destination document at once,
> converting each into an XObject, then deleting them all and reordering them
> as you wish by drawing each XObject onto a new page (you can even shrink it
> and so on). This way you won't duplicate the resources (by the way, the
> images inside the pages are also resources, which explains the size increase).
>       Bye,
>       zyx
> 
> -- 
> http://www.litePDF.cz                                 i...@litepdf.cz
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 05 May 2014 10:20:39 +0300
> From: Ilan Zisser <ilanzis...@gmail.com>
> Subject: [Podofo-users]  PdfWriter as a base class
> To: podofo-users@lists.sourceforge.net
> Message-ID: <53673bc7.4070...@gmail.com>
> Content-Type: text/plain; charset="windows-1255"
> 
> An HTML attachment was scrubbed...
> -------------- next part --------------
> Index: PdfWriter.h
> ===================================================================
> --- PdfWriter.h       (revision 1598)
> +++ PdfWriter.h       (working copy)
> @@ -100,7 +100,7 @@
>     *
>     *  \param pDevice write to the specified device 
>     */
> -    void Write( PdfOutputDevice* pDevice );
> +    virtual void Write( PdfOutputDevice* pDevice );
> 
>    /** Set the write mode to use when writing the PDF.
>     *  \param eWriteMode write mode
> @@ -192,7 +192,7 @@
>     *  \param bPrevEntry if true a prev entry is added to the trailer object with a value of 0
>     *  \param bOnlySizeKey write only the size key
>     */
> -    void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
> +    virtual void FillTrailerObject( PdfObject* pTrailer, pdf_long lSize, bool bPrevEntry, bool bOnlySizeKey ) const;
> 
> protected:
>    /**
> @@ -202,15 +202,16 @@
> 
>    /** Writes the pdf header to the current file.
>     *  \param pDevice write to this output device
> -     */       
> -    void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
> +     */
> 
> +    virtual void PODOFO_LOCAL WritePdfHeader( PdfOutputDevice* pDevice );
> +
>    /** Write pdf objects to file
>     *  \param pDevice write to this output device
>     *  \param vecObjects write all objects in this vector to the file
>     *  \param pXref add all written objects to this XRefTable
> -     */ 
> -    void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
> +     */
> +    virtual void WritePdfObjects( PdfOutputDevice* pDevice, const PdfVecObjects& vecObjects, PdfXRef* pXref ) PODOFO_LOCAL;
> 
>    /** Creates a file identifier which is required in several
>     *  PDF workflows. 
> 
> ------------------------------
> 
> Message: 3
> Date: Mon, 5 May 2014 11:32:34 +0000
> From: Leonard Rosenthol <lrose...@adobe.com>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: Dennis Jenkins <dennis.jenkins...@gmail.com>,
>       "podofo-users@lists.sourceforge.net"
>       <podofo-users@lists.sourceforge.net>
> Message-ID: <cf8cee9e.5aae6%lrose...@adobe.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> That message from Reader means that the file is damaged in some way so that 
> Reader had to repair it when it opened it.  Something you are doing in the 
> editing/modification process is creating an invalid PDF.   And yes, in that 
> case, it does a (full) save.
> 
> Leonard
> 
> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
> Date: Monday, May 5, 2014 at 12:03 AM
> To: "podofo-users@lists.sourceforge.net" <podofo-users@lists.sourceforge.net>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> 
> 
> On Sun, May 4, 2014 at 5:06 PM, Leonard Rosenthol <lrose...@adobe.com> wrote:
> Adobe Reader doesn't re-save PDFs - so perhaps you mean Adobe Acrobat??
> 
> Leonard
> 
> 
> Hello Leonard,
> 
>   I do mean "Adobe Reader XI" on 32-bit Windows XP.  I'm not editing the PDF, 
> exactly.  Let me provide an example.
> 
>   The IRS provides a four-page PDF for the "941" report, and a second report 
> (an addendum) called the "Schedule B".  Only two pages of the 941 have actual 
> "pdf form" data.
> 
>   My program will create a new (empty) PDF, open the 941, splice in two of 
> the four pages, splice in the Schedule B (if needed), and then fill in the 
> form fields with the proper data.  I must also embed another font and create 
> an appearance stream (you helped me with this logic a few years ago).  The 
> software will then save the PDF.
> 
>   If I open this PDF in Adobe Reader, it looks correct (form fields are 
> filled in).  However, if I attempt to exit/close Adobe Reader, it prompts me 
> "Do you want to save changes to XXX.pdf before closing?" (even if I changed 
> nothing while Adobe Reader was open).  If I decline, then Adobe Reader exits 
> and nothing special happens.  If I elect to "save my changes", then the 
> resulting PDF on disk is smaller than the original, a new top-level entry 
> called "/Metadata" is created, and the "/AcroForm" is altered.  I have yet to 
> determine what gets removed from the PDF that makes it smaller, but I suspect 
> that it is the font that I had to add earlier.  If I don't add that font, 
> then the fields that I filled in are not visible in Adobe Reader unless the 
> individual field is selected by the user (input focus).
> 
>   I can repeat the above with the other forms that my software will populate 
> (Arizona A1-QRT and Arizona UC-018).
> 
> (Federal 941 report; the file size difference is small)
> $ ls -l ./tmp/report*.pdf
> -rw-r--r-- 1 djenkins djenkins 654315 May  4 22:59 ./tmp/report.pdf
> -rw-r--r-- 1 djenkins djenkins 606551 May  4 23:00 ./tmp/report2.pdf
> 
> (AZ UC-018 report; the size difference is significant)
> $ ls -l ./tmp/report*.pdf
> -rw-r--r-- 1 djenkins djenkins 415754 May  4 23:01 ./tmp/report.pdf
> -rw-r--r-- 1 djenkins djenkins 206989 May  4 23:01 ./tmp/report2.pdf
> 
> -------------- next part --------------
> An HTML attachment was scrubbed...
> 
> ------------------------------
> 
> Message: 4
> Date: Mon, 5 May 2014 11:34:05 +0000
> From: Leonard Rosenthol <lrose...@adobe.com>
> Subject: Re: [Podofo-users] How to reduce Pdf size
> To: zyx <z...@litepdf.cz>, "podofo-users@lists.sourceforge.net"
>       <podofo-users@lists.sourceforge.net>
> Message-ID: <cf8cef0d.5aaec%lrose...@adobe.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> It's been a long time since I looked at the page copying code in PoDoFo,
> but it should only be copying those resources referenced by the page in
> question.  Of course, if those resources are shared across pages - and you
> copy multiple pages - you get multiple copies (since they are no longer
> shared when copied page by page).
> 
> Even better than what zyx suggested is to start with the larger document,
> add your smaller document to it, and then delete the pages you don't need.
> 
> Leonard
> 
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Wed, 7 May 2014 00:59:13 -0500
> From: Dennis Jenkins <dennis.jenkins...@gmail.com>
> Subject: [Podofo-users] Splicing PDFs with AcroForms,
>       NeedsAppearances, mysterious file size shrinkage, Adobe Reader
>       behavior
> To: "podofo-users@lists.sourceforge.net"
>       <podofo-users@lists.sourceforge.net>
> Message-ID:
>       <CAAEzAp9Rfd1=zQeaja7m8VNz68++cpwWicokHVwhRL=sclp...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hello all (but mostly directed to Leonard),
> 
>  A few days ago I described [1] some odd behavior that I am having with
> Adobe Reader consuming PDFs generated by my project.  To avoid hijacking
> Christophe's original thread, I am starting a new one.
> 
>  At a high-level, my goal is to use PoDoFo to splice together pages from
> various PDFs which are US tax forms, fill in the data, save the resulting
> PDF and have the filled-in form fields "just work" in Adobe Reader (e.g., be
> visible and still editable) and have Adobe Reader NOT prompt the user to
> save the file when the user attempts to exit.  Secondly, I noticed that if
> I allow Adobe Reader to save the PDF, it shrinks in half (sometimes).  I
> want to know why, so that I can optimize the size of my PDFs without
> needing Adobe Reader (my code runs on Linux as part of a web service).
> 
>  Leonard suggested that my PDF is malformed and that Adobe Reader is
> offering to repair/save it in this case.  After much experimentation and
> staring at "podofobrowser" and "podofopdfinfo diffs" of the pre- and post-
> PDFs, I am not 100% convinced that this is the case.
> 
> In my code, I must set the "NeedAppearances" entry of the "/AcroForm"
> dictionary to "true", or my fields will not be visible in Adobe Reader.  I
> then need to populate the appearance stream, per section 12.7.3.3 of ISO
> 32000-1:2008 (herein referred to as "the spec").  When Adobe Reader saves my
> PDF, this dictionary key disappears, and every field element gains a key
> called "AP", with a child key of "N".  This is discussed in 12.7.3.3 of the
> spec on page #435, first complete paragraph.
> 
> If I omit the "NeedAppearances" key from the AcroForm, Adobe
> Reader will no longer offer to save my PDF, but my field values are no
> longer visible.  Therefore, I suspect that Adobe wants to save the PDF in
> order to apply/generate the per-field appearance stream.
> 
> QUESTION 1: Is the above hypothesis valid?
> 
> I generate my PDFs by creating an empty PDF in memory, and "inserting"
> pages from other PDFs.  This results in a PDF with no "Fields" in the
> "/AcroForm/Fields" array.  Adobe Reader populates the "Fields" array when
> it saves the PDF.  However, the count of elements in the "Fields" array
> does not match the actual count of fields.  For example, Adobe Reader
> places 176 elements into this array, but when I enumerate all fields on all
> pages using the PoDoFo API (with my patch to handle inherited fields), I
> count 212.  I have not completed an exhaustive comparison of the "Fields"
> arrays yet to determine if the discrepancy is due to the inherited form
> fields (typically check boxes) or not.  I wrote a routine to populate the
> "Fields" array myself (with all 212 items), but Adobe Reader rebuilds it
> with only 176 items.  If I do not set the "NeedAppearances" flag, Adobe
> Reader never offers to save the PDF on exit, so this array is not rebuilt
> in this case.
> 
> QUESTION 2: How does Adobe Reader determine which fields need to be in the
> "/AcroForm/Fields" array?
> 
>   Adobe Reader does not seem to care that the "/AcroForm" is missing (its
> presence or absence does not affect when Adobe Reader offers to save the
> form).  Yet section 12.7.2 of the spec states that the "/AcroForm" is
> required.
> 
> QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's
> behavior?  Which is "correct" (or did I misunderstand the ISO standard)?
> 
>   The content of the "Fields -> element -> AP -> N" key is an
> "/XObject".  The data stream created by Adobe Reader for it looks
> complicated.
> 
> QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
> suggestions on how I can compute the required XObject in code?  I just want
> to check a checkbox or place simple text into a text field.
> 
>   When Adobe Reader does save the PDF, and depending on which source
> form(s) are in it, the resulting PDF might shrink in size considerably.  A
> cursory look with podofobrowser shows that Adobe Reader has heavily
> modified "Pages -> Kids[page] -> Contents[]".  In my current testing PDF,
> the original has one element in page #0 Contents, with a compressed length
> of 20443.  Adobe Reader's version has 8 array elements, each with
> approximately 2K of compressed XObject data.
> 
> QUESTION 5:  Why does Adobe Reader tinker with this part of a PDF when
> saving it?  OK, that was rhetorical - I assume that it does so so that the
> file will be smaller, and it also sets the "linearized" flag.  The question
> should be stated: What rules does Adobe Reader follow when deciding if/how
> to refactor the actual page layout?
> 
> QUESTION 6: Why does refactoring the XObject components make the file so
> much smaller (e.g., 200K vs. 450K)?
> 
>  In some cases, the file size savings are significant.  If I knew what
> rules Adobe Reader followed, I might attempt to write a routine to apply
> the same changes using PoDoFo (and share it with the community).
> 
>  Thank you for your time.
> 
> [1] http://sourceforge.net/p/podofo/mailman/message/32302847/
> -------------- next part --------------
> An HTML attachment was scrubbed...
> 
> ------------------------------
> 
> ------------------------------------------------------------------------------
> Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
> &#149; 3 signs your SCM is hindering your productivity
> &#149; Requirements for releasing software faster
> &#149; Expert tips and advice for migrating your SCM now
> http://p.sf.net/sfu/perforce
> 
> ------------------------------
> 
> _______________________________________________
> Podofo-users mailing list
> Podofo-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/podofo-users
> 
> 
> End of Podofo-users Digest, Vol 95, Issue 3
> *******************************************

