Re: [Podofo-users] Splicing PDFs with AcroForms, NeedsAppearances, mysterious file size shrinkage, Adobe Reader behavior

Leonard Rosenthol Wed, 07 May 2014 05:43:55 -0700

[For completeness, all references to Reader behavior also apply to Acrobat]


You didn’t mention the presence of NeedAppearances during the original thread, 
or I would have pointed that out at the time.  So YES, because that flag is 
present, Reader has to create all the appearances (as you’ve asked it to), and 
then the file requires the Save.  If you don’t wish that to happen, then you 
will need to generate all the appearances yourself (via PoDoFo) so that Reader 
doesn’t have to do so.  And yes, appearances can be complicated: an appearance 
is all the drawing instructions necessary to render the text/paths/images that 
make a field look like a field PLUS the data/value associated with it.
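
As a rough illustration of what such drawing instructions look like for a 
simple single-line text field, here is a minimal Python sketch that builds the 
content of a normal appearance stream, following the general shape of the 
variable-text example in section 12.7.3.3 of the spec.  The function names, the 
font resource name Helv, the 2-unit left padding and the crude baseline 
estimate are all assumptions made for illustration.  A real implementation 
would take the font, size and color from the field's DA string, and the form 
XObject would also need BBox and Resources entries.  This is not PoDoFo code.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def text_field_appearance(value: str, height: float,
                          font_name: str = "Helv",
                          font_size: float = 12.0) -> str:
    """Return content-stream text for a single-line text field appearance.

    Assumption: `font_name` is a key in the appearance's /Resources /Font
    dictionary and `value` is encodable in that font's encoding.
    """
    # Crude vertical placement; real viewers use the font's metrics.
    baseline = max((height - font_size) / 2.0 + font_size * 0.2, 0.0)
    lines = [
        "/Tx BMC",                            # begin marked content for variable text
        "q",                                  # save graphics state
        "BT",                                 # begin text object
        f"/{font_name} {font_size:g} Tf",     # select font and size
        "0 g",                                # black fill color
        f"2 {baseline:g} Td",                 # move to the text origin
        f"({escape_pdf_string(value)}) Tj",   # show the field value
        "ET",                                 # end text object
        "Q",                                  # restore graphics state
        "EMC",                                # end marked content
    ]
    return "\n".join(lines) + "\n"


stream = text_field_appearance("Smith (Jr.)", height=20.0)
```

The resulting string would become the data of the form XObject referenced from 
the widget's AP -> N entry.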

If the Fields array isn’t getting updated as part of the page insertion, that 
is a bug/limitation of PoDoFo’s page insertion code.  You will need to 
update/fix that in order for proper form field copying to work.   Rebuilding it 
after the fact is NOT the correct way to do it (for a variety of reasons) - you 
need to take the data directly from the source PDFs at merge/insert time.
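
To make "take the data from the source at insert time" a little more concrete, 
here is a small Python sketch over a toy model (plain dictionaries, not PoDoFo 
objects).  It selects the entries of a source document's Fields array that have 
at least one widget on the pages being copied; those are the entries that would 
be appended to the destination's Fields array.  The "Page" key and both 
function names are hypothetical, chosen only for this illustration.

```python
def touches_pages(field: dict, copied_pages: set) -> bool:
    """True if the field, or any descendant, is a widget on a copied page."""
    if field.get("Page") in copied_pages:
        return True
    return any(touches_pages(kid, copied_pages)
               for kid in field.get("Kids", []))


def fields_to_carry_over(source_fields: list, copied_pages: set) -> list:
    """Root fields of the source form that belong with the copied pages."""
    return [f for f in source_fields if touches_pages(f, copied_pages)]


# Toy source form: two plain fields and one parent with two child widgets.
source_fields = [
    {"T": "name", "Page": 0},
    {"T": "filing_status", "Kids": [{"Page": 1}, {"Page": 2}]},
    {"T": "other", "Page": 5},
]
carried = fields_to_carry_over(source_fields, copied_pages={0, 2})
```

Note that the parent field is carried over as a single entry even though it has 
several widgets, because the Fields array holds only root fields.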

If there are Annotations on a page that are of type Widget (aka a form field) 
but the field is not present in the AcroForm dictionary, then Reader will add 
it to its own list, since it has determined that the PDF is broken/incorrect 
BUT it figures that the user wants to do something useful with it.   Add to 
that the NeedAppearances flag and we also have to build all of those.  These 
combine to force Reader to do LOTS of (unnecessary) work.
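
A quick way to check whether a generated file is in this state is to look for 
widgets whose root field is not listed in the AcroForm.  The Python sketch 
below does that over a toy model in which each field is a dictionary with an 
optional "Parent" reference; the function names are hypothetical and this is a 
diagnostic only, not a description of what Reader does internally.

```python
def root_of(field: dict) -> dict:
    """Follow Parent links up to the field with no ancestor."""
    while field.get("Parent") is not None:
        field = field["Parent"]
    return field


def orphan_widgets(page_widgets: list, acroform_fields: list) -> list:
    """Widgets whose root field is missing from the AcroForm Fields array."""
    listed = {id(f) for f in acroform_fields}   # compare by object identity
    return [w for w in page_widgets if id(root_of(w)) not in listed]


# Toy data: one widget hangs off a listed parent, one is not listed at all.
group = {"T": "filing_status"}
listed_widget = {"T": "single", "Parent": group}
stray_widget = {"T": "signature_date"}
orphans = orphan_widgets([listed_widget, stray_widget], [group])
```

An empty result for every page would mean Reader has nothing to add to its own 
list on that account.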

When Reader has to do a “Full Save”, it performs a LOT of operations to create 
a clean, healthy, optimized PDF.  You can find the list of SOME of the 
various things it does in the Acrobat SDK documentation concerning the various 
flags that can be passed to the PDDocSave API call.

Leonard

From: Dennis Jenkins <dennis.jenkins...@gmail.com>
Date: Wednesday, May 7, 2014 at 1:59 AM
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Splicing PDFs with AcroForms, NeedsAppearances, 
mysterious file size shrinkage, Adobe Reader behavior

Hello all (but mostly directed to Leonard),

   A few days ago I described [1] some odd behavior that I am having with Adobe 
Reader consuming PDFs generated by my project.  To avoid hijacking Christophe's 
original thread, I am starting a new one.

   At a high level, my goal is to use PoDoFo to splice together pages from 
various PDFs which are US tax forms, fill in the data, save the resulting PDF 
and have the filled-in form fields "just work" in Adobe Reader (e.g., be 
visible and still editable) and have Adobe Reader NOT prompt the user to save 
the file when the user attempts to exit.  Secondly, I noticed that if I allow 
Adobe Reader to save the PDF, it sometimes shrinks to about half its size.  I 
want to know why, so that I can optimize the size of my PDFs without needing 
Adobe Reader (my code runs on Linux as part of a web service).

   Leonard suggested that my PDF is malformed and that Adobe Reader is offering 
to repair/save it in this case.  After much experimentation and staring at 
"podofobrowser" and "podofopdfinfo diffs" of the pre- and post- PDFs, I am not 
100% convinced that this is the case.

  In my code, I must set the "NeedAppearances" dictionary element of the 
"/AcroForm" to "true", or my fields will not be visible in Adobe Reader.  I 
then need to populate the appearance stream, per section 12.7.3.3 of ISO 
32000-1:2008 (herein referred to as "the spec").  When Adobe Reader saves my 
PDF, this dictionary key disappears, and every field element gains a key called 
"AP", with a child key of "N".  This is discussed in 12.7.3.3 of the spec on 
page #435, first complete paragraph.

  If I omit the "NeedAppearances" key from the AcroForm, Adobe Reader will no 
longer offer to save my PDF, but my field values are no longer visible.  
Therefore, I suspect that Adobe Reader wants to save the PDF in order to 
apply/generate the per-field appearance stream.

QUESTION 1: Is the above hypothesis valid?

  I generate my PDFs by creating an empty PDF in memory, and "inserting" pages 
from other PDFs.  This results in a PDF with no "Fields" in the 
"/AcroForm/Fields" array.  Adobe Reader populates the "Fields" array when it 
saves the PDF.  However, the count of elements in the "Fields" array does not 
match the actual count of fields.  For example, Adobe Reader places 176 
elements into this array, but when I enumerate all fields on all pages using 
the PoDoFo API (with my patch to handle inherited fields), I count 212.  I have 
not completed an exhaustive comparison of the "Fields" arrays yet to determine 
if the discrepancy is due to the inherited form fields (typically check boxes) 
or not.  I wrote a routine to populate the "Fields" array myself (with all 212 
items), but Adobe Reader rebuilds it with only 176 items.  If I do not set the 
"NeedAppearances" flag, Adobe Reader never offers to save the PDF on exit, so 
this array is not rebuilt in this case.

QUESTION 2: How does Adobe Reader determine which fields need to be in the 
"/AcroForm/Fields" array?

    Adobe Reader seems to not care that the "/AcroForm" is missing (its 
presence or absence does not affect when Adobe Reader offers to save the form). 
 Yet section 12.7.2 of the spec states that the "/AcroForm" is required.

QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's behavior?  
Which is "correct" (or did I misunderstand the ISO)?

    The content of the "Fields -> element -> AP -> N" key is an "/XObject".  
The data stream created by Adobe Reader for it looks complicated.

QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any 
suggestions on how I can compute the required XObject in code?  I just want to 
check a checkbox or place simple text into a text field.
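
[Editorial sketch for the checkbox half of this question.]  Per the spec's 
description of check boxes, a checkbox widget normally already carries one 
appearance stream per state in its AP -> N dictionary (an "Off" state and one 
"on" state, often but not always named "Yes"), so checking it is usually a 
matter of setting V and AS to the right state name rather than drawing 
anything.  The Python sketch below shows that selection over a plain list of 
state names; the function name is hypothetical and it assumes the source form 
really does provide both state appearances.

```python
def checkbox_settings(normal_states: list, checked: bool) -> dict:
    """Pick the values for the field's V and the widget's AS entries.

    `normal_states` is the list of key names found in the widget's AP -> N
    dictionary (assumption: the source form supplies them).
    """
    on_states = [s for s in normal_states if s != "Off"]
    if not on_states:
        raise ValueError("widget has no 'on' appearance state")
    state = on_states[0] if checked else "Off"
    # V holds the field value; AS selects which appearance stream is shown.
    return {"V": state, "AS": state}


settings = checkbox_settings(["Off", "Yes"], checked=True)
```

Reading the on-state name from the widget, instead of hard-coding "Yes", 
matters because forms are free to choose that name.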

    When Adobe Reader does save the PDF, and depending on which source form(s) 
are in it, the resulting PDF might shrink in size considerably.  A cursory look 
with podofobrowser shows that Adobe Reader has heavily modified "Pages -> 
Kids[page] -> Contents[]".  In my current testing PDF, the original has one 
element in page #0 Contents, with a compressed length of 20443.  Adobe Reader's 
version has 8 array elements, each with approximately 2K of compressed XObject 
data.

QUESTION 5:  Why does Adobe Reader tinker with this part of a PDF when saving 
it?  Ok, that was rhetorical - I assume that it does so so that the file will 
be smaller, and it also sets the "linearized" flag.  The question should be 
stated: What rules does Adobe Reader follow when deciding if/how to refactor 
the actual page layout?

QUESTION 6: Why does refactoring the XObject components make the file so much 
smaller (200K vs 450K for example)?

   In some cases, the file size savings are significant.  If I knew what rules 
Adobe Reader followed, I might attempt to write a routine to apply the same 
changes using PoDoFo (and share it with the community).

   Thank you for your time.

[1] http://sourceforge.net/p/podofo/mailman/message/32302847/


