Hi!
 
Try opening the PDF with Ghostscript. It also repairs and displays slightly
broken PDFs, but unlike Acrobat it does not do so silently; it prints a
message stating the cause.
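
A minimal sketch of running that check from a script, assuming the "gs"
executable is on the PATH of a Linux machine. The helper names are made up
for illustration. The "nullpage" device interprets every page and discards
the result, so only Ghostscript's warnings and errors are left to read.

```python
import subprocess

def ghostscript_check_command(pdf_path, gs_executable="gs"):
    """Build a Ghostscript command that interprets a PDF and discards the output."""
    return [
        gs_executable,
        "-o", "/dev/null",     # -o implies -dBATCH -dNOPAUSE; nothing is kept
        "-sDEVICE=nullpage",   # interpret every page but render nothing
        pdf_path,
    ]

def check_pdf(pdf_path):
    """Run Ghostscript and return (exit code, combined stdout/stderr text)."""
    result = subprocess.run(
        ghostscript_check_command(pdf_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout
```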
 
Uli

  _____  

From: Dennis Jenkins [mailto:dennis.jenkins...@gmail.com] 
Sent: Wednesday, May 7, 2014 07:59
To: podofo-users@lists.sourceforge.net
Subject: [Podofo-users] Splicing PDFs with AcroForms,
NeedsAppearances, mysterious file size shrinkage, Adobe Reader behavior


Hello all (but mostly directed to Leonard),


   A few days ago I described [1] some odd behavior that I am having with
Adobe Reader consuming PDFs generated by my project.  To avoid hijacking
Christophe's original thread, I am starting a new one.


   At a high level, my goal is to use PoDoFo to splice together pages from
various PDFs which are US tax forms, fill in the data, save the resulting
PDF and have the filled-in form fields "just work" in Adobe Reader (e.g., be
visible and still editable) and have Adobe Reader NOT prompt the user to
save the file when the user attempts to exit.  Secondly, I noticed that if I
allow Adobe Reader to save the PDF, the file sometimes shrinks to about half
its size.  I want to know why, so that I can optimize the size of my PDFs
without needing Adobe Reader (my code runs on Linux as part of a web service).


   Leonard suggested that my PDF is malformed and that Adobe Reader is
offering to repair/save it in this case.  After much experimentation and
staring at "podofobrowser" views and diffs of the "podofopdfinfo" output for
the pre- and post-save PDFs, I am not 100% convinced that this is the case.


  In my code, I must set the "NeedAppearances" dictionary element of the
"/AcroForm" to "true", or my fields will not be visible in Adobe Reader.  I
then need to populate the appearance stream, per section 12.7.3.3 of ISO
32000-1:2008 (herein referred to as "the spec").  When Adobe Reader saves my
PDF, this dictionary key disappears, and every field element gains a key
called "AP", with a child key of "N".  This is discussed in 12.7.3.3 of the
spec on page 435, first complete paragraph.


  If I do not add the "NeedAppearances" key to the AcroForm, Adobe
Reader will no longer offer to save my PDF, but my field values are no
longer visible.  Therefore, I suspect that Adobe wants to save the PDF in
order to apply/generate the per-field appearance stream.


QUESTION 1: Is the above hypothesis valid?


  I generate my PDFs by creating an empty PDF in memory, and "inserting"
pages from other PDFs.  This results in a PDF with no "Fields" in the
"/AcroForm/Fields" array.  Adobe Reader populates the "Fields" array when it
saves the PDF.  However, the count of elements in the "Fields" array does
not match the actual count of fields.  For example, Adobe Reader places 176
elements into this array, but when I enumerate all fields on all pages using
the PoDoFo API (with my patch to handle inherited fields), I count 212.  I
have not completed an exhaustive comparison of the "Fields" arrays yet to
determine if the discrepancy is due to the inherited form fields (typically
check boxes) or not.  I wrote a routine to populate the "Fields" array
myself (with all 212 items), but Adobe Reader rebuilds it with only 176 items.
If I do not set the "NeedAppearances" flag, Adobe Reader never offers to
save the PDF on exit, so this array is not rebuilt in this case.
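
One possible explanation for the 176 vs. 212 difference, offered as a guess
and not as a finding: the spec describes "Fields" as the array of root
fields, meaning fields without a "Parent". Children in a field hierarchy are
reached through "Kids" and are not listed in the array directly. The toy
model below uses plain Python dictionaries instead of PoDoFo objects, and
all field names are invented.

```python
def root_fields(fields):
    """Return the names of fields without a parent (candidates for /AcroForm/Fields)."""
    return [name for name, field in fields.items() if field.get("Parent") is None]

def terminal_fields(fields):
    """Return the names of fields that are not the parent of any other field."""
    parents = {f["Parent"] for f in fields.values() if f.get("Parent") is not None}
    return [name for name in fields if name not in parents]

# Hypothetical form: one radio group with three buttons and two text fields.
fields = {
    "filing_status": {"Parent": None},
    "filing_status.single": {"Parent": "filing_status"},
    "filing_status.joint": {"Parent": "filing_status"},
    "filing_status.separate": {"Parent": "filing_status"},
    "first_name": {"Parent": None},
    "last_name": {"Parent": None},
}

print(len(root_fields(fields)))      # entries a root-only "Fields" array would hold
print(len(terminal_fields(fields)))  # fields seen when enumerating page by page
```

In this model the two counts differ (3 roots, 5 terminal fields), which is
the same kind of gap as the one described above. Whether it accounts for the
actual numbers would still need the exhaustive comparison.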


QUESTION 2: How does Adobe Reader determine which fields need to be in the
"/AcroForm/Fields" array?


    Adobe Reader does not seem to care whether the "/AcroForm" is present
(its presence or absence does not affect when Adobe Reader offers to save
the form).  Yet section 12.7.2 of the spec states that the "/AcroForm" is
required.


QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's behavior?
Which is "correct" (or did I misunderstand the ISO)?


    The content of the "Fields -> element -> AP -> N" key is an "/XObject".
The data stream created by Adobe Reader for it looks complicated.


QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
suggestions on how I can compute the required XObject in code?  I just want
to check a checkbox or place simple text into a text field.
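
For the simple text case, the variable-text example in section 12.7.3.3 of
the spec suggests that a small content stream may be enough. The following
is a rough sketch under several assumptions: the function names are
invented, the font resource name "Helv" and the text offsets are
placeholders, the value fits in Latin-1, and the resulting bytes still have
to be wrapped in a form XObject with a "BBox" and a "Resources" dictionary
that defines the font.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def text_field_appearance(value, font_name="Helv", font_size=10, x=2, y=4):
    """Build the content of a simple single-line AP/N stream for a text field."""
    lines = [
        "/Tx BMC",                       # marked-content wrapper used for variable text
        "q",
        "BT",
        "0 g",                           # black fill colour
        f"/{font_name} {font_size} Tf",  # font must exist in the XObject's Resources
        f"{x} {y} Td",                   # offset from the lower-left corner of BBox
        f"({escape_pdf_string(value)}) Tj",
        "ET",
        "Q",
        "EMC",
    ]
    return "\n".join(lines).encode("latin-1")

print(text_field_appearance("Smith (Jr.)").decode("latin-1"))
```

For check boxes, the source forms usually already carry on and off
appearance states under "AP"/"N", so it may be enough to set "V" and "AS" to
the name of the on state instead of building a new stream.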


    When Adobe Reader does save the PDF, and depending on which source
form(s) are in it, the resulting PDF might shrink in size considerably.  A
cursory look with podofobrowser shows that Adobe Reader has heavily modified
"Pages -> Kids[page] -> Contents[]".  In my current testing PDF, the
original has one element in page #0 Contents, with a compressed length of
20443.  Adobe Reader's version has 8 array elements, each with approximately
2K of compressed XObject data.


QUESTION 5:  Why does Adobe Reader tinker with this part of a PDF when
saving it?  OK, that was rhetorical - I assume that it does so to make the
file smaller, and it also sets the "linearized" flag.  The real question
is: What rules does Adobe Reader follow when deciding if/how to refactor
the actual page layout?


QUESTION 6: Why does refactoring the XObject components make the file so
much smaller (200K vs. 450K, for example)?


   In some cases, the file size savings are significant.  If I knew what
rules Adobe Reader followed, I might attempt to write a routine to apply the
same changes using PoDoFo (and share it with the community).
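
One factor that might explain part of the shrinkage, stated as an assumption
and not verified against these files: since PDF 1.5 a writer may pack
indirect objects such as field and annotation dictionaries into compressed
object streams, whereas a classic file stores them as uncompressed text. A
form with hundreds of similar dictionaries compresses well. The sketch below
only illustrates that effect with zlib on made-up dictionaries; it does not
produce a valid PDF or a real object stream.

```python
import zlib

def make_field_object(number):
    """Return a made-up widget/field dictionary in classic, uncompressed PDF syntax."""
    return (
        f"{number} 0 obj\n"
        f"<< /Type /Annot /Subtype /Widget /FT /Tx /T (field_{number}) "
        f"/Rect [36 {700 - number} 200 {712 - number}] /F 4 "
        f"/DA (/Helv 10 Tf 0 g) /MK << >> >>\n"
        f"endobj\n"
    ).encode("ascii")

objects = [make_field_object(n) for n in range(1, 213)]  # 212 fields, as in the post

raw_size = sum(len(obj) for obj in objects)             # stored as plain text
packed_size = len(zlib.compress(b"".join(objects), 9))  # stored in one Flate stream

print(raw_size, packed_size)
```

If this is what happens, comparing the object counts and the presence of
"ObjStm" streams in the pre- and post-save files should confirm or rule it
out.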


   Thank you for your time.

[1] http://sourceforge.net/p/podofo/mailman/message/32302847/

