[For completeness, all references to Reader behavior also apply to Acrobat]
You didn’t mention the presence of NeedAppearances during the original thread,
or I would have pointed that out at the time. So YES: because that flag is
present, Reader has to create all the appearances (as you’ve asked it to), and
then the file requires the Save. If you don’t wish that to happen, you will
need to generate all the appearances yourself (via PoDoFo) so that Reader
doesn’t have to. And yes, appearances can be complicated: an appearance is all
the drawing instructions necessary to render the text/paths/images that make a
field look like a field PLUS the data/value associated with it.
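
To give a rough idea of what "drawing instructions" means for the simplest
case, here is a minimal sketch in Python that builds the content stream for a
single line of text in a text field. It is not PoDoFo code and not what Reader
generates; the helper names, the font resource name "Helv", and the size and
offsets are assumptions chosen for illustration. The resulting operators would
still have to be stored in a Form XObject (with a /BBox and a /Resources entry
for the font) and referenced from the widget's /AP /N entry.

```python
def escape_pdf_literal(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def text_field_appearance(value: str, font: str = "Helv", size: float = 12.0,
                          x: float = 2.0, y: float = 4.0) -> str:
    """Return content-stream operators that draw `value` as one line of text.

    Hypothetical helper: the font name, size and offsets are illustrative
    defaults, not values taken from any real form.
    """
    return "\n".join([
        "/Tx BMC",                            # begin text-field marked content
        "q",                                  # save graphics state
        "BT",                                 # begin text object
        f"/{font} {size:g} Tf",               # select font and size
        "0 g",                                # black fill colour
        f"{x:g} {y:g} Td",                    # move to the text origin
        f"({escape_pdf_literal(value)}) Tj",  # show the escaped string
        "ET",                                 # end text object
        "Q",                                  # restore graphics state
        "EMC",                                # end marked content
    ])
```

This only handles plain single-line text; alignment, clipping to the field
rectangle, multi-line fields, and non-Latin encodings are all left out.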
If the Fields array isn’t getting updated as part of the page insertion, that
is a bug/limitation of PoDoFo’s page insertion code. You will need to
update/fix that in order for proper form field copying to work. Rebuilding it
after the fact is NOT the correct way to do it (for a variety of reasons) - you
need to take the data directly from the source PDFs at merge/insert time.
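
As a sketch of what "at merge/insert time" could look like, the Python below
models PDF objects as plain dicts (an assumption made for illustration; it does
not use the PoDoFo API). For each Widget annotation on the pages being
inserted, it follows /Parent up to the root field and collects each root once.
The Fields array is defined to hold only root fields, those with no ancestors
in the field hierarchy, which is one reason the number of widgets and the
number of Fields entries can differ.

```python
def root_field(field: dict) -> dict:
    """Follow /Parent links up to the top of the field hierarchy."""
    seen = set()
    while "Parent" in field:
        if id(field) in seen:            # guard against a malformed cycle
            raise ValueError("cycle in /Parent chain")
        seen.add(id(field))
        field = field["Parent"]
    return field


def fields_for_pages(pages: list) -> list:
    """Collect each distinct root field referenced by widgets on `pages`."""
    roots, seen = [], set()
    for page in pages:
        for annot in page.get("Annots", []):
            if annot.get("Subtype") != "Widget":
                continue                 # links, comments, etc. are not fields
            root = root_field(annot)
            if id(root) not in seen:     # several widgets may share one root
                seen.add(id(root))
                roots.append(root)
    return roots
```

The list returned by the hypothetical fields_for_pages is what would be
appended to the destination document's /AcroForm /Fields array as the pages
are inserted.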
If there are Annotations on a page that are of type Widget (aka a form field)
but the field is not present in the AcroForm dictionary, then Reader will add
it to its own list, since it has determined that the PDF is broken/incorrect
BUT it figures that the user wants to do something useful with it. Add to that
NeedAppearances, and Reader also has to build all of those appearances. These
combine to force Reader to do LOTS of (unnecessary) work.
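
A quick way to see how much of that work a given file would trigger is to
audit it before handing it to Reader. The following is only a sketch in Python,
assuming PDF objects modelled as plain dicts; audit_widgets is a hypothetical
helper, not a PoDoFo or Acrobat function. It counts the widgets that cannot be
reached from the Fields array via /Kids, and the widgets that have no normal
appearance (/AP /N).

```python
def audit_widgets(pages: list, acroform_fields: list) -> dict:
    """Count widgets a viewer would have to adopt or build appearances for."""
    # Walk /Kids downward from every root field, recording nodes by identity.
    known, stack = set(), list(acroform_fields)
    while stack:
        node = stack.pop()
        if id(node) in known:
            continue
        known.add(id(node))
        stack.extend(node.get("Kids", []))

    orphaned = missing_ap = 0
    for page in pages:
        for annot in page.get("Annots", []):
            if annot.get("Subtype") != "Widget":
                continue
            if id(annot) not in known:           # not listed under /Fields
                orphaned += 1
            if "N" not in annot.get("AP", {}):   # no normal appearance stream
                missing_ap += 1
    return {"orphaned": orphaned, "missing_ap": missing_ap}
```

If both counts are zero and NeedAppearances is absent, the two triggers
described above are not present in the file.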
When Reader has to do a “Full Save”, it performs a LOT of operations to create
a clean, healthy, optimized PDF. You can find the list of SOME of the
various things it does in the Acrobat SDK documentation concerning the various
flags that can be passed to the PDDocSave API call.
Leonard
From: Dennis Jenkins <dennis.jenkins...@gmail.com>
Date: Wednesday, May 7, 2014 at 1:59 AM
To: "podofo-users@lists.sourceforge.net" <podofo-users@lists.sourceforge.net>
Subject: [Podofo-users] Splicing PDFs with AcroForms, NeedsAppearances,
mysterious file size shrinkage, Adobe Reader behavior
Hello all (but mostly directed to Leonard),
A few days ago I described [1] some odd behavior that I am having with Adobe
Reader consuming PDFs generated by my project. To avoid hijacking Christophe's
original thread, I am starting a new one.
At a high level, my goal is to use PoDoFo to splice together pages from
various PDFs which are US tax forms, fill in the data, save the resulting PDF
and have the filled-in form fields "just work" in Adobe Reader (e.g., be
visible and still editable) and have Adobe Reader NOT prompt the user to save
the file when the user attempts to exit. Secondly, I noticed that if I allow
Adobe Reader to save the PDF, the file sometimes shrinks to about half its
size. I want to know why, so that I can optimize the size of my PDFs without
needing Adobe Reader (my code runs on Linux as part of a web service).
Leonard suggested that my PDF is malformed and that Adobe Reader is offering
to repair/save it in this case. After much experimentation and staring at
"podofobrowser" and "podofopdfinfo diffs" of the pre- and post- PDFs, I am not
100% convinced that this is the case.
In my code, I must set the "NeedAppearances" dictionary element of the
"/AcroForm" to "true", or my fields will not be visible in Adobe Reader. I
then need to populate the appearance stream, per section 12.7.3.3 of ISO
32000-1:2008 (herein referred to as "the spec"). When Adobe Reader saves my
PDF, this dictionary key disappears, and every field element gains a key
called "AP", with a child key of "N". This is discussed in 12.7.3.3 of the
spec on page #435, first complete paragraph.
If I omit the "NeedAppearances" key from the AcroForm, Adobe Reader
will no longer offer to save my PDF, but my field values are no longer visible.
Therefore, I suspect that Adobe Reader wants to save the PDF in order to
apply/generate the per-field appearance stream.
QUESTION 1: Is the above hypothesis valid?
I generate my PDFs by creating an empty PDF in memory, and "inserting" pages
from other PDFs. This results in a PDF with no "Fields" in the
"/AcroForm/Fields" array. Adobe Reader populates the "Fields" array when it
saves the PDF. However, the count of elements in the "Fields" array does not
match the actual count of fields. For example, Adobe Reader places 176
elements into this array, but when I enumerate all fields on all pages using
the PoDoFo API (with my patch to handle inherited fields), I count 212. I have
not completed an exhaustive comparison of the "Fields" arrays yet to determine
if the discrepancy is due to the inherited form fields (typically check boxes)
or not. I wrote a routine to populate the "Fields" array myself (with all 212
items), but Adobe Reader rebuilds it with only 176 items. If I do not set the
"NeedAppearances" flag, Adobe Reader never offers to save the PDF on exit, so
this array is not rebuilt in this case.
QUESTION 2: How does Adobe Reader determine which fields need to be in the
"/AcroForm/Fields" array?
Adobe Reader seems to not care that the "/AcroForm" is missing (its
presence or absence does not affect when Adobe Reader offers to save the form).
Yet section 12.7.2 of the spec states that the "/AcroForm" is required.
QUESTION 3: How do we reconcile section 12.7.2 with Adobe Reader's behavior?
Which is "correct" (or did I misunderstand the ISO)?
The content of the "Fields -> element -> AP -> N" key is an "/XObject".
The data stream created by Adobe Reader for it looks complicated.
QUESTION 4: Assuming the answer to Question #1 is "yes", do you have any
suggestions on how I can compute the required XObject in code? I just want to
check a checkbox or place simple text into a text field.
When Adobe Reader does save the PDF, and depending on which source form(s)
are in it, the resulting PDF might shrink in size considerably. A cursory look
with podofobrowser shows that Adobe Reader has heavily modified "Pages ->
Kids[page] -> Contents[]". In my current testing PDF, the original has one
element in page #0 Contents, with a compressed length of 20443. Adobe Reader's
version has 8 array elements, each with approximately 2K of compressed XObject
data.
QUESTION 5: Why does Adobe Reader tinker with this part of a PDF when saving
it? OK, that was rhetorical: I assume that it does so so that the file will be
smaller, and it also sets the "linearized" flag. The question should be
stated: what rules does Adobe Reader follow when deciding if/how to refactor
the actual page layout?
QUESTION 6: Why does refactoring the XObject components make the file so much
smaller (200K vs 450K, for example)?
In some cases, the file size savings are significant. If I knew what rules
Adobe Reader followed, I might attempt to write a routine to apply the same
changes using PoDoFo (and share it with the community).
Thank you for your time.
[1] http://sourceforge.net/p/podofo/mailman/message/32302847/