Good morning Bruno,
Unfortunately, the actual file I am working on is still under embargo and
besides, it is quite large at 16MB so I have been testing on a smaller test
file I made up.
Attached are the following:
a.non-linearized.pdf
    PDF document created from a Word doc using Acrobat 9, saved as
    non-linearized
a.linearized.pdf
    same source Word document as above, PDF created using Acrobat 9 and
    saved as a linearized file
a.linearized-2-non-linearized.pdf
    exact copy of a.linearized.pdf, opened in Acrobat 9, small change made,
    then Saved (note: NOT Saved As)
In the linearized file you will see 2 xref tables: one at line 12, one at
line 156. Neither references the <<Linearization>> dictionary.
PDF Specification 1.7 on page 679:
F.3.3 Linearization Parameter Dictionary (Part 2)
...
There shall be no references to this dictionary anywhere in the document
...
Writing iText code to confirm whether a PDF document is linearized seems, at
first sight, to be simple - a search for the term <</Linearized should be
all that is needed.
Unfortunately, it is not that simple!
When a linearized PDF document is Saved (NOT Saved As), the document is
incrementally updated. In this case the linearization is no longer valid.
Strictly speaking it still is, but handling it becomes very complicated, and
in such a case Acrobat/Reader shows the file as non-linearized (specifically:
Fast Web View = No). This can be seen by viewing the document properties of
the attached file a.linearized-2-non-linearized.pdf. However, the
<<Linearization>> dictionary remains in the PDF document. The only way to be
sure - and I believe that even this may not be absolutely guaranteed - is to
check that the file length stated in the <<Linearization>> dictionary matches
the actual file length.
Please see:
PDF Specification 1.7 G.7 Accessing an Updated File (page 697)
Adobe discussion forum post: http://forums.adobe.com/message/3934054#3934054
- in second to last post Leonard confirms linearized files cannot have
incremental updates but the <<Linearization>> dictionary remains.
I have coded a function which confirms Linearization based on the above
criteria - that is: the presence of Linearized and equal file lengths -
isLinearized.txt attached for your reference.
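For readers without the attachment, here is a minimal sketch of the two-part
check described above. It is not the attached isLinearized.txt and it does
not use iText: it reads the start of the file as raw bytes, relying on the
PDF specification's requirement that the linearization parameter dictionary
lies within the first 1024 bytes. The class name LinearizedCheck and its
method names are hypothetical, and the dictionary is located by plain text
search rather than real PDF parsing, so treat it as an illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates the check only, not a full PDF parser.
final class LinearizedCheck {

    // The linearization parameter dictionary must lie within the first 1024 bytes.
    private static final int HEADER_WINDOW = 1024;

    // The /L entry of the dictionary holds the length of the whole file in bytes.
    private static final Pattern LENGTH_ENTRY = Pattern.compile("/L\\s+(\\d+)");

    // Reads at most HEADER_WINDOW bytes from the start of the file.
    static byte[] readHead(Path file) throws IOException {
        byte[] buf = new byte[HEADER_WINDOW];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < buf.length
                    && (n = in.read(buf, filled, buf.length - filled)) > 0) {
                filled += n;
            }
        }
        return Arrays.copyOf(buf, filled);
    }

    // Returns the /L value of the dictionary containing /Linearized,
    // or -1 if no such dictionary (or no /L entry) is found in the given bytes.
    static long declaredLength(byte[] head) {
        String text = new String(head, StandardCharsets.ISO_8859_1);
        int key = text.indexOf("/Linearized");
        if (key < 0) {
            return -1;
        }
        int open = text.lastIndexOf("<<", key);
        int close = text.indexOf(">>", key);
        if (open < 0 || close < 0) {
            return -1;
        }
        Matcher m = LENGTH_ENTRY.matcher(text.substring(open, close));
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    // Both criteria: /Linearized is present and /L equals the real file length.
    static boolean isLinearized(byte[] head, long actualLength) {
        long declared = declaredLength(head);
        return declared >= 0 && declared == actualLength;
    }

    static boolean isLinearized(Path file) throws IOException {
        return isLinearized(readHead(file), Files.size(file));
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path file = Paths.get(name);
            System.out.println(name
                    + ": declared /L = " + declaredLength(readHead(file))
                    + ", actual length = " + Files.size(file)
                    + ", linearized = " + isLinearized(file));
        }
    }
}
```

Run against the three attachments, a check of this kind should treat
a.linearized-2-non-linearized.pdf as non-linearized if the incremental update
made the file longer than the /L value left in the dictionary, which is the
behaviour described above. As noted, even this may not be absolutely
guaranteed.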
However, it would be more professional and robust to be able to treat this
dictionary as an object (direct or indirect).
Hope the above stimulates some thoughts and/or ideas? If you need anything
else, please let me know.
Kind regards
William
William Bell
T: +44 7795 463646
E: [email protected]
From: 1T3XT BVBA [mailto:[email protected]]
Sent: 27 September 2011 19:27
To: Post all your questions about iText here
Subject: Re: [iText-questions] Retrieving all Indirect Object
On 27/09/2011 13:11, William Bell wrote:
The object I am interested in is the <<Linearization>> dictionary. This is
added to the file when the file is Saved As linearized but is not referenced
anywhere in the file. Hence, my trying the brute force method!
See below for extract from PDF file, code and its output.
The code returns object 24 (<<Linearization>>) as null.
My question is simple - is there any way I can retrieve the
<<Linearization>> dictionary as an object (direct or indirect)? Or, do I
have to treat the file as a stream and search for the term
'<<Linearization>>' and deal with its values manually? [Note: no problem to
do this but it would be nicer to deal with this dictionary as an object if
possible]
Do you have a PDF I can take a look at?
Normally, each object can be found somewhere in the XRef table.
A linearized PDF has more than one XRef table.
I'd like to see in which XRef table that object can be found.