Ondrej,

Ondřej Smetana wrote
> I'm using the latest iTextSharp (5.5.3) to extract XML data from XFA-based
> form PDFs. It works pretty well, but lately I've come across some PDFs
> that throw the following exception when I try to load them with PdfReader:
> ...
> 
> I've uploaded the problematic PDF here:
> https://www.dropbox.com/s/0cv7nwnv88157mm/example.pdf. Could some of you
> guys take a look at it and tell me whether it is a broken PDF or a possible
> bug in iText?

I could reproduce the issue using iText / Java.

I assume there is an error in the PDF.

The PDF is encrypted and contains three revisions. In the final revision a
signature is added. The AcroForm dictionary override in this revision looks
like this:

42 0 obj
<<
/DA (/Helv 0 Tf 0 g )
/DR <<
/Font <<
/Arial-BoldMT 16 0 R
/Arial-ItalicMT 17 0 R
/ArialMT 18 0 R
/CourierNewPS-BoldMT 19 0 R
/Helv 20 0 R
/MyriadPro-Regular 21 0 R
/Tahoma 22 0 R
/Tahoma-Bold 23 0 R
>>
>>
/Fields [ 122 0 R ]
/SigFlags 3
/XFA [ (xdp:xdp) 114 0 R (config) 2 0 R (template) 3 0 R (localeSet) 4 0 R
(datasets) 115 0 R (connectionSet) 6 0 R (xmpmeta) 7 0 R (xfdf) 8 0 R (form)
116 0 R (</xdp:xdp>) 117 0 R ]
>>
endobj

I.e. it contains unencrypted strings (the value of DA and parts of the XFA
definition).
 
The specification, on the other hand, demands that

> Encryption applies to all strings and streams in the document's PDF file,
> with the following exceptions:
> * The values for the ID entry in the trailer
> * Any strings in an Encrypt dictionary
> * Any strings that are inside streams such as content streams and
>   compressed object streams, which themselves are encrypted

Leaving the mentioned AcroForm strings unencrypted, therefore, seems wrong.
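To make the rule concrete, here is a small Java sketch that encodes the three exemptions quoted above as a predicate. The StringLocation enum and the EncryptionRule class are hypothetical names, not iText API; they only model where a string sits in the file, under my reading of the specification.

```java
// Hypothetical model of where a string sits in the PDF file (not an iText type).
enum StringLocation {
    TRAILER_ID,              // value of the ID entry in the trailer
    ENCRYPT_DICTIONARY,      // any string inside the Encrypt dictionary
    INSIDE_ENCRYPTED_STREAM, // e.g. inside a content stream or compressed object stream that is itself encrypted
    TOP_LEVEL_OBJECT         // a string in an ordinary indirect object, outside any stream
}

// Hypothetical helper encoding the exemptions quoted from the specification.
class EncryptionRule {
    // Returns true if the string must be encrypted individually.
    static boolean mustBeEncryptedIndividually(StringLocation loc) {
        switch (loc) {
            case TRAILER_ID:
            case ENCRYPT_DICTIONARY:
            case INSIDE_ENCRYPTED_STREAM:
                return false; // one of the listed exceptions
            default:
                return true;  // everything else is encrypted string by string
        }
    }
}
```

Applied to this file: in the earlier revision the AcroForm strings sat inside an encrypted object stream (INSIDE_ENCRYPTED_STREAM, so no individual encryption), while the override 42 0 obj is not in such a stream (TOP_LEVEL_OBJECT), so its strings should have been encrypted.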

Thus, the signing application has to be fixed. (I think the bug in that
application is that it simply takes the older version of the AcroForm
dictionary, adds some entries for its signature, and writes it out as is. It
does not take into account that the former version was contained in an
encrypted object stream, so its individual strings were exempt from
individual encryption, whereas the override is not in such a stream.)

(I merely say "I assume ..." and "... seems wrong" because I'm not really
deep into PDF encryption.)

Indeed, iText stumbles over the first of those strings: StandardDecryption.finish
uses its member cipher without checking whether it has been initialized, and
it has not been initialized when iText tries to decrypt that unencrypted
string.
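The following simplified Java model illustrates how that can happen. It assumes the document uses AES, where each encrypted string starts with a 16-byte initialization vector, and that the decryptor (as I read iText 5's StandardDecryption) only creates its cipher once 16 bytes have arrived. NaiveAesStringDecryptor is a hypothetical class, not iText's actual code. The unencrypted DA value "/Helv 0 Tf 0 g " is only 15 bytes long, so the IV is never complete, the cipher stays null, and the unguarded finish() throws a NullPointerException.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical, simplified model of AES string decryption (not iText's class).
class NaiveAesStringDecryptor {
    private final byte[] key;
    private final byte[] iv = new byte[16]; // AES strings start with a 16-byte IV
    private int ivPtr = 0;
    private Cipher cipher;                  // stays null until the IV is complete

    NaiveAesStringDecryptor(byte[] key) { this.key = key.clone(); }

    byte[] update(byte[] b) {
        try {
            int off = 0;
            if (cipher == null) {
                // Collect IV bytes first; no cipher exists yet.
                int take = Math.min(iv.length - ivPtr, b.length);
                System.arraycopy(b, 0, iv, ivPtr, take);
                ivPtr += take;
                off = take;
                if (ivPtr < iv.length) return new byte[0]; // IV still incomplete
                cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
                cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
            }
            byte[] out = cipher.update(b, off, b.length - off);
            return out == null ? new byte[0] : out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unguarded, like the pattern described above: no check that cipher exists.
    byte[] finish() {
        try {
            return cipher.doFinal(); // NullPointerException if cipher was never created
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```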

But iText could easily be hardened against something like that; a simple
check in StandardDecryption.finish (either "cipher != null" or the already
existing flag "initiated") would do. I'm not sure, though, whether iText
should then

a) throw a more meaningful exception,
b) accept the unencrypted strings but mark the PdfReader as "repaired", or
c) accept the unencrypted strings unconditionally.
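A minimal sketch of such a guard, again in Java and under the same AES assumption, covering options a) to c). All names here (GuardedAesStringDecryptor, UnencryptedStringPolicy, and the repaired field standing in for the PdfReader state) are hypothetical, not iText API.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical policies matching options a), b) and c).
enum UnencryptedStringPolicy { THROW, ACCEPT_AND_MARK_REPAIRED, ACCEPT }

// Hypothetical hardened decryptor; illustrative only.
class GuardedAesStringDecryptor {
    private final byte[] key;
    private final byte[] iv = new byte[16];
    private int ivPtr = 0;
    private Cipher cipher;                 // null until 16 IV bytes have arrived
    private final UnencryptedStringPolicy policy;
    boolean repaired = false;              // stands in for PdfReader's "repaired" state

    GuardedAesStringDecryptor(byte[] key, UnencryptedStringPolicy policy) {
        this.key = key.clone();
        this.policy = policy;
    }

    byte[] update(byte[] b) {
        try {
            int off = 0;
            if (cipher == null) {
                int take = Math.min(iv.length - ivPtr, b.length);
                System.arraycopy(b, 0, iv, ivPtr, take);
                ivPtr += take;
                off = take;
                if (ivPtr < iv.length) return new byte[0]; // IV still incomplete
                cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
                cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
            }
            byte[] out = cipher.update(b, off, b.length - off);
            return out == null ? new byte[0] : out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    byte[] finish() {
        if (cipher == null) {
            // Fewer than 16 bytes seen: this cannot be an AES-encrypted string.
            switch (policy) {
                case THROW: // option a)
                    throw new IllegalStateException(
                            "String too short (" + ivPtr + " bytes) to be AES-encrypted");
                case ACCEPT_AND_MARK_REPAIRED: // option b)
                    repaired = true;
                    // fall through
                default: // option c), and b) after flagging
                    return Arrays.copyOf(iv, ivPtr); // buffered bytes returned as-is
            }
        }
        try {
            return cipher.doFinal();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this guard only catches unencrypted strings shorter than 16 bytes (which covers all the unencrypted strings in object 42 above). A longer unencrypted string would have its first 16 bytes consumed as the IV and would not come out intact, so the check is a hardening measure rather than a reliable way to detect unencrypted strings.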

Regards,   Michael



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/possible-bug-in-PdfReader-while-decrypting-Pdf-content-tp4660571p4660573.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
