[ 
https://issues.apache.org/jira/browse/PDFBOX-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17207987#comment-17207987
 ] 

Michael Klink commented on PDFBOX-4970:
---------------------------------------

It would indeed be nice to have the PDF Debugger extended to provide 
information on changes between revisions and on unused material in the PDF. Or, 
even better, to have this information available via members of 
{{PDDocument}}.

Beware, though:
{quote}we would like to be able to detect that kind of prepared documents for 
the given attack. We were thinking to check if any duplicate object id is 
present in the document to be signed.
{quote}
Just because there is a duplicate object number in a revision, you cannot yet 
be sure that the document is _prepared for a shadow attack_; it merely has the 
_stink_ of such an attack.

I played around a bit myself, and my sniffer routine found a number of false 
positives, in particular:
 * Some PDF processors write data to the PDF output stream early to save 
memory. If the objects in question are changed again later in the same run, a 
second, updated copy of the object is simply appended to the stream, and the 
cross references later point to it instead of the earlier version. Because the 
same revision then contains two objects with similar but not identical 
contents, one could falsely assume an attack preparation.
 * If the PDF in question contains an embedded PDF attachment, there quite 
likely are numerous object numbers used both in the embedded and in the 
embedding PDF. Embedding PDF attachments like that _can_ be a preparation for a 
shadow attack but usually isn't one.
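
To illustrate the kind of sniffing meant here, the following is a minimal sketch of such a routine, not the one I actually used. It assumes a plain scan of the raw file bytes, treats every {{%%EOF}} marker as the end of a revision, and counts {{N G obj}} headers per segment. The class name {{DuplicateObjectSniffer}} and the method {{findDuplicates}} are hypothetical and not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of PDFBox: a naive raw-byte duplicate sniffer. */
public class DuplicateObjectSniffer {

    // Matches an indirect object header such as "12 0 obj".
    // Object numbers are capped at 9 digits here so Integer.parseInt cannot overflow.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d{1,9})\\s+(\\d{1,9})\\s+obj\\b");

    // Assumption: each "%%EOF" closes one revision. This is only a heuristic,
    // the marker can also occur inside stream data.
    private static final String EOF_MARKER = "%%EOF";

    /**
     * Returns, keyed by 0-based segment index, the object numbers whose header
     * occurs more than once in that segment, together with the occurrence count.
     */
    public static Map<Integer, Map<Integer, Integer>> findDuplicates(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so offsets stay stable.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<Integer, Map<Integer, Integer>> result = new TreeMap<>();

        int start = 0;
        int segment = 0;
        while (start < text.length()) {
            int eof = text.indexOf(EOF_MARKER, start);
            int end = (eof < 0) ? text.length() : eof + EOF_MARKER.length();

            // Count object headers inside this segment only.
            Map<Integer, Integer> counts = new TreeMap<>();
            Matcher m = OBJ_HEADER.matcher(text).region(start, end);
            while (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }

            // Keep only object numbers seen at least twice.
            counts.values().removeIf(c -> c < 2);
            if (!counts.isEmpty()) {
                result.put(segment, counts);
            }

            start = end;
            segment++;
        }
        return result;
    }
}
```

A scan like this reports both false positives described above: early-written objects that were appended again in the same run, and object headers inside uncompressed embedded PDF attachments. It also cannot see objects stored in compressed object streams, so a hit is only a _stink_ and an empty result proves nothing.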

Similarly, sniffing for the other attack types only finds the _stink_ of such 
_preparations_, not conclusive indications. For example, form field values that 
do not match their display values also occur for other, dumb reasons unrelated 
to attacks.

Thus, you most likely won't _be able to detect that kind of prepared documents 
for the given attack_, merely a _stink_ thereof.

Nonetheless, detecting even a mere stink can be worthwhile, as an attacker can 
probably exploit such accidentally existing structures just like a deliberate 
attack preparation. The result might be subtle changes, e.g. a switch to a 
previous revision of some paragraph in a contract which, for good reasons, was 
not signed in that original form.

> Possibility to detect duplicate ids in a revision
> -------------------------------------------------
>
>                 Key: PDFBOX-4970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4970
>             Project: PDFBox
>          Issue Type: Improvement
>            Reporter: Pierrick Vandenbroucke
>            Priority: Major
>
> We are trying to detect files which contain several objects with the same 
> identifier within a revision or in a given PDF. Currently, that does not seem 
> possible. We are running into this 
> [map|https://github.com/apache/pdfbox/blob/2.0.21/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDocument.java#L56]
>  which only allows one instance per object id. Using a map brings 
> limitations (e.g. rendering, ...).
> Is it possible to detect such files?



