As the expression goes: “here there be dragons”.

You CANNOT reliably hash individual PDF objects and expect to compare the 
hashes, because there is no canonicalization or serialization standard for 
PDF objects.  You can have objects that are 100% equivalent BUT are not 
equal/identical (e.g. the same dictionary written with its keys in a different 
order, or the same string written as a literal string versus a hex string).  
This is both a blessing and a curse with respect to PDF, and it's one of the 
main reasons that “object-level encryption and signing” was removed from 
PDF 1.7 on the way to ISO 32000-1: no one (incl. Adobe) was able to 
implement it reliably.
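
To make the point concrete, here is a minimal Java sketch (plain JDK, no 
PDFBox; the class and method names are hypothetical) of the "hash each 
serialized object and keep the first one per hash" deduplication described in 
the quoted issue, run over hand-written serializations rather than real 
COSWriter output. Byte-identical objects collapse, but the same dictionary 
with reordered keys and extra whitespace, and the same string written as a 
literal versus a hex string, each hash differently and are kept as separate 
objects.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class PdfHashDedupSketch {

    // Hex-encoded SHA-1 of the serialized bytes (PDF syntax is byte-oriented,
    // so ISO-8859-1 maps each char to exactly one byte).
    static String sha1Hex(String serialized) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(serialized.getBytes(StandardCharsets.ISO_8859_1));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Keeps the first object label seen for each distinct hash: hash -> label.
    static Map<String, String> dedupByHash(Map<String, String> objects) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : objects.entrySet()) {
            kept.putIfAbsent(sha1Hex(e.getValue()), e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> objects = new LinkedHashMap<>();
        // A and B are byte-identical: they collapse to one entry.
        objects.put("A", "<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>");
        objects.put("B", "<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>");
        // C is the same dictionary with different key order and whitespace:
        // equivalent to A, but not equal, so it gets its own hash.
        objects.put("C", "<< /BaseFont /Helvetica /Subtype /Type1 /Type /Font >>");
        // D and E encode the same string ("Hello") as literal vs. hex string.
        objects.put("D", "(Hello)");
        objects.put("E", "<48656C6C6F>");

        Map<String, String> kept = dedupByHash(objects);
        System.out.println(kept.size() + " unique of " + objects.size());
    }
}
```

Running main prints "4 unique of 5": A and B collapse, while C, D and E are 
all kept even though C is equivalent to A and D to E. Hashing serialized bytes 
only finds identical objects, never merely equivalent ones.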




On 5/19/15, 7:00 AM, "Jesse Long (JIRA)" <j...@apache.org> wrote:

>
>    [ 
> https://issues.apache.org/jira/browse/PDFBOX-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550230#comment-14550230
>  ] 
>
>Jesse Long commented on PDFBOX-2765:
>------------------------------------
>
>Thank you, your questions answered my need. I am doing this for existing PDFs. 
>Therefore, presumably, I won't need font subsetting. I only saw the subsetting 
>code in the save() method, and assumed I was missing it.
>
>John, it's not only about compression, but also about removing duplicates. The 
>PD API does not have a visitor pattern, so I would need to know how to descend. 
>Even the COS visitor does not descend for me/does not have a descending 
>implementation. My limited knowledge would be sure to introduce bugs.
>
>Also, what better way is there to check for duplicates, including duplicate 
>graphs of related objects, than to write the object and all its dependencies 
>with COSWriter and checksum the output?
>
>Anyway, thanks for the input. You can close the issue; evidently I do not 
>need the requested change.
>
>> Add method to subset fonts for document pre-save
>> ------------------------------------------------
>>
>>                 Key: PDFBOX-2765
>>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2765
>>             Project: PDFBox
>>          Issue Type: New Feature
>>          Components: Writing
>>    Affects Versions: 2.0.0
>>            Reporter: Jesse Long
>>            Assignee: John Hewson
>>            Priority: Minor
>>
>> I have a custom COSWriter which compresses all streams and runs a SHA1 sum 
>> over each object, only writing one instance of each object with the same 
>> SHA1 sum.
>> This really helps compress PDFs.
>> I use this by calling MyCustomCOSWriter.write(PDDocument);
>> The trouble is that I have no way of calling the font subsetting that 
>> happens in PDDocument.save(). 
>> Could we have a method to perform that font subsetting manually?
>
>
>
>--
>This message was sent by Atlassian JIRA
>(v6.3.4#6332)
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
>For additional commands, e-mail: dev-h...@pdfbox.apache.org
>
