[ 
https://issues.apache.org/jira/browse/PDFBOX-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364887#comment-17364887
 ] 

Michael Klink commented on PDFBOX-5216:
---------------------------------------

[~chae],
{quote}Could you please tell me the reason?{quote}
The code I posted on Stack Overflow does only one thing: it checks whether 
there are distinct objects with identical content in the PDF; if there are two 
such objects, it removes one of them and replaces all references to the 
removed object with references to the remaining one. It does not check the 
role of the objects, though. In your example, for instance, it does not 
recognize that it could drop one of two identical XObject resources if it 
also replaced the associated names in the related content streams.
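
The merging step described above can be sketched roughly as follows. This is a toy model only, not the Stack Overflow code and not the PDFBox API: {{DuplicateObjectMerger}}, {{ToyObject}} and {{mergeDuplicates}} are hypothetical names, and each "object" is reduced to a content string plus a list of referenced object numbers. It assumes Java 16 or later (for records).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DuplicateObjectMerger {

    /** Hypothetical stand-in for an indirect object: a payload plus referenced object numbers. */
    record ToyObject(String content, List<Integer> refs) {}

    /**
     * Merges objects that are equal in content and references, keeping the
     * lowest object number of each group. Repeats until nothing changes,
     * because redirecting references can make further objects equal.
     * Returns the number of objects removed.
     */
    static int mergeDuplicates(Map<Integer, ToyObject> objects) {
        int removed = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<ToyObject, Integer> keeperByValue = new HashMap<>();
            Map<Integer, Integer> replacement = new HashMap<>();
            // Visit objects in ascending number order so the lowest number is kept.
            for (Map.Entry<Integer, ToyObject> e : new TreeMap<>(objects).entrySet()) {
                Integer keeper = keeperByValue.putIfAbsent(e.getValue(), e.getKey());
                if (keeper != null) {
                    replacement.put(e.getKey(), keeper);
                }
            }
            if (!replacement.isEmpty()) {
                changed = true;
                removed += replacement.size();
                objects.keySet().removeAll(replacement.keySet());
                // Point every reference to a removed object at its keeper.
                for (Map.Entry<Integer, ToyObject> e : objects.entrySet()) {
                    List<Integer> newRefs = new ArrayList<>();
                    for (Integer ref : e.getValue().refs()) {
                        newRefs.add(replacement.getOrDefault(ref, ref));
                    }
                    e.setValue(new ToyObject(e.getValue().content(), newRefs));
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Integer, ToyObject> objects = new HashMap<>();
        objects.put(1, new ToyObject("Font", List.of(3)));
        objects.put(2, new ToyObject("Font", List.of(4)));
        objects.put(3, new ToyObject("FontFile bytes", List.of()));
        objects.put(4, new ToyObject("FontFile bytes", List.of()));
        objects.put(5, new ToyObject("Resources", List.of(2, 4)));
        int removed = mergeDuplicates(objects);
        System.out.println("Removed " + removed + ", remaining " + new TreeMap<>(objects).keySet());
    }
}
```

In this toy input, objects 3 and 4 are merged first; only after the references are redirected do objects 1 and 2 become equal, which is why the loop runs again. Like the real code, the sketch compares content only and knows nothing about the role an object plays in the document.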
{quote}You mentioned that the new version of PDFBox has not been tested yet, 
can it be used reliably in versions prior to PDFBox 3.0 pre-releases?{quote}
Not _the new version of PDFBox has not been tested_ but _my code has not been 
tested with newer PDFBox versions_. I know that there have been some changes in 
the {{equals}} checks... Checking the code with a number of real-life documents 
should suffice to determine whether it can still be used. Consider the _Words 
of warning_ at the end of the Stack Overflow answer, though!

> Is there a way to optimize by cleaning up duplicate objects?
> ------------------------------------------------------------
>
>                 Key: PDFBOX-5216
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5216
>             Project: PDFBox
>          Issue Type: Wish
>            Reporter: yoonho
>            Priority: Major
>         Attachments: samepage.png, 스크린샷 2021-06-15 오후 2.02.21.png
>
>
> Is there a way to clean up duplicate objects using PDFBox?
> [http://gofile.me/4hSqO/Cis33w0Sa] - Original
> [http://gofile.me/4hSqO/7XKmWqUBB]  - Clean version
> I applied the Adobe DC's Optimize option (relevant in the attached file). As 
> a result, a 48mb PDF file was reduced to 19mb. I think this is due to 
> cleaning up duplicate objects in the PDF.
> Am I right? I would like to implement this process with PDFBox. How should I 
> approach it?



