[ https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192395#comment-17192395 ]
Tilman Hausherr commented on PDFBOX-4952:
-----------------------------------------

Another small thing: "public boolean equals(Object o)" is removed from COSStream. However, that one was removed from the current 2.0 branch a few days ago, so why is this part of the diff?

Re PDFBOX-45, I don't expect that this will interfere with the new logic (but you may have to adjust the diffs due to changing line numbers). See my last comment in that issue.

> PDF compression - object stream creation
> ----------------------------------------
>
>                 Key: PDFBOX-4952
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4952
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: PDModel
>    Affects Versions: 2.0.21
>            Reporter: Christian Appl
>            Priority: Major
>         Attachments: image-2020-09-07-09-47-30-172.png,
> image-2020-09-07-10-05-15-631.png
>
>
> I implemented a basic starting point to realize a PDF compression based on
> PDFBox 2.0.22-SNAPSHOT
> I want to use this ticket, to ask if you would be interested in such a
> feature and whether you would be interested to merge it into PDFBox.
> This is sort of a POC, only implementing some very basic functionality, that
> surely must and could be extended further and it does only implement some
> very basic and simplistic Unit Tests.
> However it is able to reduce the size of resulting documents, and creates
> objectstreams as defined in the PDF reference manual.
> *What it currently does:*
> It provides the bundling and compression of objects to objectstreams -and
> further applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract
> class for "ContentCompressor"s which search a document for specific content,
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
> -_ImageCompressor_-
> -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
> -Searches the document for yet unencoded streams and applies a Flate
> compression where necessary.-
> -Both compressors can be parameterized using a centralized
> "CompressParameters" instance which is passed to a new "saveCompressed"
> method of PDDocument.-
> The compression is based on, modifies and is realized by a set of extensions
> for the "COSWriter" class. Basically it organizes objects, that are passed to
> the COSWriter in objectStreams -and applies content optimization where
> necessary and possible-.
> Currently this does support encryption, but does not support linearization of
> the compressed documents.
> *Caveat:*
> If this feature is interesting to you, then I would not expect you to simply
> merge this fork into 2.0.22. I am expecting that you would like to have some
> details and concepts changed and am ready to implement changes that would be
> required for this to work to your liking.
> *POC:*
> 4 resulting documents can be found in "target/test-output/compression" when
> "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
> [https://github.com/apache/pdfbox/pull/86]
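
For readers unfamiliar with the object streams mentioned in the description, here is a minimal sketch of how objects are bundled according to the PDF specification: the decoded stream starts with pairs of "object number, byte offset", followed by the objects themselves, and the stream dictionary records the object count in /N and the offset of the first object in /First. This is not PDFBox code and not the code from the pull request. The class and method names (ObjectStreamSketch, Packed, pack, flate) are hypothetical, and the sketch assumes the objects are already serialized as Latin-1 strings and are neither streams nor encrypted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;

// Hypothetical illustration only; not PDFBox code and not the pull request's implementation.
class ObjectStreamSketch {

    /** Decoded object stream body plus the /N and /First values that describe it. */
    static final class Packed {
        final byte[] body;
        final int n;      // number of objects in the stream (/N)
        final int first;  // byte offset of the first object in the decoded body (/First)

        Packed(byte[] body, int n, int first) {
            this.body = body;
            this.n = n;
            this.first = first;
        }
    }

    /** Builds the decoded body: "objNum offset" pairs, then the objects themselves. */
    static Packed pack(Map<Integer, String> objects) {
        StringBuilder header = new StringBuilder();
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : objects.entrySet()) {
            // Offsets are counted in bytes, relative to the first object.
            header.append(e.getKey()).append(' ').append(data.size()).append(' ');
            byte[] bytes = (e.getValue() + "\n").getBytes(StandardCharsets.ISO_8859_1);
            data.write(bytes, 0, bytes.length);
        }
        byte[] head = header.toString().getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(head, 0, head.length);
        body.write(data.toByteArray(), 0, data.size());
        return new Packed(body.toByteArray(), objects.size(), head.length);
    }

    /** Compresses the body in zlib format, which is what /FlateDecode expects. */
    static byte[] flate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int count = deflater.deflate(buf);
            out.write(buf, 0, count);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<Integer, String> objects = new LinkedHashMap<>();
        objects.put(11, "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>");
        objects.put(12, "[ 0 0 612 792 ]");
        Packed p = pack(objects);
        byte[] compressed = flate(p.body);
        // Print the stream dictionary that would precede the compressed data.
        System.out.println("<< /Type /ObjStm /N " + p.n + " /First " + p.first
                + " /Filter /FlateDecode /Length " + compressed.length + " >>");
    }
}
```

A real writer also has to decide which objects are eligible (stream objects and objects with a non-zero generation number cannot go into an object stream) and must emit a cross-reference stream that points at the bundled objects. The sketch leaves both out.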