[
https://issues.apache.org/jira/browse/PDFBOX-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258405#comment-17258405
]
Michael Klink commented on PDFBOX-4952:
---------------------------------------
{quote}the attached file can't be opened with Adobe Reader{quote}
Well, the cross references are invalid after all.
In [^102_Spot_to_CMYK_X1a_unc_BAD-3.0.0.pdf] the only cross references are:
{noformat}
xref
130 33
0000000015 00000 n
0000000568 00000 n
0000000608 00000 n
0000053742 00000 n
0000053790 00000 n
0000054133 00000 n
0000000153 00000 n
0000054194 00000 n
0000054229 00000 n
0001581021 00000 n
0001581587 00000 n
0001581610 00000 n
0001584243 00000 n
0001584299 00000 n
0001584369 00000 n
0001584433 00000 n
0001584497 00000 n
0001584561 00000 n
0001584631 00000 n
0001584680 00000 n
0001585206 00000 n
0001585585 00000 n
0001585730 00000 n
0001585885 00000 n
0001586030 00000 n
0001586181 00000 n
0001586332 00000 n
0001586483 00000 n
0001586601 00000 n
0001586717 00000 n
0001586999 00000 n
0001587277 00000 n
0001590728 00000 n
trailer
<<
/Size 163
/Root 130 0 R
/Info 136 0 R
/ID [<CBEF5B3CFE454F79B4B4952357FE6AF4> <60197C8791604CDCA7D5D9D7D2662195>]
>>
startxref
1592595
%%EOF
{noformat}
According to the PDF specification, though:
{panel:title=ISO 32000-1 section 7.5.4 Cross-Reference Table}
For a file that has never been incrementally updated, the cross-reference
section shall contain only one subsection, whose object numbering begins at 0.
{panel}
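In your file, the entries for objects 0 through 129 are missing: the only subsection starts at object 130 and covers 33 objects (130..162), although the trailer declares /Size 163.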
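To make the rule quoted above concrete, here is a minimal sketch of such a check in plain Java. It is not PDFBox code: the class name XrefSubsectionCheck and both method names are hypothetical, and it assumes a classic (non-stream) cross-reference table in a file that has never been incrementally updated. It only reads the subsection header lines ("first count") and tests whether there is exactly one subsection that begins at object 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only, not part of PDFBox: inspects subsection headers of a classic xref table. */
final class XrefSubsectionCheck {

    /** Returns one {first, count} pair per subsection header found between "xref" and "trailer". */
    static List<int[]> parseSubsections(String xrefText) {
        List<int[]> result = new ArrayList<>();
        boolean inTable = false;
        for (String raw : xrefText.split("\\R")) {
            String line = raw.trim();
            if (line.equals("xref")) {
                inTable = true;
                continue;
            }
            if (line.equals("trailer")) {
                break;
            }
            // Header lines consist of exactly two integers; entry lines carry a third field (n or f).
            if (inTable && line.matches("\\d+\\s+\\d+")) {
                String[] parts = line.split("\\s+");
                result.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
            }
        }
        return result;
    }

    /** ISO 32000-1, 7.5.4: a never-updated file has one subsection whose numbering begins at 0. */
    static boolean isValidForNonUpdatedFile(List<int[]> subsections) {
        return subsections.size() == 1 && subsections.get(0)[0] == 0;
    }

    public static void main(String[] args) {
        // Shortened excerpt of the table shown above (header plus two entries).
        String bad = "xref\n130 33\n0000000015 00000 n\n0000000568 00000 n\ntrailer\n";
        List<int[]> subs = parseSubsections(bad);
        // prints "first=130 count=33 valid=false"
        System.out.println("first=" + subs.get(0)[0] + " count=" + subs.get(0)[1]
                + " valid=" + isValidForNonUpdatedFile(subs));
    }
}
```

For the attached file the header is "130 33", so the check fails, which matches what Adobe Reader reports. A table starting with "0 163" would pass this particular test.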
> PDF compression - object stream creation
> ----------------------------------------
>
> Key: PDFBOX-4952
> URL: https://issues.apache.org/jira/browse/PDFBOX-4952
> Project: PDFBox
> Issue Type: New Feature
> Components: PDModel
> Affects Versions: 2.0.21
> Reporter: Christian Appl
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 3.0.0 PDFBox
>
> Attachments: 102_Spot_to_CMYK_X1a.pdf,
> 102_Spot_to_CMYK_X1a_unc_BAD-3.0.0.pdf,
> 102_Spot_to_CMYK_X1a_unc_GOOD-2.0.22.pdf, image-2020-09-07-09-47-30-172.png,
> image-2020-09-07-10-05-15-631.png
>
>
> I implemented a basic starting point for PDF compression, based on
> PDFBox 2.0.22-SNAPSHOT.
> I want to use this ticket to ask whether you would be interested in such a
> feature and in merging it into PDFBox.
> This is a sort of POC that implements only very basic functionality, which
> surely must and could be extended further, and it comes with only
> very basic and simplistic unit tests.
> However, it is able to reduce the size of the resulting documents, and it
> creates object streams as defined in the PDF reference manual.
> *What it currently does:*
> It bundles and compresses objects into object streams -and
> further applies simple content compression to a small selection of contents-.
> -To realize content compression, it provides a simple interface and abstract
> class for "ContentCompressor"s which search a document for specific content,
> that could be compressed and do compress that contents.-
> -Currently two content compressors exist:-
> -_ImageCompressor_-
> -Searches for simple images, that could be compressed using DCT.-
> -_UnencodedStreamCompressor_-
> -Searches the document for yet unencoded streams and applies a Flate
> compression where necessary.-
> -Both compressors can be parameterized using a centralized
> "CompressParameters" instance which is passed to a new "saveCompressed"
> method of PDDocument.-
> The compression builds on and modifies the "COSWriter" class through a set
> of extensions. Basically, it organizes the objects that are passed to
> the COSWriter into object streams -and applies content optimization where
> necessary and possible-.
> Currently this supports encryption, but it does not support linearization of
> the compressed documents.
> *Caveat:*
> If this feature is interesting to you, then I would not expect you to simply
> merge this fork into 2.0.22. I am expecting that you would like to have some
> details and concepts changed and am ready to implement changes that would be
> required for this to work to your liking.
> *POC:*
> 4 resulting documents can be found in "target/test-output/compression" when
> "COSDocumentCompressionTest" is run.
> *The Pull request can be found on Github at:*
> [https://github.com/apache/pdfbox/pull/86]