[
https://issues.apache.org/jira/browse/PDFBOX-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672943#comment-13672943
]
Michael Kuß edited comment on PDFBOX-1618 at 6/3/13 9:21 AM:
-------------------------------------------------------------
I analysed this a little further. There are some differences between splitting
documents with pdfbox and with Adobe.
- One is the missing compression, which is lost during importPage in the
PDDocument.
- Another is that links (annots) on a split page that point to another page (not
in the part) result in a completely copied dependency tree, so the linked page
is also included. Adobe leaves the references as they are, with the linked
pages missing.
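To illustrate the second point, here is a minimal sketch in plain Java. It does
not use the pdfbox API; Node, copy and copiedObjects are hypothetical names
that only model a page, its link annotation and the linked page as an object
graph. It compares a copy that follows every reference with one that stops at
pages outside the part, which is roughly the behaviour described for Adobe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SplitSketch {
    /** Hypothetical stand-in for a PDF object (page, annotation, content). */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /**
     * Copies src and everything reachable from it, except objects in skip.
     * A skipped reference is simply not followed, so the target is missing
     * from the result.
     */
    static Node copy(Node src, Set<Node> skip, Map<Node, Node> copied) {
        Node done = copied.get(src);
        if (done != null) {
            return done; // already copied, reuse it
        }
        Node dst = new Node(src.name);
        copied.put(src, dst);
        for (Node ref : src.refs) {
            if (skip.contains(ref)) {
                continue; // page outside the part: leave it out
            }
            dst.refs.add(copy(ref, skip, copied));
        }
        return dst;
    }

    /** Builds a two-page model and returns how many objects the copy holds. */
    static int copiedObjects(boolean stopAtForeignPages) {
        Node page1 = new Node("page1");
        Node content1 = new Node("content1");
        Node link = new Node("linkAnnotation");
        Node page2 = new Node("page2");
        Node content2 = new Node("content2");
        page1.refs.add(content1);
        page1.refs.add(link);
        link.refs.add(page2);   // internal hyperlink to the other page
        page2.refs.add(content2);

        Set<Node> skip = new HashSet<>();
        if (stopAtForeignPages) {
            skip.add(page2);
        }
        Map<Node, Node> copied = new HashMap<>();
        copy(page1, skip, copied);
        return copied.size();
    }

    public static void main(String[] args) {
        System.out.println("follow all references: " + copiedObjects(false));
        System.out.println("stop at foreign pages: " + copiedObjects(true));
    }
}
```

In this model the first variant copies 5 objects and the second only 3, because
the link to page2 no longer drags page2 and its content into the split part.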
I don't think this is actually a bug, but it would be nice to have functionality
similar to Adobe's. Also, IMHO the complete page referenced by an internal
pdf hyperlink should not be repeated.
E.g. if I take "...-3.pdf" from my attached zip, you can see that the Annots
do have the complete page included. I will attach a screenshot of the
internal structure as seen in Adobe (seen as -Teil4.pdf).
was (Author: michael.kuss):
I analysed this a little further. There are some differences between
splitting documents with pdfbox and with Adobe.
- One is the missing compression, which is lost during importPage in the
PDDocument.
- Another is that links (annots) on a split page that point to another page (not
in the part) result in a completely copied dependency tree, so the linked page
is also included. Adobe leaves the references as they are, with the linked
pages missing.
I don't think this is actually a bug, but it would be nice to have functionality
similar to Adobe's. Also, IMHO the complete page referenced by an internal
pdf hyperlink should not be repeated.
E.g. if I take "...Teil3.pdf" from my attached zip, you can see that the
Annots do have the complete page included. I will attach a screenshot of the
internal structure as seen in Adobe.
> Split PDF file to single page files, some files are inflated in size
> --------------------------------------------------------------------
>
> Key: PDFBOX-1618
> URL: https://issues.apache.org/jira/browse/PDFBOX-1618
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.1
> Environment: Windows 7, JVM 1.6.0_29
> Reporter: Tom Taylor
> Attachments: 112080-TECHNICAL MANUAL FOR GENERATOR NIR 7194 A-10LW OF
> 4038 KVA.pdf, Test_PDFs.zip
>
>
> A PDF file is split into single pages for inclusion within another document
> (pdfbox.utils.Splitter within our code but same phenomenon observed when
> splitting using command line PDFSplit tool). Some of the pages are almost as
> large as the original file which causes performance problems for our
> customers.
> Again, I have a sample pdf to attach.