[ 
https://issues.apache.org/jira/browse/PDFBOX-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672943#comment-13672943
 ] 

Michael Kuß edited comment on PDFBOX-1618 at 6/3/13 9:21 AM:
-------------------------------------------------------------

I looked into this a little more. There are some differences between splitting 
documents with PDFBox and with Adobe:
- One is the missing compression, which is lost during importPage in the 
PDDocument.
- Another is that a link (annotation) on a split page that points to another 
page (one not in the part) results in a completely copied dependency tree, so 
the linked page is included as well. Adobe leaves the references as they are, 
with the linked pages missing. 
  I don't think this is actually a bug, but it would be nice to have 
functionality similar to Adobe's. Also, IMHO the complete page referenced by an 
internal PDF hyperlink should not be repeated.

E.g. if I take "...-3.pdf" from my attached zip, you can see that the Annots 
contain the complete page. I will attach a screenshot of the internal 
structure as seen in Adobe (shown as -Teil4.pdf).
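
To make the second point concrete, here is a toy model in plain Java. It is 
not PDFBox code and does not use any PDFBox API; the class and method names 
(SplitClosureSketch, copyWithDependencies, copyPageOnly) are made up for this 
sketch. It assumes a page can be reduced to an id plus the ids of the pages 
its link annotations point to, and compares a copy that follows every 
reference with one that leaves out-of-part links dangling.

```java
import java.util.*;

// Toy model only: hypothetical names, no relation to the real PDFBox classes.
final class SplitClosureSketch {

    /** A toy page: an id plus the ids of the pages its link annotations point to. */
    static final class Page {
        final int id;
        final List<Integer> linkTargets;

        Page(int id, Integer... linkTargets) {
            this.id = id;
            this.linkTargets = Arrays.asList(linkTargets);
        }
    }

    /** Follows every reference, so linked pages are pulled in transitively. */
    static Set<Integer> copyWithDependencies(Map<Integer, Page> doc, int pageId) {
        Set<Integer> copied = new LinkedHashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        todo.push(pageId);
        while (!todo.isEmpty()) {
            int id = todo.pop();
            Page page = doc.get(id);
            // Skip unknown ids and pages that were already copied.
            if (page != null && copied.add(id)) {
                for (int target : page.linkTargets) {
                    todo.push(target);
                }
            }
        }
        return copied;
    }

    /** Copies only the requested page; links to other pages are left dangling. */
    static Set<Integer> copyPageOnly(Map<Integer, Page> doc, int pageId) {
        Set<Integer> copied = new LinkedHashSet<>();
        if (doc.containsKey(pageId)) {
            copied.add(pageId);
        }
        return copied;
    }

    /** Four pages: page 1 links to 2 and 3, page 2 links to 3, pages 3 and 4 have no links. */
    static Map<Integer, Page> sampleDocument() {
        Map<Integer, Page> doc = new LinkedHashMap<>();
        doc.put(1, new Page(1, 2, 3));
        doc.put(2, new Page(2, 3));
        doc.put(3, new Page(3));
        doc.put(4, new Page(4));
        return doc;
    }
}
```

In this model, splitting off page 1 with copyWithDependencies yields three 
pages in the part, while copyPageOnly yields one. A page without links, such 
as page 4, comes out the same size either way, which would fit the report 
that only some of the split files are inflated.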
                
> Split PDF file to single page files, some files are inflated in size
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1618
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1618
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.1
>         Environment: Windows 7, JVM 1.6.0_29
>            Reporter: Tom Taylor
>         Attachments: 112080-TECHNICAL MANUAL FOR GENERATOR NIR 7194 A-10LW OF 
> 4038 KVA.pdf, Test_PDFs.zip
>
>
> A PDF file is split into single pages for inclusion within another document 
> (pdfbox.utils.Splitter within our code, but the same phenomenon is observed 
> when splitting with the command-line PDFSplit tool). Some of the pages are 
> almost as large as the original file, which causes performance problems for 
> our customers.
> Again, I have a sample PDF to attach.
