[
https://issues.apache.org/jira/browse/PDFBOX-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153193#comment-15153193
]
Tilman Hausherr edited comment on PDFBOX-3238 at 2/18/16 10:27 PM:
-------------------------------------------------------------------
Ouch. And it doesn't work with importPage either :-( Your analysis is correct.
Inherited resources are ignored :-(
A workaround for addPage would be something like this:
{code}
PDResources res = page.getResources();
destination.addPage(page);
destination.getPage(0).setResources(res);
{code}
was (Author: tilman):
Ouch. And it doesn't work with importPage either :-(
> Page resources are not inherited from an ancestor node in the page tree
> -----------------------------------------------------------------------
>
> Key: PDFBOX-3238
> URL: https://issues.apache.org/jira/browse/PDFBOX-3238
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 1.8.11, 2.0.0
> Environment: Found on Windows 7 x64, JRE from 1.5 to 8
> Reporter: Evgeny Chesnokov
> Attachments: Welding Fixture Model.dwg.pdf
>
>
> Attached is a sample file whose 1st page contains a single image. When I
> append the 1st page of the loaded document to a new document, the image is
> missing from the new document (the page is displayed blank; Acrobat Reader
> says the file is broken).
> Steps to reproduce:
> 1. load the attached PDF file using PdfBox (checked versions 1.8.11 and
> 2.0.0-RC2, tried both {{#load()}} and {{#loadNonSeq()}})
> 2. create a new document
> 3. add the 1st page of the loaded document to the new document
> 4. save the new document to a new file.
> Expected: a new PDF file gets created; when opened, it contains the image on
> the 1st page.
> Actual behaviour: a new PDF file gets created; when opened, its 1st page is
> empty and Acrobat Reader reports an error ("An error exists on this page.
> Acrobat may not display the page correctly.").
> Code to reproduce the issue for version 1.8.11:
> {code}
> PDDocument source = PDDocument.load(new File("Welding Fixture Model.dwg.pdf"));
> PDPage page = (PDPage) source.getDocumentCatalog().getAllPages().get(0);
>
> PDDocument destination = new PDDocument();
> destination.addPage(page);
> destination.save("Welding Fixture Model.dwg.page0.pdf");
> destination.close();
> source.close(); // close the source only after the destination has been saved
> {code}
> ==========
> Research summary: I've decoded the attached PDF using the {{qpdf}} utility and
> investigated its structure. Basically, there's no {{/Resources}} entry
> in the {{/Page}} object, so it should be inherited from the {{/Pages}} object.
> Instead, PdfBox gives the page an empty resources object, so the saved file
> does not have the image in it.
> Research details:
> Below are pieces of a decoded structure of the attached PDF.
> *Pages list declaration:*
> {noformat}
> 3 0 obj
> <<
> /Count 1
> /Kids [
> 4 0 R
> ]
> /Resources 5 0 R
> /Type /Pages
> >>
> endobj
> {noformat}
> Explanation:
> - {{/Type /Pages}} says this object is a list of pages;
> - {{/Kids}} is an array of references to the individual page objects. In
> this case, object #4 is the only page in a document;
> - {{/Resources 5 0 R}} stores a reference to a single resource that is used
> by the {{/Pages}} object. This is object #5, an image.
> *1st page declaration:*
> {noformat}
> 4 0 obj
> <<
> /Contents 6 0 R
> /MediaBox [
> 0
> 0
> 1984
> 2551
> ]
> /Parent 3 0 R
> /Type /Page
> >>
> endobj
> {noformat}
> Explanation:
> - {{/Type /Page}} says it's a page (duh);
> - {{/Contents 6 0 R}} references an object #6 that is used to render the
> content of the page (I won't provide it but it uses the image object #5
> mentioned above);
> - {{/Parent 3 0 R}} is a reference to a {{/Pages}} object described above.
> An important thing here is that this object does not have a {{/Resources}}
> section of its own. In this case, PDF spec says:
> bq. (Required; inheritable) A dictionary containing any resources required by
> the page (see 7.8.3, "Resource Dictionaries"). If the page requires no
> resources, the value of this entry shall be an empty dictionary. *Omitting
> the entry entirely indicates that the resources shall be inherited from an
> ancestor node in the page tree*.
> This last sentence means that Page 1 has the same list of resources as its
> parent /Pages object, and this is where PdfBox misbehaves. When exporting a
> page with no {{/Resources}} tag, it uses an **EMPTY** list of resources
> instead of an inherited one.
> To verify this, I added a {{/Resources 5 0 R}} line to the 1st page
> declaration of the sample PDF:
> {noformat}
> 4 0 obj
> <<
> /Contents 6 0 R
> /MediaBox [
> 0
> 0
> 1984
> 2551
> ]
> /Parent 3 0 R
> /Resources 5 0 R
> /Type /Page
> >>
> endobj
> {noformat}
> After I did this, PdfBox successfully extracted the 1st page of this document
> and it correctly displayed an image.
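
For illustration only, the inheritance rule quoted from the spec above can be sketched in plain Java. This is not PDFBox code: dictionaries are modelled as plain maps, and the names InheritedResources, findInheritable and detachPage are hypothetical. It also does not guard against a malformed file whose /Parent chain loops. The second method shows the idea behind the workaround: resolve the inherited /Resources while the page is still attached to its old tree, then store the result on the page itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a PDF page tree; plain maps stand in for dictionaries.
class InheritedResources {

    // Walks from a node up its /Parent chain and returns the first value
    // found for the given key, or null if no ancestor defines it.
    @SuppressWarnings("unchecked")
    static Object findInheritable(Map<String, Object> node, String key) {
        Map<String, Object> current = node;
        while (current != null) {
            if (current.containsKey(key)) {
                return current.get(key);
            }
            Object parent = current.get("Parent");
            current = (parent instanceof Map) ? (Map<String, Object>) parent : null;
        }
        return null;
    }

    // Returns a copy of the page that no longer depends on its old parent:
    // the inherited /Resources value is written into the copy before
    // /Parent is dropped. An empty dictionary is used if nothing is found.
    static Map<String, Object> detachPage(Map<String, Object> page) {
        Map<String, Object> copy = new HashMap<>(page);
        Object resources = findInheritable(page, "Resources");
        copy.put("Resources", resources != null ? resources : new HashMap<String, Object>());
        copy.remove("Parent");
        return copy;
    }

    public static void main(String[] args) {
        // Mirrors objects 3 and 4 from the decoded sample file.
        Map<String, Object> pages = new HashMap<>();
        pages.put("Type", "Pages");
        pages.put("Resources", "5 0 R");

        Map<String, Object> page = new HashMap<>();
        page.put("Type", "Page");
        page.put("Contents", "6 0 R");
        page.put("Parent", pages);

        System.out.println(findInheritable(page, "Resources"));   // prints 5 0 R
        System.out.println(detachPage(page).get("Resources"));    // prints 5 0 R
    }
}
```

Simply dropping /Parent without the copy step would leave the page with no /Resources at all, which corresponds to the blank page described in the report.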