[
https://issues.apache.org/jira/browse/PDFBOX-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270200#comment-14270200
]
John Hewson commented on PDFBOX-2592:
-------------------------------------
Yes, exactly. However, if a user is retaining objects beyond when they are
needed then _of course_ they use memory. The solution is not to add some
workaround whereby we clobber all COS objects when close() is called, but to
simply direct the user to add one line of code to their source: document =
null. The responsibility for owning objects in Java is very clear, and we're
violating that rule to facilitate users who are doing the wrong thing. We're
not helping anyone by doing this, we're just adding to the mess. On the plus
side, there are surely very few users who are doing such a thing nowadays, and
we have a one line fix for those who do.
Having said that, the current situation seems acceptable, at first glance.
However, it's actually causes corruption and silent failures:
{code}
// example 1
COSDictionary page1 = doc1.getPage(0);
doc1.close();
PDDocument doc2 = new PDDocument();
doc2.addPage(page1);
doc2.save("out.pdf"); // fails silently!!! the generated pdf is blank
{code}
{code}
// example 2
PDPage page1 = doc1.getPage(0);
PDFormXObject form = new PDFormXObject(page1.getStream());
form.setBBox(page1.getCropBox());
form.setResources(page1.getResources());
doc1.close(); // we close doc1 once we have finished using it
PDDocument doc2 = new PDDocument();
PDPage page2 = new PDPage();
doc2.addPage(page2);
PDPageContentStream contents = new PDPageContentStream(doc2, page2);
contents.drawForm(form);
contents.close();
doc2.save("out.pdf"); // fails with NullPointerException in COSStream
doc2.close();
{code}
{code}
// example 3
PDFont font = doc1.getPage(0).getResources().getFont(COSName.getPDFName("F1"));
doc1.close();
PDDocument doc2 = new PDDocument();
PDPage page2 = new PDPage();
page2.setResources(new PDResources());
page2.getResources().put(COSName.getPDFName("Foo"), font);
doc2.addPage(page2);
doc2.addPage(page1);
doc2.save("out.pdf"); // fails silently!!! the generated pdf contains an empty
dict for Foo
{code}
But it's not consistent, because this works:
{code}
PDRectangle rect = doc1.getPage(0).getMediaBox();
doc1.close();
PDDocument doc2 = new PDDocument();
PDPage page2 = new PDPage();
doc2.addPage(page2);
page2.setCropBox(rect);
{code}
This shows how harmful the "workaround" in COSDocument close() actually is,
we're getting silent failures and NPEs for behaviour which should be valid, or
in the case of example 2 should throw a meaningful exception (that's what I
want COSStream#isClosed() for).
As it stands this is broken and needs fixing.
> Allow sharing of COS objects between different documents
> --------------------------------------------------------
>
> Key: PDFBOX-2592
> URL: https://issues.apache.org/jira/browse/PDFBOX-2592
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 2.0.0
> Reporter: John Hewson
> Assignee: John Hewson
>
> A number of users on the mailing list have asked about how to import pages
> from other PDFs as forms, our current solution is LayerUtility, which is
> depends on PDFCloneUtility. Both these classes are surprisingly complex for
> what should be a simple task.
> The two main tasks which these classes perform is copying the page's
> COSStream and cloning every relevant COS object. However, there seems to be
> no real need to do any of this copying and cloning - there's nothing about
> any of the COS objects which is specific to a given document. While a
> COSStream can share the same backing file as the COSDocument, this isn't a
> problem for COSWriter, even then we need only make sure that an exception is
> thrown if a COSStream is used after its parent COSDocument is closed.
> Note that there *is* one artificial dependency between COSDictionary and
> COSArrays and their parent COSDocument, that is that calling close() on the
> COSDocument clears the contents of all child COSDictionary and COSArrays.
> However, there's no need for this, it seems to have come about due to some
> long past confusion regarding how garbage collection works in Java - we all
> know that it's not necessary to set objects to null or clear lists when we
> are done with them.
> I propose that we get rid of the unnecessary object and list clearing in
> COSDocument#close() and add some checks to COSStream to throw user-friendly
> exceptions when reading from a closed backing stream. This will allow us to
> directly share COS objects between different COSDocuments, allowing simple "x
> = y" copying and making LayerUtility and PDFCloneUtility unnecessary. Instead
> of:
> {code}
> COSStream pageStream = (COSStream)page.getStream().getCOSObject();
> PDStream newStream = new PDStream(targetDoc,
> pageStream.getUnfilteredStream(), false);
> PDFormXObject form = new PDFormXObject(newStream);
> PDResources pageRes = page.getResources();
> PDResources formRes = new PDResources();
> PDFCloneUtility cloner = new PDFCloneUtility(document);
> cloner.cloneMerge(pageRes, formRes);
> form.setResources(formRes);
> {code}
> We could have:
> {code}
> PDFormXObject form = new PDFormXObject(page.getStream());
> form.setResources(page.getResources());
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)