[
https://issues.apache.org/jira/browse/PDFBOX-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253877#comment-15253877
]
Alexander Kriegisch edited comment on PDFBOX-3323 at 4/22/16 10:08 PM:
-----------------------------------------------------------------------
Okay, the solution is more complex than I thought: before the merge I do not
have a PDDocument, so I need to create a COSStream myself for the XMP metadata.
Furthermore, it is non-trivial to set the creator property for XMP; I had to
look into the XMPBox source code to find out how to do that. Maybe you want to
publish this as an example if you find it useful and comprehensive. I think it
is important to give something back to the community, especially because
Tilman supported me so well.
{code}
package de.scrum_master.pdf_tools;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.schema.PDFAIdentificationSchema;
import org.apache.xmpbox.schema.XMPBasicSchema;
import org.apache.xmpbox.type.AgentNameType;
import org.apache.xmpbox.type.BadFieldValueException;
import org.apache.xmpbox.xml.XmpSerializer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import javax.xml.transform.TransformerException;
import java.io.*;
import java.util.Calendar;
import java.util.List;
public class PDFMerger {
private final static Logger logger = LoggerFactory.getLogger(PDFMerger.class);
/**
* Modified {@link ByteArrayOutputStream} whose
* {@link StateExposingByteArrayOutputStream#toByteArray()} method directly
* returns its internal byte buffer in order to avoid in-memory copies during
* PDF merge.
* <p>
* Please use carefully! The returned array is usually longer than
* {@link #size()}, so only the first {@code size()} bytes are valid.
*/
private static class StateExposingByteArrayOutputStream extends
ByteArrayOutputStream {
@Override
public synchronized byte[] toByteArray() {
return buf;
}
}
/**
* Creates a compound PDF document from a list of input documents.
* <p>
* The merged document is PDF/A-1b compliant, provided the source documents
* are as well.
* It contains document properties title, creator and subject, currently
* hard-coded.
*
* @param sources list of source PDF document streams
* @return compound PDF document as a readable stream
* @throws IOException if anything goes wrong during PDF merge
*/
public InputStream merge(final List<InputStream> sources) throws IOException {
String title = "My title";
String creator = "Alexander Kriegisch";
String subject = "Subject with umlauts ÄÖÜ";
try (
ByteArrayOutputStream mergedPDFOutputStream = new
StateExposingByteArrayOutputStream();
COSStream cosStream = new COSStream()
) {
PDFMergerUtility pdfMerger = createPDFMergerUtility(sources,
mergedPDFOutputStream);
// PDF and XMP properties must be identical, otherwise document is not
// PDF/A compliant
PDDocumentInformation pdfDocumentInfo = createPDFDocumentInfo(title,
creator, subject);
PDMetadata xmpMetadata = createXMPMetadata(cosStream, title, creator,
subject);
pdfMerger.setDestinationDocumentInformation(pdfDocumentInfo);
pdfMerger.setDestinationMetadata(xmpMetadata);
logger.trace("Merging {} source documents into one PDF", sources.size());
pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
logger.trace("PDF merge successful, size = {} bytes",
mergedPDFOutputStream.size());
return new ByteArrayInputStream(mergedPDFOutputStream.toByteArray(), 0,
mergedPDFOutputStream.size());
} catch (BadFieldValueException | TransformerException e) {
throw new IOException("PDF merge problem", e);
} finally {
for (InputStream source : sources) {
try {
source.close();
} catch (IOException e) {
// Best effort: a failed close must not mask the merge result
}
}
}
}
private PDFMergerUtility createPDFMergerUtility(
List<InputStream> sources,
ByteArrayOutputStream mergedPDFOutputStream
) {
logger.trace("Initialising PDF merge utility");
PDFMergerUtility pdfMerger = new PDFMergerUtility();
pdfMerger.addSources(sources);
pdfMerger.setDestinationStream(mergedPDFOutputStream);
return pdfMerger;
}
private PDDocumentInformation createPDFDocumentInfo(
String title, String creator, String subject
) {
logger.trace("Setting document info (title, author, subject) for merged PDF");
PDDocumentInformation documentInformation = new PDDocumentInformation();
documentInformation.setTitle(title);
documentInformation.setCreator(creator);
documentInformation.setSubject(subject);
return documentInformation;
}
private PDMetadata createXMPMetadata(
COSStream cosStream,
String title, String creator, String subject
)
throws BadFieldValueException, TransformerException, IOException
{
logger.trace("Setting XMP metadata (title, author, subject) for merged PDF");
XMPMetadata xmpMetadata = XMPMetadata.createXMPMetadata();
// PDF/A-1b properties
PDFAIdentificationSchema pdfaSchema =
xmpMetadata.createAndAddPFAIdentificationSchema();
pdfaSchema.setPart(1);
pdfaSchema.setConformance("B");
// Dublin Core properties
DublinCoreSchema dublinCoreSchema =
xmpMetadata.createAndAddDublinCoreSchema();
dublinCoreSchema.setTitle(title);
dublinCoreSchema.addCreator(creator);
dublinCoreSchema.setDescription(subject);
// XMP Basic properties
XMPBasicSchema basicSchema = xmpMetadata.createAndAddXMPBasicSchema();
Calendar creationDate = Calendar.getInstance();
basicSchema.setCreateDate(creationDate);
basicSchema.setModifyDate(creationDate);
basicSchema.setMetadataDate(creationDate);
basicSchema.setCreatorToolProperty(
(AgentNameType) xmpMetadata
.getTypeMapping()
.instanciateSimpleField(basicSchema.getClass(), null,
basicSchema.getPrefix(), XMPBasicSchema.CREATORTOOL, creator)
);
// Create and return XMP data structure in XML format
try (
ByteArrayOutputStream xmpOutputStream = new
StateExposingByteArrayOutputStream();
OutputStream cosXMPStream = cosStream.createOutputStream()
) {
new XmpSerializer().serialize(xmpMetadata, xmpOutputStream, true);
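// toByteArray() exposes the whole internal buffer, so write only the valid bytes
cosXMPStream.write(xmpOutputStream.toByteArray(), 0, xmpOutputStream.size());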
return new PDMetadata(cosStream);
}
}
}
{code}
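The "Please use carefully!" note on StateExposingByteArrayOutputStream deserves a small illustration. The following is only a stdlib sketch, not part of the merger itself; the names BufferExposureDemo, ExposingStream and asInputStream are made up for this example. It shows that the exposed buffer is normally longer than the number of bytes written, so every consumer has to pass size() as the length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class BufferExposureDemo {
    /** Same idea as StateExposingByteArrayOutputStream (hypothetical name). */
    static class ExposingStream extends ByteArrayOutputStream {
        @Override
        public synchronized byte[] toByteArray() {
            // Returns the internal buffer itself, not a trimmed copy
            return buf;
        }
    }

    /** Wraps only the valid part of the buffer, without copying it. */
    static InputStream asInputStream(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray(), 0, out.size());
    }

    public static void main(String[] args) {
        ExposingStream out = new ExposingStream();
        byte[] data = {1, 2, 3};
        out.write(data, 0, data.length);

        // Number of bytes actually written
        System.out.println("valid bytes:   " + out.size());
        // Capacity of the internal buffer, normally larger than size()
        System.out.println("buffer length: " + out.toByteArray().length);
        // Bytes a reader will see when the length is limited to size()
        System.out.println("readable:      " + asInputStream(out).available());
    }
}
```

Writing the exposed array without an explicit length would append whatever unused bytes sit at the end of the buffer, which is why the offset/length variants of write and of the ByteArrayInputStream constructor are the safe choice here.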
*Edit:* BTW, [~tilman], as far as I am concerned, you can put the functionality
into release 2.0.1 because it works for me, even though it is a bit hard to
implement it in a PDF/A compliant way. It would be nice if some time in the
future I could just set PDF properties and say "please save as PDF/A-1b" and
the corresponding XMP would automatically be added.
> Cannot set destination meta data in PDFMergerUtility
> ----------------------------------------------------
>
> Key: PDFBOX-3323
> URL: https://issues.apache.org/jira/browse/PDFBOX-3323
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 1.8.9, 2.0.0
> Reporter: Alexander Kriegisch
> Assignee: Tilman Hausherr
> Labels: merge, metadata
> Fix For: 2.0.1, 2.1.0
>
>
> When merging multiple PDFs into one compound document via
> {{PDFMergerUtility}}, meta data like title, author, subject cannot be set but
> seem to be taken from one of the input documents. This is usually not the
> desired behaviour because as a user I have no direct influence on the meta
> data. As a user I would like to explicitly set or at least overwrite certain
> meta data for the destination document. Currently I can only set the
> destination stream or file name, but not the meta data.