Andriy created PDFBOX-1510:
------------------------------
Summary: PDF gets corrupted when trying to extract it from the
embedded files
Key: PDFBOX-1510
URL: https://issues.apache.org/jira/browse/PDFBOX-1510
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Andriy
Priority: Critical
When a PDF is attached to another PDF, it gets corrupted when retrieved through
the PDEmbeddedFile.getByteArray() method call. For some reason, the returned
array contains less data than the original file that was attached to the PDF.
This affects some documents but not others. Below is the test code that
replicates the issue.
A PDF whose attachment gets corrupted will be attached to the issue.
import java.io.File;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.Map;

import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

public class PDFEmbeddedFiles {

    private PDFEmbeddedFiles() {
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            usage();
            System.exit(1);
        } else {
            PDDocument document = null;
            try {
                File pdfFile = new File(args[0]);
                /*
                String filePath = pdfFile.getParent()
                        + System.getProperty("file.separator");
                */
                document = PDDocument.load(pdfFile);
                if (document.isEncrypted()) {
                    try {
                        document.decrypt("");
                    } catch (InvalidPasswordException e) {
                        System.err.println("Error: The document is encrypted.");
                    } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                        e.printStackTrace();
                    }
                }
                // Alternative: new PDDocumentNameDictionary(document.getDocumentCatalog());
                PDDocumentNameDictionary namesDictionary =
                        document.getDocumentCatalog().getNames();
                // The catalog may have no name dictionary at all, so guard against null.
                PDEmbeddedFilesNameTreeNode efTree =
                        namesDictionary == null ? null : namesDictionary.getEmbeddedFiles();
                if (efTree != null) {
                    Map<String, Object> names = efTree.getNames();
                    Iterator<String> namesKeys = names.keySet().iterator();
                    while (namesKeys.hasNext()) {
                        String filename = namesKeys.next();
                        PDComplexFileSpecification fileSpec =
                                (PDComplexFileSpecification) names.get(filename);
                        PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                        String embeddedFilename = filename; // filePath + filename;
                        File file = new File(filename); // filePath + filename);
                        System.out.println("Writing " + embeddedFilename);
                        FileOutputStream fos = new FileOutputStream(file);
                        try {
                            // This is the call that returns fewer bytes than expected.
                            fos.write(embeddedFile.getByteArray());
                        } finally {
                            fos.close();
                        }
                    }
                }
            } finally {
                if (document != null) {
                    document.close();
                }
            }
        }
    }

    /**
     * This will print the usage for this program.
     */
    private static void usage() {
        System.err.println("Usage: java "
                + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
    }
}
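
To confirm that an extracted attachment really is shorter than the file that was originally attached, the two files can be compared by size and by digest. The following is a minimal sketch using only the JDK; the class name AttachmentCheck and its two helper methods are hypothetical and are not part of PDFBox. It assumes the original attachment is still available on disk next to the file written by the test code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PDFBox: compares an extracted attachment
// with the file that was originally attached.
class AttachmentCheck {

    /** Returns the SHA-256 digest of a file as a lowercase hex string. */
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading through the DigestInputStream updates the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true when both files have the same length and the same digest. */
    static boolean sameContent(Path original, Path extracted)
            throws IOException, NoSuchAlgorithmException {
        if (Files.size(original) != Files.size(extracted)) {
            return false;
        }
        return sha256(original).equals(sha256(extracted));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java AttachmentCheck <original-file> <extracted-file>");
            System.exit(1);
        }
        Path original = Paths.get(args[0]);
        Path extracted = Paths.get(args[1]);
        System.out.println("Original size:  " + Files.size(original) + " bytes");
        System.out.println("Extracted size: " + Files.size(extracted) + " bytes");
        System.out.println(sameContent(original, extracted)
                ? "Files are identical."
                : "Files differ.");
    }
}
```

Running it against the original attachment and the file written by the test code shows both sizes, so the number of missing bytes can be added to the report.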
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira