Andriy created PDFBOX-1510:
------------------------------

             Summary: PDF gets corrupted when trying to extract it from the embedded files
                 Key: PDFBOX-1510
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.7.1
            Reporter: Andriy
            Priority: Critical

When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array contains fewer bytes than the original file that was attached to the PDF. This affects some documents but not others.

Below is the test code that replicates the issue. A PDF whose attachment gets corrupted will be attached to this issue.

import java.io.File;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.Map;

import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

public class PDFEmbeddedFiles
{
    private PDFEmbeddedFiles()
    {
    }

    public static void main(String[] args) throws Exception
    {
        if (args.length != 1)
        {
            usage();
            System.exit(1);
        }
        else
        {
            PDDocument document = null;
            try
            {
                File pdfFile = new File(args[0]);
                /* String filePath = pdfFile.getParent() + System.getProperty("file.separator"); */
                document = PDDocument.load(pdfFile);
                if (document.isEncrypted())
                {
                    try
                    {
                        // Try the empty user password first.
                        document.decrypt("");
                    }
                    catch (InvalidPasswordException e)
                    {
                        System.err.println("Error: The document is encrypted.");
                    }
                    catch (CryptographyException e)
                    {
                        e.printStackTrace();
                    }
                }

                // getNames() returns null when the catalog has no name dictionary.
                PDDocumentNameDictionary namesDictionary = document.getDocumentCatalog().getNames();
                PDEmbeddedFilesNameTreeNode efTree =
                        namesDictionary == null ? null : namesDictionary.getEmbeddedFiles();
                if (efTree != null)
                {
                    Map<String, Object> names = efTree.getNames();
                    Iterator<String> namesKeys = names.keySet().iterator();
                    while (namesKeys.hasNext())
                    {
                        String filename = namesKeys.next();
                        PDComplexFileSpecification fileSpec =
                                (PDComplexFileSpecification) names.get(filename);
                        PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                        String embeddedFilename = filename; // filePath + filename;
                        File file = new File(filename); // filePath + filename);
                        System.out.println("Writing " + embeddedFilename);
                        FileOutputStream fos = new FileOutputStream(file);
                        try
                        {
                            // The array returned here is shorter than the attached file.
                            fos.write(embeddedFile.getByteArray());
                        }
                        finally
                        {
                            fos.close();
                        }
                    }
                }
            }
            finally
            {
                if (document != null)
                {
                    document.close();
                }
            }
        }
    }

    /**
     * This will print the usage for this program.
     */
    private static void usage()
    {
        System.err.println("Usage: java " + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
    }
}
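
To narrow down where the data goes missing, it may help to compare the bytes of the file that was originally attached with the array returned by getByteArray(). The helper below is only a sketch using the JDK alone: the class EmbeddedFileDiff and its methods readFully, firstMismatch and describe are hypothetical names introduced here for illustration and are not part of PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical diagnostic helper (not part of PDFBox). */
final class EmbeddedFileDiff {

    private EmbeddedFileDiff() {
    }

    /** Reads the stream until EOF, so a short read() cannot truncate the result. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Returns the offset of the first differing byte, or -1 if the arrays are
     * identical. If one array is a prefix of the other, the length of the
     * shorter array is returned.
     */
    static int firstMismatch(byte[] expected, byte[] actual) {
        int common = Math.min(expected.length, actual.length);
        for (int i = 0; i < common; i++) {
            if (expected[i] != actual[i]) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : common;
    }

    /** Builds a one-line summary suitable for printing from the test program. */
    static String describe(byte[] expected, byte[] actual) {
        int offset = firstMismatch(expected, actual);
        if (offset < 0) {
            return "identical (" + expected.length + " bytes)";
        }
        return "expected " + expected.length + " bytes, got " + actual.length
                + " bytes, first difference at offset " + offset;
    }
}
```

In the test program, expected would be the original attachment read with readFully(new FileInputStream(...)) and actual the result of embeddedFile.getByteArray(). If the reported offset equals the length of the shorter array, the output is a clean truncation rather than altered content.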