Andriy created PDFBOX-1510:
------------------------------

             Summary: PDF gets corrupted when trying to extract it from the embedded files
                 Key: PDFBOX-1510
                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
             Project: PDFBox
          Issue Type: Bug
    Affects Versions: 1.7.1
            Reporter: Andriy
            Priority: Critical


When a PDF is attached to another PDF, it gets corrupted when retrieved through 
the PDEmbeddedFile.getByteArray() method call. For some reason the returned 
array contains less data than the original file that was attached to the PDF.

This affects some documents but not others. Below is the test code that 
replicates the issue.

A PDF whose attachment gets corrupted will be attached to the issue.




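import java.io.File;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.Map;

// Package names below are those of the PDFBox 1.x line (the report targets 1.7.1).
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

public class PDFEmbeddedFiles {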

        private PDFEmbeddedFiles() {
        }

        public static void main(String[] args) throws Exception {

                if (args.length != 1) {
                        usage();
                        System.exit(1);
                } else {

                        PDDocument document = null;

                        try {
                                File pdfFile = new File(args[0]);
                                /*
                                String filePath = pdfFile.getParent()
                                                + System.getProperty("file.separator");
                                */
                                document = PDDocument.load(pdfFile);
                                if (document.isEncrypted()) {
                                        try {
                                                document.decrypt("");
                                        } catch (InvalidPasswordException e) {
                                                System.err.println("Error: The document is encrypted.");
                                        } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                                                e.printStackTrace();
                                        }
                                }
                                
                                // new PDDocumentNameDictionary(document.getDocumentCatalog());
                                PDDocumentNameDictionary namesDictionary =
                                                document.getDocumentCatalog().getNames();
                                PDEmbeddedFilesNameTreeNode efTree =
                                                namesDictionary.getEmbeddedFiles();
                                if (efTree != null) {
                                        Map<String, Object> names = efTree.getNames();
                                        Iterator<String> namesKeys = names.keySet().iterator();
                                        while (namesKeys.hasNext()) {
                                                String filename = namesKeys.next();
                                                PDComplexFileSpecification fileSpec =
                                                                (PDComplexFileSpecification) names.get(filename);
                                                PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                                                String embeddedFilename = filename; // filePath + filename;
                                                File file = new File(filename); // filePath + filename);
                                                System.out.println("Writing " + embeddedFilename);
                                                FileOutputStream fos = new FileOutputStream(file);
                                                // The call under test: returns fewer bytes than the attached file.
                                                fos.write(embeddedFile.getByteArray());
                                                fos.close();
                                        }
                                }
                        } finally {
                                if (document != null) {
                                        document.close();
                                }
                        }
                }
        }

        /**
         * This will print the usage for this program.
         */
        private static void usage() {
                System.err.println("Usage: java "
                                + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
        }
}
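
One way to confirm the truncation described above is to compare the file written by the test program with the file that was originally attached. The following is only a minimal sketch using the plain JDK (Java 7 or later), not part of PDFBox; the class name ExtractionCheck and its methods are hypothetical names introduced here for illustration. It compares the two files by size and by SHA-256 digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not a PDFBox class): checks whether an extracted
// attachment is byte-for-byte identical to the original file.
public class ExtractionCheck {

    // Returns the SHA-256 digest of a file as a lowercase hex string.
    static String sha256(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(path)) {
            byte[] buffer = new byte[8192];
            int read;
            // A single read() may return fewer bytes than requested, so loop until EOF.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // True when both files have the same length and the same digest.
    static boolean sameContent(Path original, Path extracted)
            throws IOException, NoSuchAlgorithmException {
        return Files.size(original) == Files.size(extracted)
                && sha256(original).equals(sha256(extracted));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java ExtractionCheck <original-file> <extracted-file>");
            System.exit(1);
        }
        Path original = Paths.get(args[0]);
        Path extracted = Paths.get(args[1]);
        System.out.println("original:  " + Files.size(original) + " bytes");
        System.out.println("extracted: " + Files.size(extracted) + " bytes");
        System.out.println(sameContent(original, extracted) ? "MATCH" : "MISMATCH");
    }
}
```

Running it with the original attachment and the file produced by PDFEmbeddedFiles should report a smaller size for the extracted file when the issue occurs.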


