[ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---------------------------

    Description: 
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others (see the attachments for working and 
non-working files). Source code reproducing the issue is attached as well.
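
One way to confirm the symptom described above (fewer bytes returned than were attached) is to compare the extracted bytes against the original file. The following is a minimal sketch using only the Java standard library. The class TruncationCheck and its method names are hypothetical helpers introduced for illustration and are not part of PDFBox. It assumes both byte arrays have already been loaded by the caller.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not part of PDFBox) comparing extracted bytes with the original file. */
final class TruncationCheck {

    private TruncationCheck() {
    }

    /** Returns the index of the first differing byte within the common length, or -1 if none. */
    static int firstMismatch(byte[] original, byte[] extracted) {
        int common = Math.min(original.length, extracted.length);
        for (int i = 0; i < common; i++) {
            if (original[i] != extracted[i]) {
                return i;
            }
        }
        return -1;
    }

    /** True when extracted is strictly shorter than original and matches its leading bytes. */
    static boolean isTruncatedPrefix(byte[] original, byte[] extracted) {
        return extracted.length < original.length
                && firstMismatch(original, extracted) == -1;
    }

    /** Summarizes how the extracted bytes relate to the original bytes. */
    static String describe(byte[] original, byte[] extracted) {
        if (Arrays.equals(original, extracted)) {
            return "identical (" + original.length + " bytes)";
        }
        if (isTruncatedPrefix(original, extracted)) {
            return "truncated: " + (original.length - extracted.length)
                    + " bytes missing from the end";
        }
        int at = firstMismatch(original, extracted);
        if (at == -1) {
            // No mismatch in the common range and not equal: extracted is longer.
            return "extracted has " + (extracted.length - original.length)
                    + " extra trailing bytes";
        }
        return "content differs at offset " + at;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the original attachment and the extracted data.
        byte[] original = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        byte[] extracted = Arrays.copyOf(original, 5);
        // prints "truncated: 3 bytes missing from the end"
        System.out.println(describe(original, extracted));
    }
}
```

In practice the first argument could come from java.nio.file.Files.readAllBytes on the file that was originally attached, and the second from the PDEmbeddedFile.getByteArray() call mentioned above. A "truncated" result would indicate that data is lost at the end, while "content differs" would point to corruption elsewhere.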



  was:
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others. Below is the test code that 
replicates the issue.

PDF that has an attachment that gets corrupted will be attached to the issue.




// Imports for the PDFBox 1.7.x API.
import java.io.File;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.Map;

import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

public class PDFEmbeddedFiles {

    private PDFEmbeddedFiles() {
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            usage();
            System.exit(1);
        }

        PDDocument document = null;
        try {
            document = PDDocument.load(new File(args[0]));
            if (document.isEncrypted()) {
                try {
                    // Try to open the document with an empty password.
                    document.decrypt("");
                } catch (InvalidPasswordException e) {
                    System.err.println("Error: The document is encrypted.");
                } catch (CryptographyException e) {
                    e.printStackTrace();
                }
            }

            // Look up the embedded files name tree; either level may be absent.
            PDDocumentNameDictionary namesDictionary =
                    document.getDocumentCatalog().getNames();
            PDEmbeddedFilesNameTreeNode efTree =
                    namesDictionary == null ? null : namesDictionary.getEmbeddedFiles();
            if (efTree != null) {
                Map<String, Object> names = efTree.getNames();
                Iterator<String> namesKeys = names.keySet().iterator();
                while (namesKeys.hasNext()) {
                    String filename = namesKeys.next();
                    PDComplexFileSpecification fileSpec =
                            (PDComplexFileSpecification) names.get(filename);
                    PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();

                    // Each attachment is written to the current working directory.
                    System.out.println("Writing " + filename);
                    FileOutputStream fos = new FileOutputStream(new File(filename));
                    try {
                        // The array returned here is shorter than the attached file.
                        fos.write(embeddedFile.getByteArray());
                    } finally {
                        fos.close();
                    }
                }
            }
        } finally {
            if (document != null) {
                document.close();
            }
        }
    }

    /**
     * This will print the usage for this program.
     */
    private static void usage() {
        System.err.println("Usage: java "
                + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
    }
}


    
> PDF gets corrupted when extracting it from the embedded files
> -------------------------------------------------------------
>
>                 Key: PDFBOX-1510
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Andriy
>            Priority: Critical
>         Attachments: doesnt_work.pdf, PDFEmbeddedFiles.java, works2.pdf
>
>
> When a PDF is attached to another PDF it gets corrupted when retrieved 
> through PDEmbeddedFile.getByteArray() method call. For some reason the 
> returned array has less data than the original file that has been attached to 
> the PDF.
> This affects some documents but not others (see the attachments for working
> and non-working files). Source code reproducing the issue is attached as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
