[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andriy updated PDFBOX-1510:
---------------------------

    Attachment: doesnt_work.pdf
                works2.pdf

> PDF gets corrupted when trying to extract it from the embedded files
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1510
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Andriy
>            Priority: Critical
>         Attachments: doesnt_work.pdf, works2.pdf
>
>
> When a PDF is attached to another PDF, it gets corrupted when retrieved
> through the PDEmbeddedFile.getByteArray() method call. For some reason the
> returned array contains less data than the original file that was attached
> to the PDF.
> This affects some documents but not others. Below is the test code that
> reproduces the issue.
> A PDF whose attachment gets corrupted will be attached to this issue.
>
> import java.io.File;
> import java.io.FileOutputStream;
> import java.util.Iterator;
> import java.util.Map;
>
> import org.apache.pdfbox.exceptions.InvalidPasswordException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
> import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
> import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
> import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
>
> public class PDFEmbeddedFiles {
>     private PDFEmbeddedFiles() {
>     }
>
>     public static void main(String[] args) throws Exception {
>         if (args.length != 1) {
>             usage();
>             System.exit(1);
>         } else {
>             PDDocument document = null;
>             try {
>                 File pdfFile = new File(args[0]);
>                 /*
>                 String filePath = pdfFile.getParent()
>                         + System.getProperty("file.separator");
>                 */
>                 document = PDDocument.load(pdfFile);
>                 // Try the empty user password if the document is encrypted.
>                 if (document.isEncrypted()) {
>                     try {
>                         document.decrypt("");
>                     } catch (InvalidPasswordException e) {
>                         System.err.println("Error: The document is encrypted.");
>                     } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
>                         e.printStackTrace();
>                     }
>                 }
>
>                 PDDocumentNameDictionary namesDictionary =
>                         document.getDocumentCatalog().getNames(); //new PDDocumentNameDictionary(document.getDocumentCatalog());
>                 PDEmbeddedFilesNameTreeNode efTree =
>                         namesDictionary.getEmbeddedFiles();
>                 if (efTree != null) {
>                     Map<String, Object> names = efTree.getNames();
>                     Iterator<String> namesKeys = names.keySet().iterator();
>                     // Write each embedded file to the working directory.
>                     while (namesKeys.hasNext()) {
>                         String filename = namesKeys.next();
>                         PDComplexFileSpecification fileSpec =
>                                 (PDComplexFileSpecification) names.get(filename);
>                         PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
>                         String embeddedFilename = filename;//filePath + filename;
>                         File file = new File(filename);//filePath + filename);
>                         System.out.println("Writing " + embeddedFilename);
>                         FileOutputStream fos = new FileOutputStream(file);
>                         // The array returned here is shorter than the original attachment.
>                         fos.write(embeddedFile.getByteArray());
>                         fos.close();
>                     }
>                 }
>             } finally {
>                 if (document != null) {
>                     document.close();
>                 }
>             }
>         }
>     }
>
>     /**
>      * This will print the usage for this program.
>      */
>     private static void usage() {
>         System.err.println("Usage: java "
>                 + PDFEmbeddedFiles.class.getName() + " <input-pdf>");
>     }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira