[jira] [Commented] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
[ https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296534#comment-17296534 ]

Andriy commented on PDFBOX-5115:

[~tilman] The reason I ask about

127 controlDEL bullet

is that I found no way to filter the text used for PDF generation, or to check it beforehand, so that no exception is thrown. The approach I thought of was
{quote}WinAnsiEncoding.INSTANCE.contains()
{quote}
but it does not work for 0x7F, and I still get a runtime exception.
Can you suggest how to check the text before PDF generation so that no exception is thrown?

> U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
>
> Key: PDFBOX-5115
> URL: https://issues.apache.org/jira/browse/PDFBOX-5115
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.22
> Reporter: Andriy
> Priority: Minor
> Fix For: 2.0.23, 3.0.0 PDFBox
>
> U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
>
> This symbol, U+00AD, is in WinAnsiEncoding by its code, but under a slightly different name:
>
> {quote}private static final Object[][] WIN_ANSI_ENCODING_TABLE = {
> // adding some additional mappings as defined in Appendix D of the pdf spec
> ...
> \{0255, "hyphen"}
> {quote}
>
> Is it right that both the code and the name must match?

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
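One possible way to pre-check text, sketched with the JDK only. This is not a PDFBox API: the class and method names (WinAnsiTextFilter, isSafe, sanitize) are hypothetical, and the sketch assumes that Java's windows-1252 charset is an acceptable stand-in for WinAnsiEncoding. The two are close but not identical, so control characters (including U+007F) and the soft hyphen U+00AD, the two code points reported in this issue, are rejected explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical helper, not part of PDFBox. */
final class WinAnsiTextFilter {

    // Assumption: windows-1252 approximates WinAnsiEncoding closely enough for filtering.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    private WinAnsiTextFilter() {
    }

    /** Returns true if the code point is expected to be encodable under the assumptions above. */
    static boolean isSafe(int codePoint) {
        // Rejects U+0000..U+001F and U+007F..U+009F (covers controlDEL, U+007F).
        if (Character.isISOControl(codePoint)) {
            return false;
        }
        // Soft hyphen: reported in this issue as not available.
        if (codePoint == 0xAD) {
            return false;
        }
        // windows-1252 has no characters outside the Basic Multilingual Plane.
        if (codePoint > 0xFFFF) {
            return false;
        }
        CharsetEncoder encoder = CP1252.newEncoder();
        return encoder.canEncode((char) codePoint);
    }

    /** Returns the text with every code point that fails isSafe removed. */
    static String sanitize(String text) {
        StringBuilder result = new StringBuilder(text.length());
        text.codePoints()
            .filter(WinAnsiTextFilter::isSafe)
            .forEach(result::appendCodePoint);
        return result.toString();
    }
}
```

Calling WinAnsiTextFilter.sanitize(text) before writing the text drops the characters this sketch considers unsafe. Replacing them (for example U+00AD with a plain hyphen) instead of dropping them would be an equally reasonable design choice, depending on the document.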
[jira] [Comment Edited] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
[ https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296328#comment-17296328 ]

Andriy edited comment on PDFBOX-5115 at 3/5/21, 8:40 PM:

[~tilman] I tried to test the whole AdobeGlyphList against WinAnsiEncoding, and only these 2 symbols throw errors. Thank you for the explanation.

127 controlDEL bullet
173 sfthyphen hyphen

    // Print every glyph whose code is in WinAnsiEncoding but whose name is not
    for (Map.Entry<String, String> entry : GlyphList.getAdobeGlyphList().unicodeToName.entrySet()) {
        int i = Character.codePointAt(entry.getKey(), 0);
        if (WinAnsiEncoding.INSTANCE.contains(i) && !WinAnsiEncoding.INSTANCE.contains(entry.getValue())) {
            System.out.println(String.format(" %s %s %s", i, entry.getValue(), WinAnsiEncoding.INSTANCE.getName(i)));
        }
    }

was (Author: sandriy):

I tried to test the whole AdobeGlyphList against WinAnsiEncoding, and only these 2 symbols throw errors. Thank you for the explanation.

127 controlDEL bullet
173 sfthyphen hyphen

    // Print every glyph whose code is in WinAnsiEncoding but whose name is not
    for (Map.Entry<String, String> entry : GlyphList.getAdobeGlyphList().unicodeToName.entrySet()) {
        int i = Character.codePointAt(entry.getKey(), 0);
        if (WinAnsiEncoding.INSTANCE.contains(i) && !WinAnsiEncoding.INSTANCE.contains(entry.getValue())) {
            System.out.println(String.format(" %s %s %s", i, entry.getValue(), WinAnsiEncoding.INSTANCE.getName(i)));
        }
    }
[jira] [Comment Edited] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
[ https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296294#comment-17296294 ]

Andriy edited comment on PDFBOX-5115 at 3/5/21, 7:21 PM:

[~tilman] There is one more symbol that may cause an error after conversion:

U+007F ('controlDEL') is not available in this font Times-Roman encoding: WinAnsiEncoding

// From the PDF specification:
// In WinAnsiEncoding, all unused codes greater than 40 map to the bullet character.

was (Author: sandriy):

[~tilman] There is one more symbol:

U+007F ('controlDEL') is not available in this font Times-Roman encoding: WinAnsiEncoding

// From the PDF specification:
// In WinAnsiEncoding, all unused codes greater than 40 map to the bullet character.
[jira] [Created] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
Andriy created PDFBOX-5115:

Summary: U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding
Key: PDFBOX-5115
URL: https://issues.apache.org/jira/browse/PDFBOX-5115
Project: PDFBox
Issue Type: Bug
Affects Versions: 2.0.22
Reporter: Andriy

U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

This symbol, U+00AD, is in WinAnsiEncoding by its code, but under a slightly different name:

{quote}private static final Object[][] WIN_ANSI_ENCODING_TABLE = {
// adding some additional mappings as defined in Appendix D of the pdf spec
...
\{0255, "hyphen"}
{quote}

Is it right that both the code and the name must match?
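The table entry 0255 is written as a Java octal literal, which is easy to misread next to the decimal codes (127, 173) and the U+ hex notation used elsewhere in this thread. The small sketch below prints one code in all three notations. The class and method names are hypothetical; the "codes greater than 40" remark quoted from the PDF specification in this thread is presumably octal as well, which is an assumption and not something the thread states.

```java
import java.util.Locale;

/** Hypothetical helper for relating the notations used in this thread. */
final class OctalCodes {

    private OctalCodes() {
    }

    /** Formats a code as octal, decimal and U+hex. */
    static String describe(int code) {
        return String.format(Locale.ROOT, "0%o = %d = U+%04X", code, code, code);
    }

    public static void main(String[] args) {
        System.out.println(describe(0255)); // the "hyphen" slot from the table above
        System.out.println(describe(0177)); // controlDEL
        System.out.println(describe(040));  // assumed meaning of "40" in the quoted spec remark
    }
}
```

So the entry \{0255, "hyphen"} and the reported U+00AD refer to the same code; only the glyph name differs.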
[jira] [Commented] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241507#comment-14241507 ]

Andriy commented on PDFBOX-2551:

[~tilman] Thank you for the help. Does trunk contain the fix for this issue? Where can I find documentation for the new 2.0 API?

> Wrong barcode printing for embedded font
>
> Key: PDFBOX-2551
> URL: https://issues.apache.org/jira/browse/PDFBOX-2551
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 1.8.7
> Reporter: Andriy
> Attachments: barcode_printing_problem.pdf, print_result.pdf
>
> Couldn't print a file with the embedded font "code 128". Code for printing:
>
>     PDDocument document = load(new FileInputStream("barcode_printing_problem.pdf"));
>     PrinterJob printJob = getPrinterJob();
>     printJob.setPrintService(getPrinter("MY_PRINTER"));
>     document.silentPrint(printJob);

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239497#comment-14239497 ]

Andriy commented on PDFBOX-2551:

Could this issue depend on the text encoding?
[jira] [Comment Edited] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239492#comment-14239492 ]

Andriy edited comment on PDFBOX-2551 at 12/9/14 2:59 PM:

Input PDF file "barcode_printing_problem.pdf"

was (Author: andriy.brez):

Input PDF file
[jira] [Comment Edited] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239493#comment-14239493 ]

Andriy edited comment on PDFBOX-2551 at 12/9/14 2:59 PM:

After printing: print_result.pdf

was (Author: andriy.brez):

After printing
[jira] [Updated] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-2551:

Attachment: print_result.pdf

After printing
[jira] [Updated] (PDFBOX-2551) Wrong barcode printing for embedded font
[ https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-2551:

Attachment: barcode_printing_problem.pdf

Input PDF file
[jira] [Created] (PDFBOX-2551) Wrong barcode printing for embedded font
Andriy created PDFBOX-2551:

Summary: Wrong barcode printing for embedded font
Key: PDFBOX-2551
URL: https://issues.apache.org/jira/browse/PDFBOX-2551
Project: PDFBox
Issue Type: Bug
Components: Rendering
Affects Versions: 1.8.7
Reporter: Andriy
Fix For: 1.8.8
Attachments: barcode_printing_problem.pdf

Couldn't print a file with the embedded font "code 128". Code for printing:

    PDDocument document = load(new FileInputStream("barcode_printing_problem.pdf"));
    PrinterJob printJob = getPrinterJob();
    printJob.setPrintService(getPrinter("MY_PRINTER"));
    document.silentPrint(printJob);
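The getPrinter helper used in the printing code is not shown in the report. A minimal sketch of what such a lookup might look like, using only the standard javax.print API, follows. The class name PrinterLookup and both method names are hypothetical, and matching the printer by case-insensitive name is an assumption about what the reporter's helper does.

```java
import java.util.Arrays;
import java.util.Optional;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

/** Hypothetical stand-in for the reporter's getPrinter helper. */
final class PrinterLookup {

    private PrinterLookup() {
    }

    /** Picks the first service whose name matches, ignoring case. */
    static Optional<PrintService> selectByName(PrintService[] services, String name) {
        return Arrays.stream(services)
                .filter(service -> service.getName().equalsIgnoreCase(name))
                .findFirst();
    }

    /** Looks up all print services known to the JVM and selects one by name. */
    static Optional<PrintService> getPrinter(String name) {
        PrintService[] services = PrintServiceLookup.lookupPrintServices(null, null);
        return selectByName(services, name);
    }
}
```

Returning an Optional makes the "no such printer" case explicit, so the caller can fail with a clear message before handing the service to PrinterJob.setPrintService.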
[jira] [Commented] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572523#comment-13572523 ]

Andriy commented on PDFBOX-1510:

Thanks a lot, Maruan. Using PDDocument.loadNonSeq does solve the issue!
I will keep the ticket open, as the issue still affects PDDocument.load. Also, PDDocument.loadNonSeq can only accept a File in its arguments, while we have an InputStream that we want to pass on (we have to create a temp file to work around this).

> PDF gets corrupted when extracting it from the embedded files
>
> Key: PDFBOX-1510
> URL: https://issues.apache.org/jira/browse/PDFBOX-1510
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 1.7.1
> Reporter: Andriy
> Priority: Critical
> Attachments: doesnt_work.pdf, PDFEmbeddedFiles.java, works2.pdf
>
> When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
> This affects some documents and not others (see the attachments for working/non-working files); source code reproducing the issue is attached as well.
> Please note: the issue does not occur when using PDDocument.loadNonSeq; it occurs only when using PDDocument.load.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
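The temp-file workaround mentioned in the comment can be sketched with the JDK alone. The class name TempPdfFiles, the method name toTempFile and the file prefix are hypothetical; the sketch only covers copying the stream to disk, not the PDFBox call itself.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: spools an InputStream to a temporary file. */
final class TempPdfFiles {

    private TempPdfFiles() {
    }

    /** Copies the stream into a new temp file and returns it; the caller closes the stream. */
    static File toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdfbox-input-", ".pdf");
        try {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave a partial file behind if the copy fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
        File file = tmp.toFile();
        file.deleteOnExit(); // safety net; deleting explicitly after use is better
        return file;
    }
}
```

The returned File can then be handed to PDDocument.loadNonSeq; the exact parameters of that method should be checked against the PDFBox version in use.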
[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-1510:

Description:
When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
This affects some documents and not others (see the attachments for working/non-working files); source code reproducing the issue is attached as well.
Please note: the issue does not occur when using PDDocument.loadNonSeq; it occurs only when using PDDocument.load.

was:
When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
This affects some documents and not others (see the attachments for working/non-working files); source code reproducing the issue is attached as well.
[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-1510:

Description:
When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
This affects some documents and not others (see the attachments for working/non-working files); source code reproducing the issue is attached as well.

was:
When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
This affects some documents and not others. Below is the test code that replicates the issue.
A PDF that has an attachment that gets corrupted will be attached to the issue.

    public class PDFEmbeddedFiles {

        private PDFEmbeddedFiles() {
        }

        public static void main(String[] args) throws Exception {
            if (args.length != 1) {
                usage();
                System.exit(1);
            } else {
                PDDocument document = null;
                try {
                    File pdfFile = new File(args[0]);
                    /*
                    String filePath = pdfFile.getParent()
                            + System.getProperty("file.separator");
                    */
                    document = PDDocument.load(pdfFile);
                    if (document.isEncrypted()) {
                        try {
                            document.decrypt("");
                        } catch (InvalidPasswordException e) {
                            System.err.println("Error: The document is encrypted.");
                        } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                            e.printStackTrace();
                        }
                    }

                    PDDocumentNameDictionary namesDictionary = document.getDocumentCatalog().getNames();
                    //new PDDocumentNameDictionary(document.getDocumentCatalog());
                    PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
                    if (efTree != null) {
                        Map<String, ?> names = efTree.getNames();
                        Iterator<String> namesKeys = names.keySet().iterator();
                        // Write every embedded file to the working directory
                        while (namesKeys.hasNext()) {
                            String filename = namesKeys.next();
                            PDComplexFileSpecification fileSpec = (PDComplexFileSpecification) names.get(filename);
                            PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                            String embeddedFilename = filename;//filePath + filename;
                            File file = new File(filename);//filePath + filename);
                            System.out.println("Writing " + embeddedFilename);
                            FileOutputStream fos = new FileOutputStream(file);
                            fos.write(embeddedFile.getByteArray());
                            fos.close();
                        }
                    }
                } finally {
                    if (document != null) {
                        document.close();
                    }
                }
            }
        }

        /**
         * This will print the usage for this program.
         */
        private static void usage() {
            System.err.println("Usage: java " + PDFEmbeddedFiles.class.getName() + " ");
        }
    }
[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-1510:

Summary: PDF gets corrupted when extracting it from the embedded files (was: PDF gets corrupted when trying to extract it from the embedded files)
[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-1510:

Attachment: PDFEmbeddedFiles.java

Test class to reproduce the issue
[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files
[ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andriy updated PDFBOX-1510:

Attachment: doesnt_work.pdf
            works2.pdf
[jira] [Created] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files
Andriy created PDFBOX-1510:

Summary: PDF gets corrupted when trying to extract it from the embedded files
Key: PDFBOX-1510
URL: https://issues.apache.org/jira/browse/PDFBOX-1510
Project: PDFBox
Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Andriy
Priority: Critical

When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
This affects some documents and not others. Below is the test code that replicates the issue.
A PDF that has an attachment that gets corrupted will be attached to the issue.

    public class PDFEmbeddedFiles {

        private PDFEmbeddedFiles() {
        }

        public static void main(String[] args) throws Exception {
            if (args.length != 1) {
                usage();
                System.exit(1);
            } else {
                PDDocument document = null;
                try {
                    File pdfFile = new File(args[0]);
                    /*
                    String filePath = pdfFile.getParent()
                            + System.getProperty("file.separator");
                    */
                    document = PDDocument.load(pdfFile);
                    if (document.isEncrypted()) {
                        try {
                            document.decrypt("");
                        } catch (InvalidPasswordException e) {
                            System.err.println("Error: The document is encrypted.");
                        } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                            e.printStackTrace();
                        }
                    }

                    PDDocumentNameDictionary namesDictionary = document.getDocumentCatalog().getNames();
                    //new PDDocumentNameDictionary(document.getDocumentCatalog());
                    PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
                    if (efTree != null) {
                        Map<String, ?> names = efTree.getNames();
                        Iterator<String> namesKeys = names.keySet().iterator();
                        // Write every embedded file to the working directory
                        while (namesKeys.hasNext()) {
                            String filename = namesKeys.next();
                            PDComplexFileSpecification fileSpec = (PDComplexFileSpecification) names.get(filename);
                            PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                            String embeddedFilename = filename;//filePath + filename;
                            File file = new File(filename);//filePath + filename);
                            System.out.println("Writing " + embeddedFilename);
                            FileOutputStream fos = new FileOutputStream(file);
                            fos.write(embeddedFile.getByteArray());
                            fos.close();
                        }
                    }
                } finally {
                    if (document != null) {
                        document.close();
                    }
                }
            }
        }

        /**
         * This will print the usage for this program.
         */
        private static void usage() {
            System.err.println("Usage: java " + PDFEmbeddedFiles.class.getName() + " ");
        }
    }
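To narrow down a report like "the returned array has less data than the original file", it can help to locate where the extracted bytes first diverge from the original attachment. The sketch below is a plain JDK comparison, not part of the reporter's test class or of PDFBox; the class and method names are hypothetical.

```java
/** Hypothetical diagnostic helper for comparing an original file with its extracted copy. */
final class ExtractionCheck {

    private ExtractionCheck() {
    }

    /**
     * Returns -1 if both arrays are identical, otherwise the index of the first
     * differing byte. If one array is a strict prefix of the other (a pure
     * truncation), the result is the length of the shorter array.
     */
    static int firstDifference(byte[] original, byte[] extracted) {
        int common = Math.min(original.length, extracted.length);
        for (int i = 0; i < common; i++) {
            if (original[i] != extracted[i]) {
                return i;
            }
        }
        return original.length == extracted.length ? -1 : common;
    }
}
```

A result equal to the length of the extracted array would suggest the data was cut off at the end, while a smaller index would suggest bytes were lost or altered somewhere in the middle.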