[jira] [Commented] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-06 Thread Andriy (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296534#comment-17296534
 ] 

Andriy commented on PDFBOX-5115:


[~tilman]

The reason I ask about 127 (controlDEL / bullet) is that I have found no way 
to filter the text used for PDF generation, or to check it beforehand, so that 
no exception is thrown.

The approach I thought of was 
{quote}WinAnsiEncoding.INSTANCE.contains()
{quote}
but it will not work for 7F, and I will still get a runtime exception.

Can you suggest how to check the text before PDF generation so that no 
exception is thrown? 
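
To make the question concrete, here is a minimal sketch of the kind of pre-check being asked for. It is not PDFBox code: the class name, the method name and the predicate are all hypothetical. The predicate stands in for whatever "can this font encode this code point" test is eventually used. With PDFBox, one option might be to wrap a call to the font's encode(String) method in a try/catch for IllegalArgumentException, but that wiring is an assumption and is not shown here.

```java
import java.util.function.IntPredicate;

public class EncodableTextFilter {

    /**
     * Returns a copy of text in which every code point rejected by the
     * predicate is replaced with the given replacement string.
     */
    public static String filter(String text, IntPredicate encodable, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (encodable.test(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in predicate (assumption): rejects only the two code points
        // reported in this issue, U+007F and U+00AD.
        IntPredicate demo = cp -> cp != 0x7F && cp != 0xAD;
        System.out.println(filter("soft\u00ADhyphen\u007F", demo, ""));
    }
}
```

Passing the check in as a predicate keeps the filtering loop independent of how the font or encoding is actually queried.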

 

> U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: 
> WinAnsiEncoding
> 
>
> Key: PDFBOX-5115
> URL: https://issues.apache.org/jira/browse/PDFBOX-5115
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.22
>Reporter: Andriy
>Priority: Minor
> Fix For: 2.0.23, 3.0.0 PDFBox
>
>
> U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: 
> WinAnsiEncoding
>  
> This symbol, U+00AD, is present in WinAnsiEncoding by code, but under a 
> slightly different name:
>  
> {quote}private static final Object[][] WIN_ANSI_ENCODING_TABLE = {
> // adding some additional mappings as defined in Appendix D of the pdf spec
>  ...
>  {0255, "hyphen"}
> {quote}
>  
> Is it right that both the code and the name must match? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-05 Thread Andriy (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296328#comment-17296328
 ] 

Andriy edited comment on PDFBOX-5115 at 3/5/21, 8:40 PM:
-

[~tilman] I tested the whole AdobeGlyphList against WinAnsiEncoding, and only 
these 2 symbols throw errors.

Thank you for the explanation.

 127 controlDEL bullet
 173 sfthyphen hyphen
 
for (Map.Entry<String, String> entry :
        GlyphList.getAdobeGlyphList().unicodeToName.entrySet()) {
    // first code point of the Unicode string mapped to this glyph name
    int i = Character.codePointAt(entry.getKey(), 0);
    // the code is in WinAnsiEncoding, but not under the AdobeGlyphList name
    if (WinAnsiEncoding.INSTANCE.contains(i)
            && !WinAnsiEncoding.INSTANCE.contains(entry.getValue())) {
        System.out.println(String.format(" %s %s %s", i,
                entry.getValue(), WinAnsiEncoding.INSTANCE.getName(i)));
    }
}




was (Author: sandriy):
I tested the whole AdobeGlyphList against WinAnsiEncoding, and only these 2 
symbols throw errors.

Thank you for the explanation.

 127 controlDEL bullet
 173 sfthyphen hyphen
 
for (Map.Entry<String, String> entry :
        GlyphList.getAdobeGlyphList().unicodeToName.entrySet()) {
    // first code point of the Unicode string mapped to this glyph name
    int i = Character.codePointAt(entry.getKey(), 0);
    // the code is in WinAnsiEncoding, but not under the AdobeGlyphList name
    if (WinAnsiEncoding.INSTANCE.contains(i)
            && !WinAnsiEncoding.INSTANCE.contains(entry.getValue())) {
        System.out.println(String.format(" %s %s %s", i,
                entry.getValue(), WinAnsiEncoding.INSTANCE.getName(i)));
    }
}






[jira] [Commented] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-05 Thread Andriy (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296328#comment-17296328
 ] 

Andriy commented on PDFBOX-5115:


I tested the whole AdobeGlyphList against WinAnsiEncoding, and only these 2 
symbols throw errors.

Thank you for the explanation.

 127 controlDEL bullet
 173 sfthyphen hyphen
 
for (Map.Entry<String, String> entry :
        GlyphList.getAdobeGlyphList().unicodeToName.entrySet()) {
    // first code point of the Unicode string mapped to this glyph name
    int i = Character.codePointAt(entry.getKey(), 0);
    // the code is in WinAnsiEncoding, but not under the AdobeGlyphList name
    if (WinAnsiEncoding.INSTANCE.contains(i)
            && !WinAnsiEncoding.INSTANCE.contains(entry.getValue())) {
        System.out.println(String.format(" %s %s %s", i,
                entry.getValue(), WinAnsiEncoding.INSTANCE.getName(i)));
    }
}






[jira] [Comment Edited] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-05 Thread Andriy (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296294#comment-17296294
 ] 

Andriy edited comment on PDFBOX-5115 at 3/5/21, 7:21 PM:
-

[~tilman] there is one more symbol that may cause an error after conversion:

 

U+007F ('controlDEL') is not available in this font Times-Roman encoding: 
WinAnsiEncoding

// From the PDF specification:
// In WinAnsiEncoding, all unused codes greater than 40 map to the bullet
// character.

 


was (Author: sandriy):
[~tilman] there is 1 more symbol

 

U+007F ('controlDEL') is not available in this font Times-Roman encoding: 
WinAnsiEncoding

// From the PDF specification:
// In WinAnsiEncoding, all unused codes greater than 40 map to the bullet
// character.

 




[jira] [Commented] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-05 Thread Andriy (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296294#comment-17296294
 ] 

Andriy commented on PDFBOX-5115:


[~tilman] there is 1 more symbol

 

U+007F ('controlDEL') is not available in this font Times-Roman encoding: 
WinAnsiEncoding

// From the PDF specification:
// In WinAnsiEncoding, all unused codes greater than 40 map to the bullet
// character.

 




[jira] [Created] (PDFBOX-5115) U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: WinAnsiEncoding

2021-03-05 Thread Andriy (Jira)
Andriy created PDFBOX-5115:
--

 Summary: U+00AD ('sfthyphen') is not available in this font 
Times-Roman encoding: WinAnsiEncoding
 Key: PDFBOX-5115
 URL: https://issues.apache.org/jira/browse/PDFBOX-5115
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 2.0.22
Reporter: Andriy


U+00AD ('sfthyphen') is not available in this font Times-Roman encoding: 
WinAnsiEncoding

 

This symbol, U+00AD, is present in WinAnsiEncoding by code, but under a 
slightly different name:

 
{quote}private static final Object[][] WIN_ANSI_ENCODING_TABLE = {

// adding some additional mappings as defined in Appendix D of the pdf spec
 ...
 {0255, "hyphen"}
{quote}
 

Is it right that both the code and the name must match? 
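
For readers comparing the table entry with the error message: the table writes the code as a Java octal literal, while the message uses hexadecimal Unicode notation. The small sketch below (class and constant names are illustrative only) shows that 0255 and U+00AD are the same number, and likewise 0177 and U+007F for the code 127 mentioned elsewhere in this thread.

```java
public class OctalCodes {

    // A leading zero makes a Java integer literal octal.
    static final int SOFT_HYPHEN = 0255; // 2*64 + 5*8 + 5
    static final int DELETE = 0177;      // 1*64 + 7*8 + 7

    public static void main(String[] args) {
        System.out.println(SOFT_HYPHEN + " = 0x" + Integer.toHexString(SOFT_HYPHEN));
        System.out.println(DELETE + " = 0x" + Integer.toHexString(DELETE));
    }
}
```

So the mismatch reported here concerns only the glyph name attached to the code, not the code itself.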






[jira] [Commented] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-10 Thread Andriy (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241507#comment-14241507
 ] 

Andriy commented on PDFBOX-2551:


[~tilman] Thank you for your help. Does trunk contain the fix for this issue? 
Where can I find documentation for the new 2.0 API?

> Wrong barcode printing for embedded font
> 
>
> Key: PDFBOX-2551
> URL: https://issues.apache.org/jira/browse/PDFBOX-2551
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 1.8.7
>Reporter: Andriy
> Attachments: barcode_printing_problem.pdf, print_result.pdf
>
>
> Couldn't print a file with the embedded font "code 128".  Code for printing:
> PDDocument document = load(new FileInputStream("barcode_printing_problem.pdf"));
> PrinterJob printJob = getPrinterJob();
> printJob.setPrintService(getPrinter("MY_PRINTER"));
> document.silentPrint(printJob);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239497#comment-14239497
 ] 

Andriy commented on PDFBOX-2551:


Could this issue depend on the text encoding?



[jira] [Comment Edited] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239492#comment-14239492
 ] 

Andriy edited comment on PDFBOX-2551 at 12/9/14 2:59 PM:
-

Input pdf file "barcode_printing_problem.pdf"


was (Author: andriy.brez):
Input pdf file



[jira] [Comment Edited] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239493#comment-14239493
 ] 

Andriy edited comment on PDFBOX-2551 at 12/9/14 2:59 PM:
-

After printing: print_result.pdf


was (Author: andriy.brez):
After printing



[jira] [Updated] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-2551:
---
Attachment: print_result.pdf

After printing



[jira] [Updated] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-2551:
---
Attachment: barcode_printing_problem.pdf

Input pdf file



[jira] [Created] (PDFBOX-2551) Wrong barcode printing for embedded font

2014-12-09 Thread Andriy (JIRA)
Andriy created PDFBOX-2551:
--

 Summary: Wrong barcode printing for embedded font
 Key: PDFBOX-2551
 URL: https://issues.apache.org/jira/browse/PDFBOX-2551
 Project: PDFBox
  Issue Type: Bug
  Components: Rendering
Affects Versions: 1.8.7
Reporter: Andriy
 Fix For: 1.8.8
 Attachments: barcode_printing_problem.pdf

Couldn't print a file with the embedded font "code 128".  Code for printing:

PDDocument document = load(new FileInputStream("barcode_printing_problem.pdf"));
PrinterJob printJob = getPrinterJob();
printJob.setPrintService(getPrinter("MY_PRINTER"));
document.silentPrint(printJob);





[jira] [Commented] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

2013-02-06 Thread Andriy (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572523#comment-13572523
 ] 

Andriy commented on PDFBOX-1510:


Thanks a lot Maruan, using PDDocument.loadNonSeq does solve the issue! I will 
keep the ticket open as the problem still affects PDDocument.load. Also, 
PDDocument.loadNonSeq only accepts a File argument, while we have an 
InputStream that we want to pass on (we have to create a temp file to work 
around this).
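
The temp-file workaround mentioned above can be sketched with the standard library alone. This is only an illustration: the class name, method name and file prefix are made up, and the placeholder bytes in main are not a real PDF. The returned File is what would then be handed to PDDocument.loadNonSeq.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToTempFile {

    /** Copies the whole stream into a new temporary file and returns it. */
    public static File toTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdfbox-input-", ".pdf");
        // createTempFile has already created an empty file, so overwrite it
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toFile();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder content (assumption): stands in for the real PDF stream
        byte[] data = "%PDF placeholder".getBytes(StandardCharsets.US_ASCII);
        File file = toTempFile(new ByteArrayInputStream(data));
        try {
            System.out.println("copied " + file.length() + " bytes to " + file);
        } finally {
            // remove the temp file once the document is no longer needed
            Files.deleteIfExists(file.toPath());
        }
    }
}
```

The caller remains responsible for deleting the file once the document has been closed.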

> PDF gets corrupted when extracting it from the embedded files
> -
>
> Key: PDFBOX-1510
> URL: https://issues.apache.org/jira/browse/PDFBOX-1510
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: Andriy
>Priority: Critical
> Attachments: doesnt_work.pdf, PDFEmbeddedFiles.java, works2.pdf
>
>
> When a PDF is attached to another PDF it gets corrupted when retrieved 
> through PDEmbeddedFile.getByteArray() method call. For some reason the 
> returned array has less data than the original file that has been attached to 
> the PDF.
> This affects some documents but not others (see the attachments for 
> working/non-working files). Source code reproducing the issue has been 
> attached as well.
> Please note: the issue does not occur when using PDDocument.loadNonSeq; 
> it only occurs when using PDDocument.load.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

2013-02-06 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---

Description: 
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others (see the attachments for 
working/non-working files). Source code reproducing the issue has been attached 
as well.

Please note: the issue does not occur when using PDDocument.loadNonSeq; it 
only occurs when using PDDocument.load.

  was:
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others (see the attachments for 
working/non-working files). Source code reproducing the issue has been attached 
as well.






[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

2013-02-06 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---

Description: 
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others (see the attachments for 
working/non-working files). Source code reproducing the issue has been attached 
as well.



  was:
When a PDF is attached to another PDF it gets corrupted when retrieved through 
PDEmbeddedFile.getByteArray() method call. For some reason the returned array 
has less data than the original file that has been attached to the PDF.

This affects some documents but not others. Below is the test code that 
replicates the issue.

A PDF with an attachment that gets corrupted will be attached to the issue.




import java.io.File;
import java.io.FileOutputStream;
import java.util.Iterator;
import java.util.Map;

import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentNameDictionary;
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;

public class PDFEmbeddedFiles {

    private PDFEmbeddedFiles() {
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            usage();
            System.exit(1);
        } else {
            PDDocument document = null;
            try {
                File pdfFile = new File(args[0]);
                /*
                String filePath = pdfFile.getParent()
                        + System.getProperty("file.separator");
                */
                document = PDDocument.load(pdfFile);
                if (document.isEncrypted()) {
                    try {
                        // try the empty user password
                        document.decrypt("");
                    } catch (InvalidPasswordException e) {
                        System.err.println("Error: The document is encrypted.");
                    } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                        e.printStackTrace();
                    }
                }

                PDDocumentNameDictionary namesDictionary =
                        document.getDocumentCatalog().getNames(); //new PDDocumentNameDictionary(document.getDocumentCatalog());
                PDEmbeddedFilesNameTreeNode efTree = namesDictionary.getEmbeddedFiles();
                if (efTree != null) {
                    // the archive stripped the type arguments; a wildcard
                    // value type is assumed here
                    Map<String, ?> names = efTree.getNames();
                    Iterator<String> namesKeys = names.keySet().iterator();
                    while (namesKeys.hasNext()) {
                        String filename = namesKeys.next();
                        PDComplexFileSpecification fileSpec =
                                (PDComplexFileSpecification) names.get(filename);
                        PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                        String embeddedFilename = filename;//filePath + filename;
                        File file = new File(filename);//filePath + filename);
                        System.out.println("Writing " + embeddedFilename);
                        FileOutputStream fos = new FileOutputStream(file);

                        // this is the call that returns fewer bytes than expected
                        fos.write(embeddedFile.getByteArray());
                        fos.close();
                    }
                }
            } finally {
                if (document != null) {
                    document.close();
                }
            }
        }
    }

    /**
     * This will print the usage for this program.
     */
    private static void usage() {
        System.err.println("Usage: java "
                + PDFEmbeddedFiles.class.getName() + " ");
    }
}
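
The test class above writes whatever getByteArray() returns, so the truncation only shows up when the output is compared with the file that was originally attached. A small stdlib-only helper for that comparison could look like the sketch below. It is hypothetical and not part of the attached test class; in practice the two arrays would be read from the original file and from the extracted one.

```java
import java.util.Arrays;

public class ExtractionCheck {

    /** Describes how the extracted bytes differ from the original, if at all. */
    public static String compare(byte[] original, byte[] extracted) {
        if (original.length != extracted.length) {
            return "length differs: " + original.length + " vs " + extracted.length;
        }
        for (int i = 0; i < original.length; i++) {
            if (original[i] != extracted[i]) {
                return "content differs at offset " + i;
            }
        }
        return "identical";
    }

    public static void main(String[] args) {
        // Made-up data: the second array simulates a truncated extraction
        byte[] original = {1, 2, 3, 4, 5};
        byte[] truncated = Arrays.copyOf(original, 3);
        System.out.println(compare(original, truncated));
        System.out.println(compare(original, original.clone()));
    }
}
```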




[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

2013-02-06 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---

Summary: PDF gets corrupted when extracting it from the embedded files  
(was: PDF gets corrupted when trying to extract it from the embedded files)


[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files

2013-02-06 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---

Attachment: PDFEmbeddedFiles.java

Test class to reproduce the issue


[jira] [Updated] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files

2013-02-05 Thread Andriy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andriy updated PDFBOX-1510:
---

Attachment: doesnt_work.pdf
works2.pdf

> PDF gets corrupted when trying to extract it from the embedded files
> 
>
> Key: PDFBOX-1510
> URL: https://issues.apache.org/jira/browse/PDFBOX-1510
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: Andriy
>Priority: Critical
> Attachments: doesnt_work.pdf, works2.pdf
>
>
> When a PDF is attached to another PDF it gets corrupted when retrieved 
> through PDEmbeddedFile.getByteArray() method call. For some reason the 
> returned array has less data than the original file that has been attached to 
> the PDF.
> This affects some of the documents and not another. Below is the test code 
> the replicates the issue.
> PDF that has an attachment that gets corrupted will be attached to the issue.
> public class PDFEmbeddedFiles {
>     private PDFEmbeddedFiles() {
>     }
>     public static void main(String[] args) throws Exception {
>         if (args.length != 1) {
>             usage();
>             System.exit(1);
>         } else {
>             PDDocument document = null;
>             try {
>                 File pdfFile = new File(args[0]);
>                 /*
>                 String filePath = pdfFile.getParent()
>                         + System.getProperty("file.separator");
>                 */
>                 document = PDDocument.load(pdfFile);
>                 if (document.isEncrypted()) {
>                     try {
>                         document.decrypt("");
>                     } catch (InvalidPasswordException e) {
>                         System.err.println("Error: The document is encrypted.");
>                     } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
>                         e.printStackTrace();
>                     }
>                 }
>                 //new PDDocumentNameDictionary(document.getDocumentCatalog());
>                 PDDocumentNameDictionary namesDictionary =
>                         document.getDocumentCatalog().getNames();
>                 PDEmbeddedFilesNameTreeNode efTree =
>                         namesDictionary.getEmbeddedFiles();
>                 if (efTree != null) {
>                     Map<String, Object> names = efTree.getNames();
>                     Iterator<String> namesKeys = names.keySet().iterator();
>                     while (namesKeys.hasNext()) {
>                         String filename = namesKeys.next();
>                         PDComplexFileSpecification fileSpec =
>                                 (PDComplexFileSpecification) names.get(filename);
>                         PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
>                         String embeddedFilename = filename;//filePath + filename;
>                         File file = new File(filename);//filePath + filename);
>                         System.out.println("Writing " + embeddedFilename);
>                         FileOutputStream fos = new FileOutputStream(file);
>                         fos.write(embeddedFile.getByteArray());
>                         fos.close();
>                     }
>                 }
>             } finally {
>                 if (document != null) {
>                     document.close();
>                 }
>             }
>         }
>     }
>     /**
>      * This will print the usage for this program.
>      */
>     private static void usage() {
>         System.err.println("Usage: java "
>                 + PDFEmbeddedFiles.class.getName() + " ");
>     }
> }



[jira] [Created] (PDFBOX-1510) PDF gets corrupted when trying to extract it from the embedded files

2013-02-05 Thread Andriy (JIRA)
Andriy created PDFBOX-1510:
--

 Summary: PDF gets corrupted when trying to extract it from the 
embedded files
 Key: PDFBOX-1510
 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter: Andriy
Priority: Critical


When a PDF is attached to another PDF, it gets corrupted when retrieved through 
a PDEmbeddedFile.getByteArray() call. For some reason the returned array 
contains less data than the original file that was attached to the PDF.

This affects some documents but not others. Below is the test code that 
replicates the issue.

A PDF whose attachment gets corrupted will be attached to the issue.




public class PDFEmbeddedFiles {

    private PDFEmbeddedFiles() {
    }

    public static void main(String[] args) throws Exception {

        if (args.length != 1) {
            usage();
            System.exit(1);
        } else {

            PDDocument document = null;

            try {
                File pdfFile = new File(args[0]);
                /*
                String filePath = pdfFile.getParent()
                        + System.getProperty("file.separator");
                */
                document = PDDocument.load(pdfFile);
                if (document.isEncrypted()) {
                    try {
                        document.decrypt("");
                    } catch (InvalidPasswordException e) {
                        System.err.println("Error: The document is encrypted.");
                    } catch (org.apache.pdfbox.exceptions.CryptographyException e) {
                        e.printStackTrace();
                    }
                }

                //new PDDocumentNameDictionary(document.getDocumentCatalog());
                PDDocumentNameDictionary namesDictionary =
                        document.getDocumentCatalog().getNames();
                PDEmbeddedFilesNameTreeNode efTree =
                        namesDictionary.getEmbeddedFiles();
                if (efTree != null) {
                    Map<String, Object> names = efTree.getNames();
                    Iterator<String> namesKeys = names.keySet().iterator();
                    while (namesKeys.hasNext()) {
                        String filename = namesKeys.next();
                        PDComplexFileSpecification fileSpec =
                                (PDComplexFileSpecification) names.get(filename);
                        PDEmbeddedFile embeddedFile = fileSpec.getEmbeddedFile();
                        String embeddedFilename = filename;//filePath + filename;
                        File file = new File(filename);//filePath + filename);
                        System.out.println("Writing " + embeddedFilename);
                        FileOutputStream fos = new FileOutputStream(file);
                        fos.write(embeddedFile.getByteArray());
                        fos.close();
                    }
                }
            } finally {
                if (document != null) {
                    document.close();
                }
            }
        }
    }

    /**
     * This will print the usage for this program.
     */
    private static void usage() {
        System.err.println("Usage: java "
                + PDFEmbeddedFiles.class.getName() + " ");
    }
}
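
One way to confirm the truncation described above is to compare the bytes 
written by the test code with the file that was originally attached. The 
following is only a minimal sketch and not part of the original report: the 
class name ExtractedBytesCheck and the method compare are hypothetical, it uses 
only the JDK (Arrays.mismatch requires Java 9 or later), and it assumes both 
files are small enough to read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: compares the original attachment
// with the file written by the extraction code.
public class ExtractedBytesCheck {

    /**
     * Builds a short report: either the arrays are identical, or it gives
     * both lengths and the offset of the first differing byte.
     */
    public static String compare(byte[] original, byte[] extracted) {
        // Arrays.mismatch returns -1 when the arrays are equal; when one is a
        // proper prefix of the other it returns the length of the shorter one.
        int offset = Arrays.mismatch(original, extracted);
        if (offset < 0) {
            return "identical (" + original.length + " bytes)";
        }
        return "original=" + original.length + " bytes, extracted="
                + extracted.length + " bytes, first difference at offset "
                + offset;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ExtractedBytesCheck"
                    + " original-file extracted-file");
            System.exit(1);
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] extracted = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(compare(original, extracted));
    }
}
```

Run with the original attachment and the extracted file as arguments. If the 
extracted file is merely cut short, the reported offset equals its length; a 
smaller offset would suggest the content itself was altered.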

