[jira] [Created] (PDFBOX-5829) IOException: Error expected floating point number actual='-12.-1'

2024-05-26 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5829:
---

 Summary: IOException: Error expected floating point number actual='-12.-1'
 Key: PDFBOX-5829
 URL: https://issues.apache.org/jira/browse/PDFBOX-5829
 Project: PDFBox
  Issue Type: Bug
  Components: Parsing
Affects Versions: 3.0.2 PDFBox, 2.0.31, 4.0.0
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Closed] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

arjunce closed PDFBOX-5828.
---
Resolution: Invalid

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Trivial
> Attachments: image-2024-05-24-11-59-31-786.png, 
> output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Comment Edited] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849241#comment-17849241
 ] 

arjunce edited comment on PDFBOX-5828 at 5/24/24 10:01 AM:
---

You are right. I split the original pdf to test this. Is there a way to fix the 
CapHeight value before processing the page? Also, I was looking for a 
PDFTextStripper label and accidentally clicked on "easyfix". I have updated 
the ticket.

 

Edit #1: Upon more debugging, I found out that the pdf was corrupted. I 
appreciate the quick response and will close the bug. Thanks [~tilman] 


was (Author: JIRAUSER305579):
You are right. I split the original pdf to test this. Is there a way to fix the 
CapHeight value before processing the page? Also, I was looking for a 
PDFTextStripper label and accidentally clicked on "easyfix". I have updated 
the ticket.

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Trivial
> Attachments: image-2024-05-24-11-59-31-786.png, 
> output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Comment Edited] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849237#comment-17849237
 ] 

Tilman Hausherr edited comment on PDFBOX-5828 at 5/24/24 9:59 AM:
--

The sort mode usually works, the problem with this file is that the font has 
huge Ascent and CapHeight values, 
 !image-2024-05-24-11-59-31-786.png! 
so PDFBox thinks that these glyphs are all on the same line. A look at 
{{TextPositionComparator}} shows that y is considered.
 !screenshot-1.png! 

You added the "easyfix" label. Feel free to attach a patch and I'll test it.

Your file also has another problem, one of the font directories points to an 
incorrect object. This is likely because of a nasty PDFBox bug that was fixed 
in 3.0.2 or will be fixed in 3.0.3. I assume you ran a split on the original 
file.


was (Author: tilman):
The sort mode usually works, the problem with this file is that the font has 
huge Ascent and CapHeight values, so PDFBox thinks that these glyphs are all on 
the same line. A look at {{TextPositionComparator}} shows that y is considered.
 !screenshot-1.png! 

You added the "easyfix" label. Feel free to attach a patch and I'll test it.

Your file also has another problem, one of the font directories points to an 
incorrect object. This is likely because of a nasty PDFBox bug that was fixed 
in 3.0.2 or will be fixed in 3.0.3. I assume you ran a split on the original 
file.

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Trivial
> Attachments: image-2024-05-24-11-59-31-786.png, 
> output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Updated] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

arjunce updated PDFBOX-5828:

Priority: Trivial  (was: Major)

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Trivial
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Updated] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

arjunce updated PDFBOX-5828:

Labels:   (was: easyfix)

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Major
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Commented] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849241#comment-17849241
 ] 

arjunce commented on PDFBOX-5828:
-

You are right. I split the original pdf to test this. Is there a way to fix the 
CapHeight value before processing the page? Also, I was looking for a 
PDFTextStripper label and accidentally clicked on "easyfix". I have updated 
the ticket.

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Major
>  Labels: easyfix
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Updated] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5828:

Affects Version/s: 3.0.2 PDFBox

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox, 3.0.2 PDFBox
>Reporter: arjunce
>Priority: Major
>  Labels: easyfix
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Commented] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849237#comment-17849237
 ] 

Tilman Hausherr commented on PDFBOX-5828:
-

The sort mode usually works, the problem with this file is that the font has 
huge Ascent and CapHeight values, so PDFBox thinks that these glyphs are all on 
the same line. A look at {{TextPositionComparator}} shows that y is considered.
 !screenshot-1.png! 

You added the "easyfix" label. Feel free to attach a patch and I'll test it.

Your file also has another problem, one of the font directories points to an 
incorrect object. This is likely because of a nasty PDFBox bug that was fixed 
in 3.0.2 or will be fixed in 3.0.3. I assume you ran a split on the original 
file.
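
Editorial illustration of the comment above: a minimal sketch, in plain Java, of how a position comparator that treats vertically overlapping glyphs as one line can merge separate lines when the reported glyph height is inflated. The Glyph class, the BY_POSITION comparator and the coordinates are hypothetical stand-ins and are not PDFBox code; the overlap test is only loosely modelled on the idea that y is considered first and x decides within a line.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SameLineSketch {
    /** Hypothetical stand-in for a text position (not a PDFBox class). */
    static final class Glyph {
        final String text;
        final float x;        // left edge
        final float yBottom;  // baseline, growing downwards on the page
        final float height;   // height derived from the font metrics

        Glyph(String text, float x, float yBottom, float height) {
            this.text = text;
            this.x = x;
            this.yBottom = yBottom;
            this.height = height;
        }
    }

    /** Glyphs whose vertical extents overlap count as one line and are ordered by x. */
    static final Comparator<Glyph> BY_POSITION = (a, b) -> {
        float aTop = a.yBottom - a.height;
        float bTop = b.yBottom - b.height;
        boolean sameLine = Math.abs(a.yBottom - b.yBottom) < 0.1f
                || (b.yBottom >= aTop && b.yBottom <= a.yBottom)
                || (a.yBottom >= bTop && a.yBottom <= b.yBottom);
        return sameLine ? Float.compare(a.x, b.x) : Float.compare(a.yBottom, b.yBottom);
    };

    /** Sorts two short lines ("ab" above "cd") using the given glyph height. */
    static String sorted(float height) {
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("c", 10f, 120f, height));
        glyphs.add(new Glyph("d", 20f, 120f, height));
        glyphs.add(new Glyph("a", 15f, 100f, height));
        glyphs.add(new Glyph("b", 25f, 100f, height));
        glyphs.sort(BY_POSITION);
        StringBuilder out = new StringBuilder();
        for (Glyph g : glyphs) {
            out.append(g.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sorted(8f));   // plausible height: lines stay separate, prints "abcd"
        System.out.println(sorted(500f)); // inflated height: lines merge, prints "cadb"
    }
}
```

With a plausible height the two lines stay apart; with an inflated height every glyph overlaps every other one vertically, so the order is decided by x alone and the characters of both lines interleave.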

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox
>Reporter: arjunce
>Priority: Major
>  Labels: easyfix
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Updated] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5828:

Attachment: screenshot-1.png

> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox
>Reporter: arjunce
>Priority: Major
>  Labels: easyfix
> Attachments: output_text_stripper.txt, screenshot-1.png, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Updated] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

arjunce updated PDFBOX-5828:

Description: 
Hello Folks, 

I am using pdfbox to extract and manipulate text contents of the pdf and using 
PDFTextStripper to extract text. I am also setting the below options:
{code:java}
PDDocument document = Loader.loadPDF(new 
File("src/test/resources/pdf/test.pdf"));
PDFTextStripper textStripper = new PDFTextStripper();
textStripper.setSortByPosition(true);
textStripper.setWordSeparator(" "); {code}
The text comparator is not transitive, as mentioned in a comment. The custom 
merge sort scrambles the text at the individual character level, and I can't 
repair the text afterwards. 

I have attached the sample pdf and its text output below. The merge sort 
doesn't consider the y coordinates and x coordinates when sorting the letters. 
Adding that while sorting would fix this issue.

  was:
Hello Folks, 

I am using pdfbox to extract and manipulate text contents of the pdf and using 
PDFTextStripper to extract text. I am also setting the below options:
{code:java}
PDDocument document = Loader.loadPDF(new 
File("src/test/resources/pdf/test.pdf"));
PDFTextStripper textStripper = new PDFTextStripper();
textStripper.setSortByPosition(true);
textStripper.setWordSeparator(" "); {code}
When sorting the positions, a "{*}Comparison method violates its general 
contract!{*}" exception is thrown.

The exception is possibly caused by the float comparisons. The custom merge 
sort scrambles the text at the individual character level, and I can't repair 
the text afterwards. 

I have attached the sample pdf and its text output below. The merge sort 
doesn't consider the y coordinates and x coordinates when sorting the letters. 
Adding that while sorting would fix this issue.


> PDFTextStripper created garbled text
> 
>
> Key: PDFBOX-5828
> URL: https://issues.apache.org/jira/browse/PDFBOX-5828
> Project: PDFBox
>  Issue Type: Bug
>  Components: Text extraction
>Affects Versions: 3.0.1 PDFBox
>Reporter: arjunce
>Priority: Major
>  Labels: easyfix
> Attachments: output_text_stripper.txt, test.pdf
>
>
> Hello Folks, 
> I am using pdfbox to extract and manipulate text contents of the pdf and 
> using PDFTextStripper to extract text. I am also setting the below options:
> {code:java}
> PDDocument document = Loader.loadPDF(new 
> File("src/test/resources/pdf/test.pdf"));
> PDFTextStripper textStripper = new PDFTextStripper();
> textStripper.setSortByPosition(true);
> textStripper.setWordSeparator(" "); {code}
> The Textcomparator is not transitive as mentioned in a comment. The custom 
> merge sort implemented is messing up the text at the Individual character 
> level and I can't fix the text later. 
> I have attached the sample pdf and its text output below. The merge sort 
> doesn't consider the y coordinates and x coordinates when sorting the 
> letters. Adding that while sorting would fix this issue.






[jira] [Created] (PDFBOX-5828) PDFTextStripper created garbled text

2024-05-24 Thread arjunce (Jira)
arjunce created PDFBOX-5828:
---

 Summary: PDFTextStripper created garbled text
 Key: PDFBOX-5828
 URL: https://issues.apache.org/jira/browse/PDFBOX-5828
 Project: PDFBox
  Issue Type: Bug
  Components: Text extraction
Affects Versions: 3.0.1 PDFBox
Reporter: arjunce
 Attachments: output_text_stripper.txt, test.pdf

Hello Folks, 

I am using pdfbox to extract and manipulate text contents of the pdf and using 
PDFTextStripper to extract text. I am also setting the below options:
{code:java}
PDDocument document = Loader.loadPDF(new 
File("src/test/resources/pdf/test.pdf"));
PDFTextStripper textStripper = new PDFTextStripper();
textStripper.setSortByPosition(true);
textStripper.setWordSeparator(" "); {code}
When sorting the positions, a "{*}Comparison method violates its general 
contract!{*}" exception is thrown.

The exception is possibly caused by the float comparisons. The custom merge 
sort scrambles the text at the individual character level, and I can't repair 
the text afterwards. 

I have attached the sample pdf and its text output below. The merge sort 
doesn't consider the y coordinates and x coordinates when sorting the letters. 
Adding that while sorting would fix this issue.
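
Editorial illustration of the report above: a minimal sketch of why comparing floats with a tolerance can break the Comparator contract. The TOLERANT comparator and the sample values are hypothetical and are not taken from PDFBox. Two values closer than the tolerance compare as equal, so a equals b and b equals c, yet a is less than c, which violates transitivity. A sort such as the one behind List.sort may then report "Comparison method violates its general contract!".

```java
import java.util.Comparator;

public class TolerantCompareDemo {
    /** Hypothetical tolerance below which two coordinates count as equal. */
    static final float TOLERANCE = 1.0f;

    /** Returns 0 for "close enough" values, otherwise the natural float order. */
    static final Comparator<Float> TOLERANT = (a, b) ->
            Math.abs(a - b) < TOLERANCE ? 0 : Float.compare(a, b);

    public static void main(String[] args) {
        float a = 0.0f;
        float b = 0.6f;
        float c = 1.2f;
        System.out.println(TOLERANT.compare(a, b)); // 0: a "equals" b
        System.out.println(TOLERANT.compare(b, c)); // 0: b "equals" c
        System.out.println(TOLERANT.compare(a, c)); // negative: yet a is less than c
    }
}
```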






[jira] [Resolved] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-24 Thread Andreas Lehmkühler (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5823.

Resolution: Fixed

[~thumbox] thanks for the contribution and your feedback

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  
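
Editorial illustration of the proposal above: a minimal sketch comparing a regex-based whitespace test with a plain character scan over the set [ \t\n\x0B\f\r] quoted in the report. The class and method names are hypothetical, and the pattern is assumed to be the plain \s expression; neither is copied from PDFBox. The scan is linear in the word length rather than constant time, but it creates no Matcher object per call.

```java
import java.util.regex.Pattern;

public class WhitespaceCheck {
    /** Assumed equivalent of the pattern discussed in the report. */
    static final Pattern PATTERN_SPACE = Pattern.compile("\\s");

    /** Regex variant: allocates a Matcher for every call. */
    static boolean containsWhitespaceRegex(String word) {
        return PATTERN_SPACE.matcher(word).find();
    }

    /** Scan variant: checks each char against the six characters matched by \s. */
    static boolean containsWhitespaceScan(String word) {
        for (int i = 0; i < word.length(); i++) {
            switch (word.charAt(i)) {
                case ' ':
                case '\t':
                case '\n':
                case '\u000B': // vertical tab
                case '\f':
                case '\r':
                    return true;
                default:
                    break;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"word", "two words", "tab\there", "", "line\nbreak"};
        for (String s : samples) {
            // Both variants are expected to agree on every sample.
            System.out.println(containsWhitespaceRegex(s) == containsWhitespaceScan(s));
        }
    }
}
```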






[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849070#comment-17849070
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Got it, thanks [~tilman] 

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StackOverflowError
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Hi,
> I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
> PDFs, I am running into an exception while converting PDF to Image. There are 
> multiple exceptions coming from org.apache.fontbox.ttf.
> Sample Exception:
> {code:java}
> at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at 
> org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
> 

[jira] [Comment Edited] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849064#comment-17849064
 ] 

Kabir Soneja edited comment on PDFBOX-5827 at 5/23/24 6:11 PM:
---

Sounds good, thanks for helping with this [~tilman] 
How can I track if the latest version is released or not?


was (Author: JIRAUSER302151):
Sounds good, thanks for helping with this [~tilman] 

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StackOverflowError
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Hi,
> I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
> PDFs, I am running into an exception while converting PDF to Image. There are 
> multiple exceptions coming from org.apache.fontbox.ttf.
> Sample Exception:
> {code:java}
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849064#comment-17849064
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Sounds good, thanks for helping with this [~tilman] 


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849062#comment-17849062
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

No, there are no fixed schedules. Be patient.


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849061#comment-17849061
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Thanks [~tilman]. Is there an approximate month or timeline for when version 
2.0.32 will be released?


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849060#comment-17849060
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

The changes will be in version 2.0.32 when released (probably in a few months). 
Your version 2.0.27 is the past. The snapshot is for you to test or to use if 
you can't wait until 2.0.32.


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849054#comment-17849054
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Got it [~tilman], makes sense. Will this snapshot version be released, or be 
available in the version 2.0.27 that I am using?


[jira] [Resolved] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5827.
-
Resolution: Fixed


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849053#comment-17849053
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

Thanks for the feedback. Yes, these exceptions will still happen because your 
file has these problems, but PDFBox usually tries to work around these 
problems, sometimes using a different font.
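
For readers following the trace quoted in this issue: the same five frames repeat until the stack overflows, which is consistent with a composite glyph whose component references lead back to itself. The following is only a minimal sketch of how such a cycle could be detected before recursing. The class name, method name, and the map-based font model are hypothetical illustrations, not FontBox code and not the actual fix made for this issue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not part of PDFBox or FontBox. */
class CompositeGlyphCycleCheck {

    /**
     * Returns true if resolving glyph {@code gid} would revisit a glyph that
     * is still being resolved, i.e. the component references form a cycle.
     * {@code components} maps a glyph id to the glyph ids it references;
     * simple glyphs map to an empty array or are absent (assumed model).
     */
    public static boolean hasCycle(int gid, Map<Integer, int[]> components) {
        return hasCycle(gid, components, new HashSet<>());
    }

    private static boolean hasCycle(int gid, Map<Integer, int[]> components,
                                    Set<Integer> inProgress) {
        if (!inProgress.add(gid)) {
            return true; // gid is already on the current resolution path
        }
        for (int child : components.getOrDefault(gid, new int[0])) {
            if (hasCycle(child, components, inProgress)) {
                return true;
            }
        }
        // Finished this branch; a component shared by siblings is not a cycle.
        inProgress.remove(gid);
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, int[]> font = new HashMap<>();
        font.put(1, new int[] {2});
        font.put(2, new int[] {1});    // 1 -> 2 -> 1 forms a cycle
        font.put(3, new int[] {4, 4}); // shared component, no cycle
        font.put(4, new int[0]);
        System.out.println("glyph 1 cyclic: " + hasCycle(1, font));
        System.out.println("glyph 3 cyclic: " + hasCycle(3, font));
    }
}
```

Tracking only the glyphs on the current path, rather than every glyph seen so far, lets a component be reused by several composites without being misreported as a cycle.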

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
> Reporter: Kabir Soneja
> Assignee: Tilman Hausherr
> Priority: Major
> Labels: StackOverflowError
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Hi,
> I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
> PDFs, I am running into an exception while converting PDF to Image. There are 
> multiple exceptions coming from org.apache.fontbox.ttf.
> Sample Exception:
> {code:java}
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java

[jira] [Comment Edited] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849052#comment-17849052
 ] 

Kabir Soneja edited comment on PDFBOX-5827 at 5/23/24 5:43 PM:
---

Thanks [~tilman], I really appreciate your quick response on this. I just tried 
the jar you mentioned and it works now: the PDF is converted to an image. I 
still see the exception, but it is not fatal.
{code:java}
May 23, 2024 10:40:58 AM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:153)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:171)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:959)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:532)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:507)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:151)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:292)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:272)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:258)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:264)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:93)
May 23, 2024 10:40:58 AM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:153)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:171)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:959)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:532)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:507)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:151)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:292)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:272)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:258)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.ja

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849052#comment-17849052
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Thanks [~tilman], I just tried the jar you mentioned and it works now. The 
PDF is converted to an image. I still see the exception, but it is not fatal.


{code:java}
May 23, 2024 10:40:58 AM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:153)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:171)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:959)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:532)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:507)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:151)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:292)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:272)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:258)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:264)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:93)
May 23, 2024 10:40:58 AM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:153)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:171)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:959)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:532)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:507)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:151)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:292)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:272)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:258)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:264)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:93)
May 23, 2024 10:40:58 AM org.apache.fontbox.ttf.GlyfCompositeDe

[jira] [Updated] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5827:

Labels: StackOverflowError  (was: )

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
>  Labels: StackOverflowError
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0

[jira] [Assigned] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-5827:
---

Assignee: Tilman Hausherr

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Assignee: Tilman Hausherr
>Priority: Major
>  Labels: StackOverflowError
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849045#comment-17849045
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

Please download the jar at the bottom of 
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.32-SNAPSHOT/
and try that one with your file(s).

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849004#comment-17849004
 ] 

ASF subversion and git services commented on PDFBOX-5827:
-

Commit 1917926 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917926 ]

PDFBOX-5827: avoid stack overflow by tracking the depth up to 
maxp.MaxComponentDepth
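
The idea named in the commit message, tracking the nesting depth while resolving composite glyphs, can be sketched as below. This is a simplified, hypothetical model and not the committed FontBox change: the class CompositeGlyphSketch, its methods, and the choice of IllegalStateException are assumptions made for illustration, and maxComponentDepth merely stands in for the limit read from the font's maxp table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of composite glyph resolution; not FontBox code. */
class CompositeGlyphSketch {
    // glyph id -> ids of its component glyphs (an empty array means a simple glyph)
    private final Map<Integer, int[]> components = new HashMap<>();
    // stand-in for the depth limit taken from the font's maxp table
    private final int maxComponentDepth;

    CompositeGlyphSketch(int maxComponentDepth) {
        this.maxComponentDepth = maxComponentDepth;
    }

    void define(int glyphId, int... componentIds) {
        components.put(glyphId, componentIds);
    }

    /** Counts the simple glyphs reachable from glyphId, refusing to nest past the limit. */
    int countSimpleGlyphs(int glyphId) {
        return resolve(glyphId, 0);
    }

    private int resolve(int glyphId, int level) {
        // Without this check, a glyph that refers to itself recurses until the stack overflows.
        if (level > maxComponentDepth) {
            throw new IllegalStateException(
                    "composite glyph nesting exceeds " + maxComponentDepth);
        }
        int[] parts = components.getOrDefault(glyphId, new int[0]);
        if (parts.length == 0) {
            return 1; // simple glyph
        }
        int total = 0;
        for (int part : parts) {
            total += resolve(part, level + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        CompositeGlyphSketch font = new CompositeGlyphSketch(4);
        font.define(1, 2, 3); // glyph 1 is composed of glyphs 2 and 3
        font.define(2);       // simple glyph
        font.define(3);       // simple glyph
        System.out.println(font.countSimpleGlyphs(1));

        font.define(7, 7);    // malformed: glyph 7 refers to itself
        try {
            font.countSimpleGlyphs(7);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the level down as a parameter keeps the check local to each recursive call, so a malformed font fails with an ordinary exception that a caller can catch instead of a StackOverflowError.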

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849005#comment-17849005
 ] 

ASF subversion and git services commented on PDFBOX-5827:
-

Commit 1917927 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917927 ]

PDFBOX-5827: avoid stack overflow by tracking the depth up to 
maxp.MaxComponentDepth

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> Hi,
> I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
> PDFs, I am running into an exception while converting PDF to Image. There are 
> multiple exceptions coming from org.apache.fontbox.ttf.
> Sample Exception:
> {code:java}
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompo

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849003#comment-17849003
 ] 

ASF subversion and git services commented on PDFBOX-5827:
-

Commit 1917925 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917925 ]

PDFBOX-5827: avoid stack overflow by tracking the depth up to 
maxp.MaxComponentDepth

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>

[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-23 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848978#comment-17848978
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

[~lehmi] thanks! this alternative solves the memory issue.

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses the StringUtil.PATTERN_SPACE regexp to check 
> whether a word contains a space 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624]).
> For large documents (~800 pages) made up of short strings (such as a regular 
> word), this causes memory overhead (see attached) because of the many extra 
> allocations. I replaced the regexp with word.contains checks for space and 
> \t. Since contains is a simple scan that requires no extra allocations, 
> memory use dropped.
> What would be the implications of replacing this block with contains()?
> Since \s is [ \t\n\x0B\f\r], I believe a simplified version could allocate 
> less memory.
>  
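
A minimal sketch of the allocation-free check described above, assuming the call site only needs to know whether a word contains any character from the \s class. The class and method names are hypothetical, and this is not the change committed for PDFBOX-5823. The scan is linear in the word length rather than constant time, but it allocates nothing per call.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
final class WhitespaceCheckSketch {
    // Reference behaviour: the default \s class is [ \t\n\x0B\f\r].
    private static final Pattern SPACE = Pattern.compile("\\s");

    // Scans the string once and creates no Matcher or other objects.
    static boolean containsWhitespace(String word) {
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n'
                    || c == 0x0B || c == '\f' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"word", "two words", "tab\there", ""};
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println(containsWhitespace(s) + " " + SPACE.matcher(s).find());
        }
    }
}
```

Checking only space and \t, as in the report, would be narrower than \s; whether that difference matters depends on what strings reach the call site.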



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848957#comment-17848957
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

It turns out that the "multiple changes" are easier than I thought, and the API 
doesn't change because all calls are package scoped. I'm gonna commit the 
changes soon.

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>

[jira] [Updated] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5827:

Affects Version/s: 3.0.2 PDFBox

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
>

[jira] [Updated] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5827:

Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31, 3.0.2 PDFBox
>Reporter: Kabir Soneja
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848857#comment-17848857
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

The likely cause is a composite glyph that goes into an endless recursion, e.g. 
because of a self-reference. The good news is that I have a font file with the 
same problem, the bad news is that protecting against this problem will require 
multiple changes in fontbox (gather {{maxComponentDepth}} and keep the current 
depth in 5 calls). And even after that, the file might still fail, or look bad.

https://learn.microsoft.com/en-us/typography/opentype/spec/maxp
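
A minimal sketch of the guard described above, assuming a simplified model in which each glyph id maps to the ids of its components. The class, its methods, and the choice to throw an IOException are hypothetical and are not the fontbox API; the only idea taken from the comment is to pass the current depth down the recursion and stop at maxComponentDepth.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of composite glyph resolution, not fontbox code.
final class CompositeDepthSketch {
    // Glyph id -> component glyph ids; an empty list means a simple glyph.
    private final Map<Integer, List<Integer>> components = new HashMap<>();
    // Stand-in for the maxComponentDepth value read from the maxp table.
    private final int maxComponentDepth;

    CompositeDepthSketch(int maxComponentDepth) {
        this.maxComponentDepth = maxComponentDepth;
    }

    void define(int gid, Integer... componentIds) {
        components.put(gid, Arrays.asList(componentIds));
    }

    // Counts the simple glyphs that make up the given glyph.
    int countSimpleGlyphs(int gid) throws IOException {
        return resolve(gid, 0);
    }

    private int resolve(int gid, int level) throws IOException {
        // The current depth travels with every call, so a self-reference
        // fails here instead of overflowing the stack.
        if (level > maxComponentDepth) {
            throw new IOException("glyph " + gid + " nests deeper than " + maxComponentDepth);
        }
        List<Integer> parts = components.getOrDefault(gid, Collections.<Integer>emptyList());
        if (parts.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (int part : parts) {
            total += resolve(part, level + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        CompositeDepthSketch font = new CompositeDepthSketch(2);
        font.define(1);       // simple glyph
        font.define(2, 1, 1); // composite of two simple glyphs
        font.define(4, 4);    // refers to itself
        try {
            System.out.println(font.countSimpleGlyphs(2));
            font.countSimpleGlyphs(4);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether to throw, log, or skip the offending component once the limit is hit is a separate design choice that the comment above does not settle.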


> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31
>Reporter: Kabir Soneja
>Priority: Major
>

[jira] [Updated] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5827:

Affects Version/s: 2.0.31
   2.0.27

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.27, 2.0.31
>Reporter: Kabir Soneja
>Priority: Major
>

[jira] [Updated] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-23 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5827:

Component/s: FontBox

> Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs
> -
>
> Key: PDFBOX-5827
> URL: https://issues.apache.org/jira/browse/PDFBOX-5827
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 2.0.27, 2.0.31
>Reporter: Kabir Soneja
>Priority: Major
>
> Hi,
> I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
> PDFs, I am running into an exception while converting PDF to Image. There are 
> multiple exceptions coming from org.apache.fontbox.ttf.
> Sample Exception:
> {code:java}
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:80)
>     at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
>     at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
>     at org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptio

[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-22 Thread Kabir Soneja (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848732#comment-17848732
 ] 

Kabir Soneja commented on PDFBOX-5827:
--

Thanks for helping with this [~tilman].

I tried without fontbox and with fontbox 2.0.31, but I am still running into 
the same exception. Unfortunately, I won't be able to share the PDF. Here is 
the full exception and stack trace:


{code:java}
java -cp "pdfbox-app-2.0.27.jar:lib/*" org.apache.pdfbox.tools.PDFBox PDFToImage fontbox-exception-pdf.pdf -imageType png -dpi 100
May 22, 2024 12:52:38 PM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:150)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:966)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:541)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:284)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:272)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:258)
    at org.apache.pdfbox.tools.PDFToImage.main(PDFToImage.java:262)
    at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:89)
May 22, 2024 12:52:38 PM org.apache.fontbox.ttf.OS2WindowsMetricsTable read
WARNING: Could not read all expected parts of version >= 1, setting version to 0
java.io.EOFException
    at org.apache.fontbox.ttf.TTFDataStream.readUnsignedInt(TTFDataStream.java:150)
    at org.apache.fontbox.ttf.OS2WindowsMetricsTable.read(OS2WindowsMetricsTable.java:843)
    at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:361)
    at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:188)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:165)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
    at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:110)
    at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:112)
    at org.apache.pdfbox.pdmodel.font.PDCIDFontType2.<init>(PDCIDFontType2.java:65)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createDescendantFont(PDFontFactory.java:139)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:203)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:966)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:541)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
    at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:284)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:355)
    at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFR

[jira] [Comment Edited] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-22 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848708#comment-17848708
 ] 

Tilman Hausherr edited comment on PDFBOX-5827 at 5/22/24 6:36 PM:
--

Please retest with 2.0.31, remove fontbox-1.7.1.jar (use fontbox 2.0.31 too), 
and attach the PDF. You didn't specify which exception, but I guess it's a 
stack overflow?
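
When an old jar such as fontbox-1.7.1.jar sits on the same classpath as a newer one, it can help to check which jar a class is actually loaded from. The following is only a minimal sketch using standard JDK calls; the class name `ClassOrigin` and the method `describe` are hypothetical helpers, not part of PDFBox or FontBox.

```java
import java.security.CodeSource;

// Hypothetical helper (not a PDFBox/FontBox class): reports the jar or
// directory a class was loaded from, to spot mixed library versions.
final class ClassOrigin {

    /** Returns "<class name> <- <location>", or a placeholder if no location is known. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typically the case for classes that ship with the JDK itself.
            return type.getName() + " <- (no code source)";
        }
        return type.getName() + " <- " + source.getLocation();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(describe(Class.forName(name)));
    }
}
```

Assuming the compiled helper is placed on the same classpath as the failing command, passing `org.apache.fontbox.ttf.TTFParser` (a class that appears in the stack traces in this thread) as the argument would print the jar that class comes from.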


was (Author: tilman):
Please retest with 2.0.31 and attach the PDF. You didn't specify what exception 
but I guess it's a stack overflow?


[jira] [Commented] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-22 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848708#comment-17848708
 ] 

Tilman Hausherr commented on PDFBOX-5827:
-

Please retest with 2.0.31 and attach the PDF. You didn't specify which 
exception, but I guess it's a stack overflow?
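
The repeating frames in the reported trace (initDescriptions, getGlyph, getGlyphData, initData, then the GlyfCompositeDescript constructor again) are consistent with a composite glyph that refers back to itself, directly or through other glyphs. The sketch below illustrates how such a cycle can be detected before recursing. It does not use the FontBox API and is not the project's fix: the class `CompositeGlyphCycleCheck` and the map from glyph id to component glyph ids are hypothetical stand-ins for a glyph table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: glyph id -> ids of the glyphs it is composed of.
// Simple (non-composite) glyphs have no entry in the map.
final class CompositeGlyphCycleCheck {

    /** Returns true if resolving glyphId reaches a glyph that is still being resolved. */
    static boolean hasCycle(Map<Integer, List<Integer>> components, int glyphId) {
        return visit(components, glyphId, new ArrayDeque<>());
    }

    private static boolean visit(Map<Integer, List<Integer>> components,
                                 int glyphId, Deque<Integer> inProgress) {
        if (inProgress.contains(glyphId)) {
            return true; // glyph is already on the current resolution path
        }
        inProgress.push(glyphId);
        for (int child : components.getOrDefault(glyphId, List.of())) {
            if (visit(components, child, inProgress)) {
                return true;
            }
        }
        inProgress.pop(); // finished this glyph without finding a cycle
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> font = new HashMap<>();
        font.put(1, List.of(2));
        font.put(2, List.of(1)); // glyph 2 refers back to glyph 1
        font.put(3, List.of(4)); // glyph 4 is a simple glyph
        System.out.println(hasCycle(font, 1)); // true
        System.out.println(hasCycle(font, 3)); // false
    }
}
```

The check itself is recursive, but its depth is bounded by the number of distinct glyphs on one path, because a glyph seen twice ends the search.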


[jira] [Created] (PDFBOX-5827) Multiple exceptions coming from org.apache.fontbox.ttf for different PDFs

2024-05-22 Thread Kabir Soneja (Jira)
Kabir Soneja created PDFBOX-5827:


 Summary: Multiple exceptions coming from org.apache.fontbox.ttf 
for different PDFs
 Key: PDFBOX-5827
 URL: https://issues.apache.org/jira/browse/PDFBOX-5827
 Project: PDFBox
  Issue Type: Bug
Reporter: Kabir Soneja


Hi,

I am using PDFBox CLI version 2.0.27 to convert PDFs to images. For certain 
PDFs, I am running into an exception while converting PDF to Image. There are 
multiple exceptions coming from org.apache.fontbox.ttf.

Sample Exception:


{code:java}
at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.(GlyfCompositeDescript.java:80)
    at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:65)
    at org.apache.fontbox.ttf.GlyphTable.getGlyphData(GlyphTable.java:219)
    at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199)
    at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
 {code}
 
{code:java}
Stdout: , Stderr: at 
org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTable.java:199) at 
org.apache.fontbox.ttf.GlyfCompositeDescript.initDescriptions(GlyfCompositeDescript.java:292)
 at 
org.apache.fontbox.ttf.GlyfCompositeDescript

[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848613#comment-17848613
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1917898 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917898 ]

PDFBOX-5660: Sonar fix

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for improving code quality, using the SonarQube 
> report, hints from different IDEs, the FindBugs tool and other code quality 
> tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848611#comment-17848611
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1917896 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917896 ]

PDFBOX-5660: Sonar fix




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848612#comment-17848612
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1917897 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917897 ]

PDFBOX-5660: Sonar fix

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5826) Missing /Subtype and /Type in Metadata not detected

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848574#comment-17848574
 ] 

ASF subversion and git services commented on PDFBOX-5826:
-

Commit 1917893 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917893 ]

PDFBOX-5826: check for /Type and /Subtype in metadata

> Missing /Subtype and /Type in Metadata not detected
> ---
>
> Key: PDFBOX-5826
> URL: https://issues.apache.org/jira/browse/PDFBOX-5826
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox
>
>
> {{/Subtype /XML /Type /Metadata}} are mandatory in metadata stream dictionary 
> but not detected by preflight.
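
As a rough illustration of the check this issue asks for, the sketch below 
models a metadata stream dictionary as a plain Map and reports missing or wrong 
/Type and /Subtype entries. The class name, the method name and the Map-based 
model are all hypothetical stand-ins for a real COS dictionary; this is not the 
preflight code committed for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: keys and values are PDF names
// written without the leading slash.
final class MetadataDictionaryCheck
{
    private MetadataDictionaryCheck()
    {
    }

    // Returns one message per problem found; an empty list means both
    // mandatory entries are present with the expected values.
    static List<String> findProblems(Map<String, String> streamDictionary)
    {
        List<String> problems = new ArrayList<>();
        if (!"Metadata".equals(streamDictionary.get("Type")))
        {
            problems.add("/Type must be /Metadata");
        }
        if (!"XML".equals(streamDictionary.get("Subtype")))
        {
            problems.add("/Subtype must be /XML");
        }
        return problems;
    }

    public static void main(String[] args)
    {
        // A dictionary that lacks /Subtype, as in the reported case.
        System.out.println(findProblems(Map.of("Type", "Metadata")));
    }
}
```

A real validator would read the entries from the stream's dictionary and map 
each problem to a preflight error code; both are left out here.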






[jira] [Commented] (PDFBOX-5826) Missing /Subtype and /Type in Metadata not detected

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848573#comment-17848573
 ] 

ASF subversion and git services commented on PDFBOX-5826:
-

Commit 1917892 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917892 ]

PDFBOX-5826: check for /Type and /Subtype in metadata

> Missing /Subtype and /Type in Metadata not detected
> ---
>
> Key: PDFBOX-5826
> URL: https://issues.apache.org/jira/browse/PDFBOX-5826
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox
>
>
> {{/Subtype /XML /Type /Metadata}} are mandatory in metadata stream dictionary 
> but not detected by preflight.






[jira] [Updated] (PDFBOX-5826) Missing /Subtype and /Type in Metadata not detected

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5826:

Description: {{/Subtype /XML /Type /Metadata}} are mandatory in metadata 
stream dictionary but not detected by preflight.  (was: {{/Subtype /XML /Type 
Metadata}} are mandatory in metadata stream dictionary but not detected by 
preflight.)

> Missing /Subtype and /Type in Metadata not detected
> ---
>
> Key: PDFBOX-5826
> URL: https://issues.apache.org/jira/browse/PDFBOX-5826
> Project: PDFBox
>  Issue Type: Bug
>  Components: Preflight
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox
>
>
> {{/Subtype /XML /Type /Metadata}} are mandatory in metadata stream dictionary 
> but not detected by preflight.






[jira] [Created] (PDFBOX-5826) Missing /Subtype and /Type in Metadata not detected

2024-05-22 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5826:
---

 Summary: Missing /Subtype and /Type in Metadata not detected
 Key: PDFBOX-5826
 URL: https://issues.apache.org/jira/browse/PDFBOX-5826
 Project: PDFBox
  Issue Type: Bug
  Components: Preflight
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox


{{/Subtype /XML /Type Metadata}} are mandatory in metadata stream dictionary 
but not detected by preflight.






[jira] [Updated] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5825:

Component/s: Utilities

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them. Another problem (only for 
> PDF/A-1b in versions 3 and trunk) is that XRef isn't supported, so it should 
> be disabled in the merge.






[jira] [Resolved] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5825.
-
Resolution: Fixed

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them. Another problem (only for 
> PDF/A-1b in versions 3 and trunk) is that XRef isn't supported, so it should 
> be disabled in the merge.






[jira] [Updated] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5825:

Description: The file created with the example from PDFBOX-3329 fails to 
validate with VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) 
in the /Metadata stream dictionary are missing. This is because the third 
constructor of {{PDMetadata}} doesn't add them. Another problem (only for 
PDF/A-1b in versions 3 and trunk) is that XRef isn't supported, so it should be 
disabled in the merge.  (was: The file created with the example from 
PDFBOX-3329 fails to validate with VeraPDF because two elements ({{/Subtype 
/XML /Type Metadata}}) in the /Metadata stream dictionary are missing. This is 
because the third constructor of {{PDMetadata}} doesn't add them.)

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them. Another problem (only for 
> PDF/A-1b in versions 3 and trunk) is that XRef isn't supported, so it should 
> be disabled in the merge.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848551#comment-17848551
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917891 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917891 ]

PDFBOX-5825: simplify code

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848550#comment-17848550
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917890 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917890 ]

PDFBOX-5825: simplify code

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848549#comment-17848549
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917889 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917889 ]

PDFBOX-5825: simplify code

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848542#comment-17848542
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917888 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917888 ]

PDFBOX-5825: add missing entries for metadata dictionary; don't compress for 
PDF/A-1b; add test with PDF/A check

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848541#comment-17848541
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917887 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917887 ]

PDFBOX-5825: add missing entries for metadata dictionary; add test with PDF/A 
check

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848540#comment-17848540
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917886 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917886 ]

PDFBOX-5825: add missing entries for metadata dictionary; add test with PDF/A 
check

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848469#comment-17848469
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917880 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917880 ]

PDFBOX-5825: improve javadoc

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848468#comment-17848468
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917879 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917879 ]

PDFBOX-5825: improve javadoc

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848470#comment-17848470
 ] 

ASF subversion and git services commented on PDFBOX-5825:
-

Commit 1917881 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917881 ]

PDFBOX-5825: improve javadoc

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Updated] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5825:

Description: The file created with the example from PDFBOX-3329 fails to 
validate with VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) 
in the /Metadata stream dictionary are missing. This is because the third 
constructor of {{PDMetadata}} doesn't add them.  (was: The file created with 
the example from PDFBOX-3329 fail to validate with VeraPDF because two elements 
in the /Metadata stream dictionary are missing. This is because the third 
constructor of {{PDMetadata}} doesn't add them.)

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fails to validate with 
> VeraPDF because two elements ({{/Subtype /XML /Type Metadata}}) in the 
> /Metadata stream dictionary are missing. This is because the third 
> constructor of {{PDMetadata}} doesn't add them.






[jira] [Updated] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5825:

Description: The file created with the example from PDFBOX-3329 fail to 
validate with VeraPDF because two elements in the /Metadata stream dictionary 
are missing. This is because the third constructor of {{PDMetadata}} doesn't 
add them.  (was: The file created with the example from PDFBOX-3329 fail to 
validate with VeraPDF because two elements in the /Metadata stream dictionary 
are missing.)

> Files created with PDFMergerExample are not correct PDF/A
> -
>
> Key: PDFBOX-5825
> URL: https://issues.apache.org/jira/browse/PDFBOX-5825
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> The file created with the example from PDFBOX-3329 fail to validate with 
> VeraPDF because two elements in the /Metadata stream dictionary are missing. 
> This is because the third constructor of {{PDMetadata}} doesn't add them.






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848464#comment-17848464
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


I didn't have to dig too deep to find out that I'm wrong. Every usage of the 
predicate function created a new Matcher object. I've followed [~msahyoun]'s 
proposal and replaced the predicate with a simplified version of 
StringUtils.isBlank from commons-lang.

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  
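
The kind of allocation-free check discussed in this issue can be sketched as 
follows. This is only an illustration, not the code committed to PDFBox: the 
class and method names are hypothetical, and it assumes the regexp in question 
is equivalent to \s, i.e. the six characters listed in the description. Note 
that the isBlank in commons-lang relies on Character.isWhitespace, which 
accepts a somewhat larger set of characters.

```java
// Hypothetical helper, not part of PDFBox: scans the characters directly,
// so no Matcher (or any other object) is created per call.
final class WhitespaceCheck
{
    private WhitespaceCheck()
    {
    }

    // True for the six characters of the regex class \s: [ \t\n\x0B\f\r].
    static boolean isRegexSpace(char c)
    {
        return c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r';
    }

    // True if at least one character of the text is a \s character.
    static boolean containsWhitespace(CharSequence text)
    {
        if (text == null)
        {
            return false;
        }
        for (int i = 0; i < text.length(); i++)
        {
            if (isRegexSpace(text.charAt(i)))
            {
                return true;
            }
        }
        return false;
    }

    // True if the text is null, empty, or consists only of \s characters.
    static boolean isBlank(CharSequence text)
    {
        if (text == null)
        {
            return true;
        }
        for (int i = 0; i < text.length(); i++)
        {
            if (!isRegexSpace(text.charAt(i)))
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println(containsWhitespace("Hello World"));
        System.out.println(containsWhitespace("Hello"));
        System.out.println(isBlank(" \t"));
    }
}
```

Both methods are linear in the length of the text, which is small for a single 
word, and they avoid the per-call Matcher allocation that showed up in the 
profiler.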






[jira] [Created] (PDFBOX-5825) Files created with PDFMergerExample are not correct PDF/A

2024-05-22 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5825:
---

 Summary: Files created with PDFMergerExample are not correct PDF/A
 Key: PDFBOX-5825
 URL: https://issues.apache.org/jira/browse/PDFBOX-5825
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
 Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0


The file created with the example from PDFBOX-3329 fail to validate with 
VeraPDF because two elements in the /Metadata stream dictionary are missing.






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848463#comment-17848463
 ] 

ASF subversion and git services commented on PDFBOX-5823:
-

Commit 1917878 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917878 ]

PDFBOX-5823: replace Predicate to avoid creating new objects with every call

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848392#comment-17848392
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


Looks like I'm missing something. I'm going to have a deeper look.

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848357#comment-17848357
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/21/24 7:25 PM:
--

I've attached a profiler screenshot, and it seems the predicate (even when 
static and created only once) is not a good option. Could you run the same 
comparison on your side? If you don't mind, please have a look at Main-1.java 
and Screenshot 2024-05-21 at 20.21.43.png. Perhaps I'm missing something.


was (Author: JIRAUSER305510):
I've attached a profiler screenshot, and it seems the predicate (even when 
static and created only once) is not a good option. Could you run the same 
comparison on your side?




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848357#comment-17848357
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

I've attached a profiler screenshot, and it seems the predicate (even when 
static and created only once) is not a good option. Could you run the same 
comparison on your side?




[jira] [Updated] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5823:

Attachment: Main-1.java




[jira] [Updated] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5823:

Attachment: Screenshot 2024-05-21 at 20.21.43.png




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848344#comment-17848344
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


[~thumbox] Yes, but the matcher is static and created only once.




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848329#comment-17848329
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

[~lehmi] I believe asPredicate() will instantiate a Matcher on each call, 
which could cause the same high memory utilisation.
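The concern can be illustrated with a short sketch. Per the JDK documentation, the predicate returned by Pattern.asPredicate() behaves as if it creates a matcher from the input and calls find(), so keeping the predicate in a static field saves creating the predicate, not the per-call Matcher. The class and field names below are hypothetical and are not taken from the PDFBox commit.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class PredicateSketch {
    private static final Pattern SPACE = Pattern.compile("\\s");

    // Created once, but each test(s) call still behaves like SPACE.matcher(s).find().
    static final Predicate<String> HAS_SPACE = SPACE.asPredicate();

    // The explicit form of what the predicate does on every call.
    static boolean hasSpaceExplicit(String s) {
        return SPACE.matcher(s).find();
    }
}
```

Both forms give the same answer for any input, so the choice between them does not change how many Matcher objects are created; only a check that avoids the regex engine altogether would.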




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848320#comment-17848320
 ] 

ASF subversion and git services commented on PDFBOX-5823:
-

Commit 1917862 from le...@apache.org in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917862 ]

PDFBOX-5823: simplify pattern matching to optimize memory consumption based on 
a proposal by Jonathan Prates




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848319#comment-17848319
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


Thanks for the proposals, but I've found another solution.




[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848277#comment-17848277
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/21/24 4:22 PM:
--

Agreed, we could copy StringUtils.isBlank() code 
[https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L3623C1-L3634C6]
 or add something like
{code:java}
public boolean isBlank(String s)
{
    return s != null && s.chars().allMatch(Character::isWhitespace);
}{code}
to pdfbox.util.StringUtil
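A Java 8 compatible variant of the helper proposed above could look like the sketch below. The class name BlankCheck and both method names are hypothetical; nothing here is taken from pdfbox.util.StringUtil. The loop form returns the same results as the stream form while avoiding the IntStream allocation.

```java
class BlankCheck {
    // The variant from the comment above: false for null, true for the empty string.
    static boolean isBlankStream(String s) {
        return s != null && s.chars().allMatch(Character::isWhitespace);
    }

    // Loop variant with the same results and no stream allocation.
    static boolean isBlankLoop(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Two details are worth checking before adopting either form. Commons Lang StringUtils.isBlank() treats null as blank, whereas both methods here return false for null. Also, Character.isWhitespace() does not count the non-breaking space (\u00A0) as whitespace. Finally, "entirely whitespace" is a different question from "contains whitespace", so which one the call site needs should be confirmed against the code.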


was (Author: JIRAUSER305510):
Agreed, we could copy StringUtils.isBlank() code 
[https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L3623C1-L3634C6]
 or something like
{code:java}
public boolean isBlank(String s)
{
    return s != null && s.chars().allMatch(Character::isWhitespace);
}{code}




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848277#comment-17848277
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

Agree, we could copy StringUtils.isBlank() code 
[https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L3623C1-L3634C6]
 or something like
{code:java}
public boolean isBlank(String s)
{
    return s != null && s.chars().allMatch(Character::isWhitespace);
}{code}




[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848277#comment-17848277
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/21/24 4:21 PM:
--

Agreed, we could copy StringUtils.isBlank() code 
[https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L3623C1-L3634C6]
 or something like
{code:java}
public boolean isBlank(String s)
{
    return s != null && s.chars().allMatch(Character::isWhitespace);
}{code}


was (Author: JIRAUSER305510):
Agree, we could copy StringUtils.isBlank() code 
[https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L3623C1-L3634C6]
 or something like
{code:java}
public boolean isBlank(String s)
{
    return s != null && s.chars().allMatch(Character::isWhitespace);
}{code}




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848274#comment-17848274
 ] 

Maruan Sahyoun commented on PDFBOX-5823:


What about using Apache Commons Lang StringUtils.isBlank(), or copying the code?




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848271#comment-17848271
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


[~thumbox] We need to find another solution for 3.x, as String.isBlank() isn't 
available in Java 8.




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848268#comment-17848268
 ] 

ASF subversion and git services commented on PDFBOX-5823:
-

Commit 1917858 from le...@apache.org in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1917858 ]

PDFBOX-5823: simplify pattern matching to optimize memory consumption as 
proposed by Jonathan Prates




[jira] [Assigned] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reassigned PDFBOX-5823:
--

Assignee: Andreas Lehmkühler




[jira] [Updated] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-5823:
---
Fix Version/s: 3.0.3 PDFBox
   4.0.0




[jira] [Commented] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848182#comment-17848182
 ] 

Jonathan Prates commented on PDFBOX-5824:
-

Patch used: https://github.com/apache/pdfbox/pull/196

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Screenshot 2024-05-21 at 11.00.25.jpg
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
>  controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
> |https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
>  a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
>  The attached screenshot shows pdfbox performance with SmallMap (in red) 
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.
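
A minimal sketch of how such a setting could be resolved is shown below. The property name `pdfbox.cosdictionary.mapThreshold` and the class are hypothetical, chosen only for illustration; the default of 1,000 is taken from the description above. This is not the patch itself.

```java
public class MapThresholdSketch {
    // Hypothetical property name, not an existing PDFBox setting.
    static final String PROPERTY = "pdfbox.cosdictionary.mapThreshold";
    // Default taken from the issue description (hardcoded to 1,000).
    static final int DEFAULT_THRESHOLD = 1000;

    // Returns the configured threshold, or the default when the property
    // is unset, not a number, or negative.
    static int resolveThreshold() {
        Integer configured = Integer.getInteger(PROPERTY);
        if (configured == null || configured < 0) {
            return DEFAULT_THRESHOLD;
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println("threshold = " + resolveThreshold());
    }
}
```

Under these assumptions, starting the JVM with `-Dpdfbox.cosdictionary.mapThreshold=0` would select LinkedHashMap from the start, while leaving the property unset keeps the current behaviour.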






[jira] [Updated] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5824:

Attachment: Screenshot 2024-05-21 at 11.00.25.jpg

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Screenshot 2024-05-21 at 11.00.25.jpg
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
>  controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
> |https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
>  a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
>  The attached screenshot shows pdfbox performance with SmallMap (in red) 
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.






[jira] [Updated] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5824:

Attachment: (was: Screenshot 2024-05-21 at 11.00.25.png)

> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>Reporter: Jonathan Prates
>Priority: Minor
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
>  controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
> |https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
>  a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
>  The attached screenshot shows pdfbox performance with SmallMap (in red) 
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.






[jira] [Updated] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5824:

Description: 
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
 controls which Map class is used to optimize memory usage. By default, a 
SmallMap is used. However, if the number of items in a COSDictionary reaches 
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
 a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantially 
bigger than this limit, this copying occurs frequently. Additionally, 
[SmallMap.keySet is not 
efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
 The attached screenshot shows pdfbox performance with SmallMap (in red) versus 
using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the 
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.

  was:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
 controls which Map class is used to optimize memory usage. By default, a 
SmallMap is used. However, if the number of items in a COSDictionary reaches 
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
 a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantial, 
this copying occurs frequently. Additionally, [SmallMap.keySet is not 
efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
 The attached screenshot shows pdfbox performance with SmallMap (in red) versus 
using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the 
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.


> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Screenshot 2024-05-21 at 11.00.25.png
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
>  controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
> |https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
>  a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
>  The attached screenshot shows pdfbox performance with SmallMap (in red) 
> versus using LinkedHashMap and ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.






[jira] [Updated] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5824:

Description: 
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
 controls which Map class is used to optimize memory usage. By default, a 
SmallMap is used. However, if the number of items in a COSDictionary reaches 
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
 a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantially 
bigger than this limit, this copying occurs frequently. Additionally, 
[SmallMap.keySet is not 
efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
 The attached screenshot shows pdfbox performance with SmallMap (in red) versus 
using LinkedHashMap, ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the 
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.

  was:
[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
 controls which Map class is used to optimize memory usage. By default, a 
SmallMap is used. However, if the number of items in a COSDictionary reaches 
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
 a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantial 
bigger than this limit, this copying occurs frequently. Additionally, 
[SmallMap.keySet is not 
efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
 The attached screenshot shows pdfbox performance with SmallMap (in red) versus 
using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the 
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.


> Allow COSDictionary.MAP_THRESHOLD to be defined as System property
> --
>
> Key: PDFBOX-5824
> URL: https://issues.apache.org/jira/browse/PDFBOX-5824
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox, 4.0.0
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Screenshot 2024-05-21 at 11.00.25.png
>
>
> [COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
>  controls which Map class is used to optimize memory usage. By default, a 
> SmallMap is used. However, if the number of items in a COSDictionary reaches 
> the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
> |https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
>  a LinkedHashMap.
> For larger documents, where the COSDictionary is expected to be substantially 
> bigger than this limit, this copying occurs frequently. Additionally, 
> [SmallMap.keySet is not 
> efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
>  The attached screenshot shows pdfbox performance with SmallMap (in red) 
> versus using LinkedHashMap, ignoring the threshold (in green).
> *Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
> property?*
> If set to 0, LinkedHashMap would be used. If not set, it would default to the 
> current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.






[jira] [Created] (PDFBOX-5824) Allow COSDictionary.MAP_THRESHOLD to be defined as System property

2024-05-21 Thread Jonathan Prates (Jira)
Jonathan Prates created PDFBOX-5824:
---

 Summary: Allow COSDictionary.MAP_THRESHOLD to be defined as System 
property
 Key: PDFBOX-5824
 URL: https://issues.apache.org/jira/browse/PDFBOX-5824
 Project: PDFBox
  Issue Type: Improvement
  Components: PDModel
Affects Versions: 3.0.3 PDFBox, 4.0.0
Reporter: Jonathan Prates
 Attachments: Screenshot 2024-05-21 at 11.00.25.png

[COSDictionary.MAP_THRESHOLD|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L54]
 controls which Map class is used to optimize memory usage. By default, a 
SmallMap is used. However, if the number of items in a COSDictionary reaches 
the MAP_THRESHOLD value (hardcoded to 1,000), the references [are copied 
|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/cos/COSDictionary.java#L208]to
 a LinkedHashMap.

For larger documents, where the COSDictionary is expected to be substantial, 
this copying occurs frequently. Additionally, [SmallMap.keySet is not 
efficient|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/util/SmallMap.java#L281].
 The attached screenshot shows pdfbox performance with SmallMap (in red) versus 
using LinkedHashMap and ignoring the threshold (in green).

*Would it be beneficial to allow MAP_THRESHOLD to be defined as a System 
property?*

If set to 0, LinkedHashMap would be used. If not set, it would default to the 
current MAP_THRESHOLD value and SmallMap, not changing the current behaviour.






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847912#comment-17847912
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/20/24 4:50 PM:
--

[~lehmi] I tested it locally (https://github.com/apache/pdfbox/pull/195) and 
indeed this approach is way better
{code:java}
word.length() == 1 && word.isBlank(); {code}
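
As a rough illustration of what this expression accepts, here is a small self-contained sketch. The class and method names are hypothetical, and `String.isBlank()` requires Java 11 or later.

```java
public class SingleCharBlankSketch {
    // True only for tokens that consist of exactly one whitespace character.
    // isBlank() relies on Character.isWhitespace, which covers space, \t, \n,
    // \r, \f and the vertical tab (U+000B), but not the non-breaking space.
    static boolean isSingleWhitespaceToken(String word) {
        return word.length() == 1 && word.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(isSingleWhitespaceToken(" "));
        System.out.println(isSingleWhitespaceToken("\u000B"));
        System.out.println(isSingleWhitespaceToken("word"));
    }
}
```

Note that Character.isWhitespace also accepts some characters outside \s, such as other Unicode space separators, so the check is slightly broader than the regex for single-character tokens.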


was (Author: JIRAUSER305510):
[~lehmi] I tested it locally and indeed this approach is way better
{code:java}
word.length() == 1 && word.isBlank(); {code}

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847912#comment-17847912
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/20/24 4:08 PM:
--

[~lehmi] I tested it locally and indeed this approach is way better
{code:java}
word.length() == 1 && word.isBlank(); {code}


was (Author: JIRAUSER305510):
[~lehmi] I tested it locally and indeed it is way better if \x0B can be ignored
{code:java}
word.length() == 1 && word.isBlank(); {code}

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847912#comment-17847912
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

[~lehmi] I tested it locally and indeed it is way better if \x0B can be ignored
{code:java}
word.length() == 1 && word.isBlank(); {code}

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847901#comment-17847901
 ] 

Andreas Lehmkühler commented on PDFBOX-5823:


Those tokens either don't contain any of those chars or consist of exactly one 
of them. That said, it might be a good idea to check for "spaces" only in 
tokens with a length of 1

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847855#comment-17847855
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/20/24 12:44 PM:
---

Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.
{code:java}
var SPACES = Set.of(" ", "\t", "\n", "\r", "\f", "\u000B");{code}
Attached I've provided a simple benchmark: [^Main.java]. I can see a similar 
pattern on memory allocation for regexp here.

Regarding GC, yes, memory will be cleaned in the next cycle, but since we are 
working in a web environment that has concurrent requests and a limited amount 
of memory per container, I believe less memory allocation can be beneficial.
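
The set-based idea could look roughly like the sketch below. This is an illustration only, not the attached Main.java: the class and method names are hypothetical, and the vertical tab is written as \u000B because Java string literals have no \x escape.

```java
import java.util.Set;

public class SpaceSetSketch {
    // Immutable set of the single-character whitespace tokens matched by \s.
    static final Set<String> SPACES = Set.of(" ", "\t", "\n", "\r", "\f", "\u000B");

    // Hash lookup on the whole token; no Matcher is created per call.
    static boolean isSpaceToken(String word) {
        return SPACES.contains(word);
    }

    public static void main(String[] args) {
        System.out.println(isSpaceToken(" "));
        System.out.println(isSpaceToken("word"));
    }
}
```

A set lookup tests whether the whole token equals one of the entries, not whether the token contains whitespace somewhere, so it only replaces the regex when tokens are either free of whitespace or a single whitespace character.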

 


was (Author: JIRAUSER305510):
Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.
{code:java}
var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", " x0B");{code}
Attached I've provided a simple benchmark: [^Main.java]I can see a similar 
pattern on memory allocation for regexp here.

Regarding GC, yes, memory will be cleaned in the next cycle, but since we are 
working in a web environment that has concurrent requests and a limited amount 
of memory per container, I believe less memory allocation can be beneficial.

 

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847855#comment-17847855
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/20/24 12:44 PM:
---

Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.
{code:java}
var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", "\u000B");{code}
Attached I've provided a simple benchmark: [^Main.java]. I can see a similar 
pattern on memory allocation for regexp here.

Regarding GC, yes, memory will be cleaned in the next cycle, but since we are 
working in a web environment that has concurrent requests and a limited amount 
of memory per container, I believe less memory allocation can be beneficial.

 


was (Author: JIRAUSER305510):
Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.
{code:java}
var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", " x0B");{code}
Attached I've provided a simple benchmark: [^Main.java]

I can see a similar pattern on memory allocation for regexp here.

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Comment Edited] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847855#comment-17847855
 ] 

Jonathan Prates edited comment on PDFBOX-5823 at 5/20/24 11:26 AM:
---

Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.
{code:java}
var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", "\u000B");{code}
Attached I've provided a simple benchmark: [^Main.java]

I can see a similar pattern on memory allocation for regexp here.


was (Author: JIRAUSER305510):
Sure, I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, in order to avoid memory allocation and resolve in 
O ( 1 ) time.

 
`var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", "\\x0B");`
 
Attached I've provided a simple benchmark:
 
[^Main.java]

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Priority: Minor
> Attachments: Main.java, Screenshot 2024-05-19 at 22.39.10.png, 
> Screenshot 2024-05-19 at 22.40.17.png
>
>
> PDAbstractContentStream uses StringUtil.PATTERN_SPACE regexp to evaluate if a 
> word has a space in it 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624])
> For large documents ~800 pages and small string sequences (like a regular 
> word), it causes a memory overhead (see attached), due to the several extra 
> allocations. I've replaced the regexp for space and \t using word.contains, 
> and since it's a O ( 1 ) operation that does not require extra allocations, 
> memory used has been reduced.
> What would be the implications of replacing this block for contains()?
> Since \s is [ \t\n\x0B\f\r], I believe we have a simplified version to 
> allocate less memory.
>  






[jira] [Updated] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Prates updated PDFBOX-5823:

Attachment: Main.java




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Jonathan Prates (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847855#comment-17847855 ]

Jonathan Prates commented on PDFBOX-5823:
-

Sure. I mean, contains() is slower for big strings, but not for small ones. My 
suggestion is to use a set, to avoid memory allocation and resolve each lookup in 
O(1) time.

 
`var SPACES_SET = Set.of(" ", "\t", "\n", "\r", "\f", "\u000B");`
 
I've attached a simple benchmark:
 
[^Main.java]
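
The set idea above could be sketched roughly as follows. This is only an illustration, not the attached Main.java and not PDFBox code: the class name WhitespaceSet and the method containsWhitespace are hypothetical, and a Set<Character> is used instead of a set of one-character strings so that no per-character String has to be created. Each set lookup is constant time, but the scan as a whole is still linear in the length of the word.

```java
import java.util.Set;

// Hypothetical helper for illustration only; not part of PDFBox.
final class WhitespaceSet {
    // The six characters that \s matches in java.util.regex by default.
    private static final Set<Character> SPACES =
            Set.of(' ', '\t', '\n', '\u000B', '\f', '\r');

    // Single pass over the text; returns at the first whitespace character.
    static boolean containsWhitespace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            // Boxing ASCII chars uses the Character cache, so no new objects here.
            if (SPACES.contains(text.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsWhitespace("word"));      // false
        System.out.println(containsWhitespace("two words")); // true
    }
}
```

Note that Java string and char literals have no \x escape, so the vertical tab has to be written as \u000B.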




[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-05-20 Thread Tilman Hausherr (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847844#comment-17847844 ]

Tilman Hausherr commented on PDFBOX-5823:
-

Isn't your solution slower? It would have to go through the whole string 
several times. Re memory, isn't this cleaned in garbage collection if new 
memory is needed?
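
To make the trade-off in this question concrete, here is a rough sketch of the two variants side by side. It assumes the existing check is equivalent to running a precompiled \s pattern with find(); the local SPACE constant stands in for StringUtil.PATTERN_SPACE, and the class and method names are hypothetical. The regex variant allocates a Matcher per call but scans once; the chained contains() variant allocates nothing but may scan the word up to six times when no whitespace is present.

```java
import java.util.regex.Pattern;

// Hypothetical comparison for illustration only; not PDFBox code.
final class SpaceCheckComparison {
    // Assumed stand-in for StringUtil.PATTERN_SPACE.
    private static final Pattern SPACE = Pattern.compile("\\s");

    // One scan, but every call creates a new Matcher object.
    static boolean viaRegex(String word) {
        return SPACE.matcher(word).find();
    }

    // No allocation, but up to six scans of the word in the worst case.
    static boolean viaContains(String word) {
        return word.contains(" ") || word.contains("\t") || word.contains("\n")
                || word.contains("\u000B") || word.contains("\f") || word.contains("\r");
    }

    public static void main(String[] args) {
        String[] samples = {"word", "two words", "tab\there", ""};
        for (String s : samples) {
            // Both variants should agree on every input.
            System.out.println(viaRegex(s) == viaContains(s));
        }
    }
}
```

For short words the repeated scans are cheap, which is the case the reporter describes; for long strings the single-pass regex or a single-pass loop would likely scale better.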




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-05-20 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847842#comment-17847842 ]

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1917838 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1917838 ]

PDFBOX-5660: update verapdf

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task of improving code quality, using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-05-20 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847841#comment-17847841 ]

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1917837 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1917837 ]

PDFBOX-5660: update verapdf



