[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5849:

Attachment: screenshot-1.png

> ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font
> -
>
> Key: PDFBOX-5849
> URL: https://issues.apache.org/jira/browse/PDFBOX-5849
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox, Rendering
> Affects Versions: 3.0.2 PDFBox
> Reporter: Leonard Wicke
> Priority: Major
> Labels: fontbox, pdfbox
> Attachments: screenshot-1.png
>
>
> *Affected Versions*
> PDFBox 2.0.30 is not affected, so it is likely that no other release of 
> major version 2 is affected either.
> PDFBox 3.0.2 is affected.
> It appears to us that this bug is new in major version 3.
> *Description*
> We are using Apache PDFBox 3.0.2 in our software and have the following issue.
> We want to write a String using the font FreeSansBold.
> The font is loaded via PDType0Font#load from a TTF-file.
> If we load the font with embedSubset=true, then the following exception occurs:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
>     at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
>     at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
>     at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
>     at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
>     at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
> The trigger is the question mark character "?". The character "!" also causes 
> an exception.
> Letters such as a-z and A-Z don't.
> The character is first correctly mapped to glyph ID 34, but is then converted 
> to 2914 in PDAbstractContentStream#encodeForGsub by GsubWorkerForDevanagari.
> This glyph does not exist in this font, which causes the exception later, 
> when the fonts are subset while the document is being saved.
> The exception does not occur while the text is written to the 
> PDPageContentStream.
> If we load the font with embedSubset=false, no exception occurs, but the 
> character is skipped and not visible in the PDF.
> I have only found old, already fixed issues involving 
> ArrayIndexOutOfBoundsException 
> (https://issues.apache.org/jira/browse/PDFBOX-4946).
> *Code that causes the exception*
> Apache PDFBox 3 added a new step, encodeForGsub, to showTextInternal in 
> PDPageContentStream.
> This step replaces the character's glyph with a glyph that does not exist.
> *Code to reproduce*
> You need the font FreeSansBold or another font that causes this problem.
> {code:java}
> PDDocument document = new PDDocument();
> File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
> TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
> PDFont bold = PDType0Font.load(document, boldT, true);
> PDPage page = new PDPage(PDRectangle.A4);
> PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
> contentStream.setFont(bold, 11);
> contentStream.beginText();
> contentStream.newLineAtOffset(50, 50);
> contentStream.showText("?");
> contentStream.endText();
> contentStream.close();
> document.addPage(page);
> document.save(new File("Test.pdf"));{code}
> *Questions*
> Is this a bug in Apache PDFBox / FontBox?
> It appears that the code in question is only executed for fonts of class 
> PDType0Font. Is it possible to load a font using another class to avoid this 
> bug?
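
The failure mode in the quoted report (a glyph ID beyond the end of the font's glyph table) can be illustrated with a minimal sketch in plain Java. This is not PDFBox code: the class and method names are hypothetical, and only the numbers 34, 2912 and 2941 are taken from the report and its stack trace. It shows why an unguarded table lookup throws, and how a range check would reject the bad glyph ID before the lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class GlyphBoundsSketch {

    /** Hypothetical check: a glyph ID is usable only if it lies inside the glyph table. */
    static boolean isValidGlyphId(int gid, int numGlyphs) {
        return gid >= 0 && gid < numGlyphs;
    }

    /** Keeps only the glyph IDs that exist in a font with numGlyphs glyphs. */
    static List<Integer> keepValid(List<Integer> gids, int numGlyphs) {
        List<Integer> valid = new ArrayList<>();
        for (int gid : gids) {
            if (isValidGlyphId(gid, numGlyphs)) {
                valid.add(gid);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        int numGlyphs = 2912;             // table length from the stack trace
        int[] table = new int[numGlyphs]; // stand-in for a per-glyph table
        int badGid = 2941;                // index from the stack trace

        try {
            int unused = table[badGid];   // unguarded lookup, as in the failing path
            System.out.println("Unexpected: lookup succeeded, value " + unused);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Unguarded lookup failed: " + e.getMessage());
        }

        // Guarded variant: the out-of-range glyph ID is dropped before any lookup.
        System.out.println("Valid glyph IDs: " + keepValid(List.of(34, badGid), numGlyphs));
    }
}
```

Whether dropping the glyph is the right reaction is a separate question. In the reported case the glyph ID was wrong to begin with, so a guard of this kind would hide the symptom rather than fix the substitution.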



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864597#comment-17864597
 ] 

Tilman Hausherr commented on PDFBOX-5849:
-

A few weeks... we want to release 2.0 first. 




[jira] [Comment Edited] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864596#comment-17864596
 ] 

Leonard Wicke edited comment on PDFBOX-5849 at 7/10/24 10:47 AM:
-

[~tilman],
thank you!
I will try 3.0.3.
Is it known when this version will be published?
The workaround using PDTrueTypeFont works for most of what we need, so we are 
going to look into that as well.


was (Author: JIRAUSER306146):
Thank you!
I will try 3.0.3.
Is it known when this version will be published?
The workaround using PDTrueTypeFont works for most of what we need, so we are 
going to look into that as well.




[jira] [Commented] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864596#comment-17864596
 ] 

Leonard Wicke commented on PDFBOX-5849:
---

Thank you!
I will try 3.0.3.
Is it known when this version will be published?
The workaround using PDTrueTypeFont works for most of what we need, so we are 
going to look into that as well.




[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other release of 
major version 2 is affected either.
PDFBox 3.0.2 is affected.
It appears to us that this is a bug that is new with major-release 3.


  was:
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other release of 
major version 2 is affected either.
PDFBox 3.0.2 is affected, so it is likely that all patch releases of 
major version 3 are affected.
It appears to us that this is a bug that is new with major-release 3.


[jira] [Commented] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864582#comment-17864582
 ] 

Tilman Hausherr commented on PDFBOX-5849:
-

You could use this:
{code:java}
PDFont bold = PDTrueTypeFont.load(document, boldT, WinAnsiEncoding.INSTANCE);
{code}
However, you can then use only the characters of the WinAnsi encoding, and the 
font will be embedded in full instead of being subset.
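
Because this workaround limits text to a single-byte encoding, it can help to check strings before writing them. The following is a minimal sketch in plain Java, under a loudly stated assumption: it uses the JDK's windows-1252 charset as an approximation of PDF WinAnsiEncoding, and the two are closely related but not guaranteed to be identical. The class and method names are hypothetical and not part of PDFBox.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiPrecheck {

    /**
     * Returns true if every character of the text can be encoded in windows-1252.
     * Assumption: windows-1252 is used here only as an approximation of WinAnsiEncoding.
     */
    static boolean fitsWinAnsi(String text) {
        // A fresh encoder per call keeps the method safe to use from several threads.
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // ASCII punctuation such as "?" and "!" is covered.
        System.out.println(fitsWinAnsi("Is this a bug?!"));
        // DEVANAGARI LETTER KA (U+0915) is outside the single-byte range.
        System.out.println(fitsWinAnsi("\u0915"));
    }
}
```

A string that fails this check would still need a PDType0Font, so the pre-check is only useful for deciding which of the two loading paths a given text can take.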




[jira] [Commented] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864581#comment-17864581
 ] 

Tilman Hausherr commented on PDFBOX-5849:
-

I can reproduce it with 3.0.2 but not with 3.0.3-SNAPSHOT, so it has already 
been fixed. I suspect it was one of the bugs in GSUB processing, e.g. 
PDFBOX-5810. Please try a snapshot build:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/




[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other release of 
major version 2 is affected either.
PDFBox 3.0.2 is affected, so it is likely that all patch releases of 
major version 3 are affected.
It appears to us that this bug is new in major version 3.

*Description*
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF-file.

If we load the font with embedSubset=true than the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font, which causes the exception later, 
when the fonts are subsetted while saving the document.
The exception does not occur when the text is written to the PDPageContentStream.
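
The failure mode described above can be illustrated without PDFBox. The following is a minimal sketch, not PDFBox code: it assumes that the subsetter uses each collected glyph ID as an index into a per-glyph table of length 2912 (the length reported in the stack trace), so an ID such as 2941 falls outside the table. The class GlyphIdBoundsSketch and the helper keepIdsInRange are hypothetical names introduced for illustration only.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in PDFBox or FontBox.
class GlyphIdBoundsSketch {

    // Returns the glyph IDs that can safely index a table of the given length.
    static SortedSet<Integer> keepIdsInRange(SortedSet<Integer> glyphIds, int tableLength) {
        SortedSet<Integer> inRange = new TreeSet<>();
        for (int gid : glyphIds) {
            if (gid >= 0 && gid < tableLength) {
                inRange.add(gid);
            }
        }
        return inRange;
    }

    public static void main(String[] args) {
        int tableLength = 2912;                // length taken from the stack trace
        long[] table = new long[tableLength];  // stand-in for the per-glyph table

        // 34 is the glyph ID reported for "?"; 2941 is the index from the exception.
        SortedSet<Integer> requested = new TreeSet<>(Arrays.asList(34, 2941));

        try {
            for (int gid : requested) {
                long unused = table[gid];      // throws for 2941
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Unchecked lookup failed: " + e.getMessage());
        }

        // With a range check, only the valid ID is looked up.
        for (int gid : keepIdsInRange(requested, tableLength)) {
            System.out.println("Glyph ID " + gid + " is in range");
        }
    }
}
```

A guard of this kind would only avoid the exception. It would not explain why the substitution yields a glyph ID that is outside the font.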

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

*Code that causes the exception*

PDFBox 3 added a new step, encodeForGsub, to showTextInternal in 
PDPageContentStream.
This step replaces the character's glyph with a glyph that does 
not exist.

*Code to reproduce*
You need the font FreeSansBold or another font that triggers this problem.
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}
*Questions*

Is this a bug in Apache PDFBox / FontBox?
It appears that the code in question is only executed for fonts of class 
PDType0Font. Is it possible to load the font using another class to avoid this 
bug?


[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other version of 
major release 2 is affected either.
PDFBox 3.0.2 is affected, so it is likely that all patch releases of 
major release 3 are affected.
It appears to us that this bug is new in major release 3.

*Description*
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF file.

If we load the font with embedSubset=true, then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font, which causes the exception later, 
when the fonts are subsetted while saving the document.
The exception does not occur when the text is written to the PDPageContentStream.

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

*Code that causes the exception*

PDFBox 3 added a new step, encodeForGsub, to showTextInternal in 
PDPageContentStream.
This step replaces the character's glyph with a glyph that does 
not exist.

*Code to reproduce*
You need the font FreeSansBold or another font that triggers this problem.
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}


[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other version of 
major release 2 is affected either.
PDFBox 3.0.2 is affected, so it is likely that all patch releases of 
major release 3 are affected.
It appears to us that this bug is new in major release 3.

*Description*
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF file.

If we load the font with embedSubset=true, then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font, which causes the exception later, 
when the fonts are subsetted while saving the document.
The exception does not occur when the text is written to the PDPageContentStream.

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

*Code that causes the exception*

PDFBox 3 added a new step, encodeForGsub, to showText in 
PDPageContentStream.
This step replaces the character's glyph with a glyph that does 
not exist.

*Code to reproduce*
You need the font FreeSansBold or another font that triggers this problem.
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}


[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Affected Versions*

PDFBox 2.0.30 is not affected, so it is likely that no other version of 
major release 2 is affected either.
PDFBox 3.0.2 is affected, so it is likely that all patch releases of 
major release 3 are affected.
It appears to us that this bug is new in major release 3.

*Description*
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF file.

If we load the font with embedSubset=true, then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font, which causes the exception later, 
when the fonts are subsetted while saving the document.
The exception does not occur when the text is written to the PDPageContentStream.

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

*Code to reproduce*
You need the font FreeSansBold or another font that triggers this problem.
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}


[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Labels: fontbox pdfbox  (was: )

> ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font
> -
>
> Key: PDFBOX-5849
> URL: https://issues.apache.org/jira/browse/PDFBOX-5849
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox, Rendering
>Affects Versions: 3.0.2 PDFBox
>Reporter: Leonard Wicke
>Priority: Major
>  Labels: fontbox, pdfbox
>
> *Description*
> We are using Apache PDFBox 3.0.2 in our software and have the following issue.
> It's likely that other versions of PDFBox 3 are also affected. We did 
> not have this issue before with Apache PDFBox 2.
> We want to write a String using the font FreeSansBold.
> The font is loaded via PDType0Font#load from a TTF file.
> If we load the font with embedSubset=true, then the following exception occurs:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
> 2912
> at 
> org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
> at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
> at 
> org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
> at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
> at 
> org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
> at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
> The cause is the question mark character "?". The character "!" also causes 
> an exception.
> Letters such as a-z and A-Z don't.
> This character is first correctly identified as glyph ID 34, but 
> PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
> GsubWorkerForDevanagari.
> This glyph does not exist in this font, which causes the exception later, 
> when the fonts are subsetted while saving the document.
> The exception does not occur when the text is written to the PDPageContentStream.
> If we load the font with embedSubset=false, then no exception occurs, but the 
> character is not visible (it is skipped) in the PDF.
> I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
> (https://issues.apache.org/jira/browse/PDFBOX-4946).
> *Code to reproduce*
> You need the font FreeSansBold or another font that triggers this problem.
> {code:java}
> PDDocument document = new PDDocument();
> File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
> TrueTypeFont boldT = new TTFParser().parse(new 
> RandomAccessReadBufferedFile(boldF));
> PDFont bold = PDType0Font.load(document, boldT, true);
> PDPage page = new PDPage(PDRectangle.A4);
> PDPageContentStream contentStream = new PDPageContentStream(document, page, 
> PDPageContentStream.AppendMode.APPEND, true, true);
> contentStream.setFont(bold, 11);
> contentStream.beginText();
> contentStream.newLineAtOffset(50, 50);
> contentStream.showText("?");
> contentStream.endText();
> contentStream.close();
> document.addPage(page);
> document.save(new File("Test.pdf"));{code}






[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
*Description*
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
It's likely that other versions of PDFBox 3 are also affected. We did not 
have this issue before with Apache PDFBox 2.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF file.

If we load the font with embedSubset=true, then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font, which causes the exception later, 
when the fonts are subsetted while saving the document.
The exception does not occur when the text is written to the PDPageContentStream.

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

*Code to reproduce*
You need the font FreeSansBold or another font that triggers this problem.
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}


[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF file.

If we load the font with embedSubset=true, then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 
2912
at 
org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at 
org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at 
org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z don't.

This character is first correctly identified as glyph ID 34, but 
PDAbstractContentStream#encodeForGsub then converts it to 2914 via 
GsubWorkerForDevanagari.
This glyph does not exist in this font.
This causes an exception when the font is subsetted.
The exception does not occur when the text is written to the PDPageContentStream.

If we load the font with embedSubset=false, then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old and fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

Code to reproduce (you need the font FreeSansBold):
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new 
RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, 
PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}

  was:
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF-file.

If we load the font with embedSubset=true then the following exception occurs:

    java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
    at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
    at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
    at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
    at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944)

The cause is the question mark character (?). The character ! also causes an 
exception.
Letters such as a-z and A-Z do not.
This character is first correctly identified as Glyph-ID 34 but then in 
PDAbstractContentStream#encodeForGsub converted to 2914 by 
GsubWorkerForDevanagari.
This glyph does not exist for this font.
This causes an exception when subsetting the font.
The exception does not occur when writing the text in the PDPageContentStream.

If we load the font with embedSubset=false then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old, already fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

Code to reproduce (you need the font FreeSansBold):

    PDDocument document = new PDDocument();
    File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
    TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
    PDFont bold = PDType0Font.load(document, boldT, true);
    PDPage page = new PDPage(PDRectangle.A4);
    PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
    contentStream.setFont(bold, 11);
    contentStream.be

[jira] [Updated] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonard Wicke updated PDFBOX-5849:
--
Description: 
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF-file.

If we load the font with embedSubset=true then the following exception occurs:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z do not.

This character is first correctly identified as Glyph-ID 34 but then in 
PDAbstractContentStream#encodeForGsub converted to 2914 by 
GsubWorkerForDevanagari.
This glyph does not exist for this font.
This causes an exception when subsetting the font.
The exception does not occur when writing the text in the PDPageContentStream.
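
As a rough illustration of the failure mode described above (this is not the PDFBox implementation), the sketch below checks glyph IDs against the size of the glyph table before they would be handed to a subsetter. The class and method names are hypothetical. The values 34, 2914, 2941 and 2912 are taken from the description and the stack trace, and treating 2912 as the number of glyphs in the font is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDFBox or FontBox.
public class GlyphIdCheck {

    // Returns every glyph ID that lies outside the valid range [0, numberOfGlyphs).
    public static List<Integer> findInvalidGlyphIds(List<Integer> glyphIds, int numberOfGlyphs) {
        List<Integer> invalid = new ArrayList<>();
        for (int gid : glyphIds) {
            if (gid < 0 || gid >= numberOfGlyphs) {
                invalid.add(gid);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // 34: glyph ID first identified for "?" (from the description)
        // 2914: glyph ID after the GSUB substitution (from the description)
        // 2941: index reported in the exception message
        // 2912: array length reported in the exception message (assumed glyph count)
        List<Integer> invalid = findInvalidGlyphIds(Arrays.asList(34, 2914, 2941), 2912);
        System.out.println(invalid); // prints [2914, 2941]
    }
}
```

A check along these lines would flag the substituted glyph ID when the text is written, rather than failing later during subsetting in PDDocument#save.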

If we load the font with embedSubset=false then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old, already fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

Code to reproduce (you need the font FreeSansBold):
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50, 50);
contentStream.showText("?");
contentStream.endText();
contentStream.close();
document.addPage(page);
document.save(new File("Test.pdf"));{code}

  was:
We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF-file.

If we load the font with embedSubset=true then the following exception occurs:

{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
The cause is the question mark character "?". The character "!" also causes an 
exception.
Letters such as a-z and A-Z do not.

This character is first correctly identified as Glyph-ID 34 but then in 
PDAbstractContentStream#encodeForGsub converted to 2914 by 
GsubWorkerForDevanagari.
This glyph does not exist for this font.
This causes an exception when subsetting the font.
The exception does not occur when writing the text in the PDPageContentStream.

If we load the font with embedSubset=false then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old, already fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

Code to reproduce (you need the font FreeSansBold):
{code:java}
PDDocument document = new PDDocument();
File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
PDFont bold = PDType0Font.load(document, boldT, true);
PDPage page = new PDPage(PDRectangle.A4);
PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
contentStream.setFont(bold, 11);
contentStream.beginText();
contentStream.newLineAtOffset(50

[jira] [Created] (PDFBOX-5849) ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font

2024-07-10 Thread Leonard Wicke (Jira)
Leonard Wicke created PDFBOX-5849:
-

 Summary: ArrayIndexOutOfBoundsException in Apache PDFBox 3 in 
connection with Font
 Key: PDFBOX-5849
 URL: https://issues.apache.org/jira/browse/PDFBOX-5849
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox, Rendering
Affects Versions: 3.0.2 PDFBox
Reporter: Leonard Wicke


We are using Apache PDFBox 3.0.2 in our software and have the following issue.
We want to write a String using the font FreeSansBold.
The font is loaded via PDType0Font#load from a TTF-file.

If we load the font with embedSubset=true then the following exception occurs:

    java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
    at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
    at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
    at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
    at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944)

The cause is the question mark character (?). The character ! also causes an 
exception.
Letters such as a-z and A-Z do not.
This character is first correctly identified as Glyph-ID 34 but then in 
PDAbstractContentStream#encodeForGsub converted to 2914 by 
GsubWorkerForDevanagari.
This glyph does not exist for this font.
This causes an exception when subsetting the font.
The exception does not occur when writing the text in the PDPageContentStream.

If we load the font with embedSubset=false then no exception occurs, but the 
character is not visible (it is skipped) in the PDF.
I have only found old, already fixed issues with ArrayIndexOutOfBoundsExceptions 
(https://issues.apache.org/jira/browse/PDFBOX-4946).

Code to reproduce (you need the font FreeSansBold):

    PDDocument document = new PDDocument();
    File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
    TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
    PDFont bold = PDType0Font.load(document, boldT, true);
    PDPage page = new PDPage(PDRectangle.A4);
    PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
    contentStream.setFont(bold, 11);
    contentStream.beginText();
    contentStream.newLineAtOffset(50, 50);
    contentStream.showText("?");
    contentStream.endText();
    contentStream.close();
    document.addPage(page);
    document.save(new File("Test.pdf"));



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864013#comment-17864013
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919053 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919053 ]

PDFBOX-5789: move phase of ant task to deploy

> Remove release subproject
> -
>
> Key: PDFBOX-5789
> URL: https://issues.apache.org/jira/browse/PDFBOX-5789
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.30, 3.0.2 PDFBox, 4.0.0
>Reporter: Andreas Lehmkühler
>Assignee: Andreas Lehmkühler
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> PDFBOX-5699 introduced the new subproject "release" in order to fix an issue 
> with the SCM-URL. 
> In hindsight, this turned out to be a problem. The release project doesn't 
> include any source code and is therefore excluded from the source zip. But as 
> it is still part of the project itself, it leads to a broken build if 
> someone builds from the zip.






[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863881#comment-17863881
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919041 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919041 ]

PDFBOX-5789: adjust path for release build




[jira] [Resolved] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5848.

Resolution: Fixed

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that causes an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (I haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue.






[jira] [Commented] (PDFBOX-5789) Remove release subproject

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863873#comment-17863873
 ] 

ASF subversion and git services commented on PDFBOX-5789:
-

Commit 1919032 from le...@apache.org in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919032 ]

PDFBOX-5789: remove release subproject




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863760#comment-17863760
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 9fc2565315df666b0f2da18a7dafbbf959806836 in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=9fc2565 ]

PDFBOX-5660: update owasp plugin

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf,
> DRY_refactoring_Typ2CharStringParser.patch,
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
> Simplify_string_conversion_in_PDFHighlighter.patch,
> Update_string_handling_and_regex_in_several_classes.patch,
> avoid_multiple_unboxing.patch,
> code_cleanup.patch,
> do_not_create_temporary_File_instance.patch,
> extract_common_code,_move_toUpperCase()_out_of_loop.patch,
> fix_HTML_error_in_Javadoc.patch,
> fix_javadoc_problems.patch,
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
> introduce_StringUtil_class_for_reusable_functionality.patch,
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
> make_inner_class_static.patch,
> refactor_isEndOfName.patch,
> remove_code_duplication_in_Type2CharStringParser.patch,
> remove_obsolete_class_NullOutputStream.patch,
> remove_unnecessary_calls_to_toString()_String_valueOf().patch,
> replace_System_getProperty()_calls.patch,
> screenshot-1.png,
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
> simplify_stream_operations.patch,
> use_Map_ofEntries().patch,
> use_Math_min()_to_make_code_more_readable.patch,
> use_Objects_equals().patch,
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
> use_String_join().patch,
> use_switch_for_readability.patch,
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863730#comment-17863730
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919013 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1919013 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863729#comment-17863729
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919012 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1919012 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863728#comment-17863728
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1919011 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1919011 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863571#comment-17863571
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918990 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918990 ]

PDFBOX-5660: use convenience methods




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863570#comment-17863570
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918989 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918989 ]

PDFBOX-5660: use convenience method




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863569#comment-17863569
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918988 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918988 ]

PDFBOX-5660: use convenience method

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for improving code quality using the SonarQube 
> report, hints from different IDEs, the FindBugs tool, and other code 
> quality tools.
> This is a follow-up to PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-07-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863477#comment-17863477
 ] 

Andreas Lehmkühler commented on PDFBOX-5838:


Sorry for the late answer. I had another look at the results of the regression 
tests. More files got worse, but the ones I am able to read were already bad 
in the first place. The changes didn't really improve the extraction results; 
they simply replaced one bad result with another.

That said, I stand by my intention to keep the current implementation and 
concur with [~tilman] to close this ticket.

> Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31
> --
>
> Key: PDFBOX-5838
> URL: https://issues.apache.org/jira/browse/PDFBOX-5838
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.32, 3.0.3 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Attachments: OFLSV3YFD3TDOU4YZTL2QY745W53W3DW.pdf, 
> PDFBOX-5838-0024320-reduced.pdf
>
>
> discovered in 2.0.32 regression tests






[jira] [Closed] (PDFBOX-5838) Text extraction garbled in this file, was OK in 3.0.2 / 2.0.31

2024-07-06 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5838.
---
Resolution: Won't Do




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863224#comment-17863224
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918932 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918932 ]

PDFBOX-5660: update maven enforcer configuration




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863132#comment-17863132
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918925 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918925 ]

PDFBOX-5660: improve javadoc




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863130#comment-17863130
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918923 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918923 ]

PDFBOX-5660: improve javadoc




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-05 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863131#comment-17863131
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918924 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918924 ]

PDFBOX-5660: improve javadoc




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863115#comment-17863115
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918921 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918921 ]

PDFBOX-5660: update animal sniffer plugin, commons logging




[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863059#comment-17863059
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

I forgot to mention: our snapshots are not available on Maven Central.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Joan Fisbein
> Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: 706213.pdf, cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day. Generally, everything works 
> flawlessly, but I just received a PDF that causes an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
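> {code:java}
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.multipdf.Splitter;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> // Splits the file into single-page documents and saves each one.
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}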
> The PDF file is attached to the issue
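
The reproduction loop above finds each part's position with Pages.indexOf(pd) and closes a part only after a successful save. The following is a minimal sketch of the same loop shape with a running counter and a guaranteed close per part. It is deliberately independent of PDFBox: Part is a hypothetical stand-in for PDDocument, and SplitSaveSketch, partName and saveAll are illustrative names that do not appear in the issue. It does not address the infinite loop itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SplitSaveSketch {

    /** Hypothetical stand-in for one split-off document; NOT a PDFBox class. */
    static final class Part implements Closeable {
        String savedAs;   // name passed to save(), or null if never saved
        boolean closed;   // set by close()

        void save(String fileName) throws IOException {
            savedAs = fileName;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Builds the name pattern used in the report: base name, dash, zero-based index. */
    static String partName(String baseName, int index) {
        return baseName + "-" + index + ".pdf";
    }

    /**
     * Saves each part under an indexed name. Each part is closed after its
     * save attempt, even if save() throws; parts that come after a failing
     * one are not reached and therefore not closed by this sketch.
     */
    static List<String> saveAll(String baseName, List<Part> parts) {
        List<String> names = new ArrayList<>();
        int index = 0;
        for (Part part : parts) {
            try (Part p = part) {
                String name = partName(baseName, index++);
                p.save(name);
                names.add(name);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return names;
    }
}
```

The counter avoids a linear indexOf lookup per part, and the try-with-resources block takes over the explicit pd.close() call.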






[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862998#comment-17862998
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

Just added [^706213.pdf] in case we ever want to add a test or improve this. 
It is an official US document, so it is not copyrighted.




[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Attachment: 706213.pdf




[jira] [Comment Edited] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984
 ] 

Tilman Hausherr edited comment on PDFBOX-5848 at 7/4/24 9:42 AM:
-

If you don't need the annotations (especially link annotations), then removing 
them is a workable solution. Alternatively, copy the current source code of the 
Splitter class from the repository and use it instead of the class from the jar.


was (Author: tilman):
If you don't need the annotations (especially link annotations), then removing 
them is a workable solution. Alternatively, copy the current source code of the 
Splitter class from the repository.




[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862984#comment-17862984
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

If you don't need the annotations (especially link annotations), then removing 
them is a workable solution. Alternatively, copy the current source code of the 
Splitter class from the repository.




[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Joan Fisbein (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862982#comment-17862982
 ] 

Joan Fisbein commented on PDFBOX-5848:
--

Hi [~tilman], thank you for your support.

Due to some technical limitations, I cannot load snapshots from repositories 
other than Maven Central right now.

I will be able to test once the snapshot is available there, sorry.

Meanwhile, you gave me an excellent idea, and I'm removing annotations before 
splitting the PDF.

Thanks!!




[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Affects Version/s: 2.0.31

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Component/s: Utilities

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>  Components: Utilities
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Fix Version/s: 2.0.32
   3.0.3 PDFBox
   4.0.0

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862971#comment-17862971
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

[~jfisbein-clarity] Please try the new snapshot at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
It likely fixes your problem as well, because there is less to save now.

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Updated] (PDFBOX-5848) Infinite loop after splitting and saving PDF / giant result files

2024-07-04 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Summary: Infinite loop after splitting and saving PDF / giant result files  
(was: Infinite loop processing PDF)

> Infinite loop after splitting and saving PDF / giant result files
> -
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862968#comment-17862968
 ] 

ASF subversion and git services commented on PDFBOX-5848:
-

Commit 1918905 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918905 ]

PDFBOX-5848: remove /Parent entry for widgets because it can lead to orphan 
pages

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862967#comment-17862967
 ] 

ASF subversion and git services commented on PDFBOX-5848:
-

Commit 1918904 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918904 ]

PDFBOX-5848: remove /Parent entry for widgets because it can lead to orphan 
pages

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862966#comment-17862966
 ] 

ASF subversion and git services commented on PDFBOX-5848:
-

Commit 1918903 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918903 ]

PDFBOX-5848: remove /Parent entry for widgets because it can lead to orphan 
pages

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862959#comment-17862959
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918900 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918900 ]

PDFBOX-5660: remove unneeded cast

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862960#comment-17862960
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918901 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918901 ]

PDFBOX-5660: remove unneeded cast

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862961#comment-17862961
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918902 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918902 ]

PDFBOX-5660: remove unneeded cast

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862920#comment-17862920
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918897 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918897 ]

PDFBOX-5660: fix compiler warning

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862919#comment-17862919
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918896 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918896 ]

PDFBOX-5660: fix compiler warning

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a longterm issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862916#comment-17862916
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

It finished with 3.0.2 (while I slept) and the snapshot too (with a dirty fix 
for the /Parent problem). I also tried with "-startPage 1 -endPage 442" because 
I'm not sure about the default settings of the splitter class and I never tried 
her code.

I'll do a less dirty fix for the /Parent problem in the next few days.

[~jfisbein-clarity] try setting a higher stack size with "-Xss". The snapshot 
version is at
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.3-SNAPSHOT/
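
For readers unfamiliar with the "-Xss" suggestion: the flag raises the default thread stack size for the whole JVM (for example `java -Xss16m ...`, where the value is only an illustration). The sketch below shows a per-thread alternative using the standard `Thread` constructor that takes a stack size. It is not taken from this thread. The class and method names (`StackSizeSketch`, `depth`, `runWithStack`) are hypothetical, and the assumption that the split is limited by deep recursion is mine, not something the issue confirms. The JVM treats the requested size as a hint and may ignore it on some platforms.

```java
public class StackSizeSketch {

    /** Recurses 'remaining' times; deep recursion like this is what exhausts a small stack. */
    static int depth(int remaining) {
        return remaining == 0 ? 0 : 1 + depth(remaining - 1);
    }

    /**
     * Runs depth(n) on a worker thread created with the requested stack size (in bytes).
     * The stack size is only a suggestion to the JVM.
     */
    static int runWithStack(int n, long stackBytes) {
        int[] result = new int[1];
        Thread worker = new Thread(null, () -> result[0] = depth(n), "deep-worker", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 50,000 nested calls on a 256 MB stack.
        System.out.println(runWithStack(50_000, 256L * 1024 * 1024));
    }
}
```

In real use the lambda would wrap the splitting and saving work instead of `depth(n)`; whether that helps depends on the stack actually being the limit.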


> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862878#comment-17862878
 ] 

Maruan Sahyoun commented on PDFBOX-5848:


It was also slow for me (approx. 3 min), but I was only looking at the infinite 
loop question.

> Infinite loop processing PDF
> 
>
> Key: PDFBOX-5848
> URL: https://issues.apache.org/jira/browse/PDFBOX-5848
> Project: PDFBox
>  Issue Type: Bug
>Affects Versions: 3.0.2 PDFBox
>Reporter: Joan Fisbein
>Priority: Major
> Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf, 
> screenshot-1.png
>
>
> I use PDFBox to split hundreds of PDFs per day, generally, everything works 
> flawlessly but I just received a PDF that generates an infinite loop when I 
> try to split it.
>  
> I used this Java code to reproduce it using PDFBox 3.0.2 (haven't tried other 
> versions):
> {code:java}
> private static void splitPdf(File fileToSplit) {
>   try (PDDocument document = Loader.loadPDF(fileToSplit)) {
>     int documentPages = document.getNumberOfPages();
>     Splitter splitter = new Splitter();
>     List<PDDocument> Pages = splitter.split(document);
>     Iterator<PDDocument> iterator = Pages.listIterator();
>     while (iterator.hasNext()) {
>       PDDocument pd = iterator.next();
>       pd.save(fileToSplit.getName() + "-" + Pages.indexOf(pd) + ".pdf");
>       pd.close();
>     }
>   } catch (IOException e) {
>     throw new RuntimeException(e);
>   }
> } {code}
> The PDF file is attached to the issue






[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867
 ] 

Tilman Hausherr edited comment on PDFBOX-5848 at 7/3/24 6:15 PM:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However, 
there's a different problem: lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages. Opening and saving it with Adobe Reader 
produces a much smaller file, where the /Parent entry value is set to null.
 !screenshot-1.png! 
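
The orphan-page effect described above can be illustrated with a toy object graph. This is a minimal sketch and not PDFBox code: `Node`, `reachable` and `buildPage1` are hypothetical names, and the key names (`Annots`, `P`, `Parent`, `Kids`) only mimic the PDF dictionary entries mentioned in the comment. It assumes that a saved split document contains everything reachable from its single page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrphanPageSketch {

    /** Toy stand-in for a PDF dictionary: a name plus keyed outgoing references. */
    static final class Node {
        final String name;
        final Map<String, List<Node>> refs = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        void link(String key, Node... targets) {
            refs.computeIfAbsent(key, k -> new ArrayList<>()).addAll(Arrays.asList(targets));
        }
    }

    /** Names of every node reachable from root, i.e. what would have to be written out. */
    static Set<String> reachable(Node root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (seen.add(n.name)) {
                n.refs.values().forEach(list -> list.forEach(todo::push));
            }
        }
        return seen;
    }

    /** Two pages whose widgets share one parent field; returns page 1. */
    static Node buildPage1(boolean keepParent) {
        Node page1 = new Node("page1");
        Node page2 = new Node("page2");
        Node widget1 = new Node("widget1");
        Node widget2 = new Node("widget2");
        Node field = new Node("field");
        page1.link("Annots", widget1);
        page2.link("Annots", widget2);
        widget1.link("P", page1);
        widget2.link("P", page2);
        field.link("Kids", widget1, widget2);
        if (keepParent) {
            // The problematic link: widget -> parent field -> kids on other pages.
            widget1.link("Parent", field);
            widget2.link("Parent", field);
        }
        return page1;
    }

    public static void main(String[] args) {
        System.out.println("with /Parent:    " + reachable(buildPage1(true)));
        System.out.println("without /Parent: " + reachable(buildPage1(false)));
    }
}
```

With the `Parent` link kept, page 2 is reachable from page 1 through the shared field, so it would travel along as an orphan; without the link only page 1 and its own widget remain.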


was (Author: tilman):
I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 




[jira] [Updated] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5848:

Attachment: screenshot-1.png




[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862867#comment-17862867
 ] 

Tilman Hausherr commented on PDFBOX-5848:
-

I'm testing with 3.0.2 and it's working very slowly... I'm at page 170. However 
there's a different problem, lots of orphan pages. The reason is that some 
annotations have a /Parent entry which has a /Kids entry whose children are 
annotations on *different* pages.
 !screenshot-1.png! 




[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Maruan Sahyoun (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862853#comment-17862853
 ] 

Maruan Sahyoun commented on PDFBOX-5848:


Tried with 3.0.3-SNAPSHOT and it works for me using the command-line split command:

{code}
java -jar pdfbox-app-3.0.3-SNAPSHOT.jar split -i cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf
{code}

Can you try the same with version 3.0.2 and, if that doesn't work for you, 
with 3.0.3-SNAPSHOT?





[jira] [Commented] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862846#comment-17862846
 ] 

Joan Fisbein commented on PDFBOX-5848:
--

This is the stacktrace from my production application trying to process this 
PDF file:

 
{code:java}
   java.lang.Thread.State: RUNNABLE
at java.base@21.0.3/java.util.ArrayList.indexOfRange(ArrayList.java:299)
at java.base@21.0.3/java.util.ArrayList.indexOf(ArrayList.java:286)
at java.base@21.0.3/java.util.ArrayList.contains(ArrayList.java:275)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:199)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188
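
The top frames of this trace sit in ArrayList.contains, called from 
addElements. One possible reading, which the trace alone does not prove, is 
that an "already seen" check backed by a list gets expensive on a large object 
graph, so the save looks stuck while it is in fact progressing slowly. The 
sketch below shows only that general effect in plain Java. The class, the graph 
shape and all names are hypothetical, and it is not the 
COSWriterCompressionPool code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;

/** Toy graph walk; hypothetical names, not PDFBox code. */
class VisitedCheckSketch {

    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    /** Walks everything reachable from root; the type of 'seen' decides the lookup cost. */
    static int walk(Node root, Collection<Node> seen) {
        Deque<Node> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            Node n = todo.pop();
            if (seen.contains(n)) {
                continue; // ArrayList: linear scan; HashSet: hash lookup
            }
            seen.add(n);
            n.children.forEach(todo::push);
        }
        return seen.size();
    }

    /** Builds n nodes in a ring, each also pointing back at node 0, so the graph has cycles. */
    static Node ring(int n) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            nodes.add(new Node());
        }
        for (int i = 0; i < n; i++) {
            nodes.get(i).children.add(nodes.get((i + 1) % n));
            nodes.get(i).children.add(nodes.get(0));
        }
        return nodes.get(0);
    }

    public static void main(String[] args) {
        Node root = ring(10_000);
        System.out.println(walk(root, new ArrayList<>())); // same count, linear-time lookups
        System.out.println(walk(root, new HashSet<>()));   // same count, hash lookups
    }
}
```

Both calls visit the same nodes; only the cost of each contains() lookup 
differs, and with a list that cost grows with the number of objects already 
seen.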

[jira] [Comment Edited] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862846#comment-17862846
 ] 

Joan Fisbein edited comment on PDFBOX-5848 at 7/3/24 4:39 PM:
--

This is the stacktrace from my production application trying to split the same 
PDF file:

 
{code:java}
   java.lang.Thread.State: RUNNABLE
at java.base@21.0.3/java.util.ArrayList.indexOfRange(ArrayList.java:299)
at java.base@21.0.3/java.util.ArrayList.indexOf(ArrayList.java:286)
at java.base@21.0.3/java.util.ArrayList.contains(ArrayList.java:275)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:199)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:184)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:202)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addStructure(COSWriterCompressionPool.java:188)
at org.apache.pdfbox.pdfwriter.compress.COSWriterCompressionPool.addElements(COSWriterCompressionPool.java:219

[jira] [Created] (PDFBOX-5848) Infinite loop processing PDF

2024-07-03 Thread Joan Fisbein (Jira)
Joan Fisbein created PDFBOX-5848:


 Summary: Infinite loop processing PDF
 Key: PDFBOX-5848
 URL: https://issues.apache.org/jira/browse/PDFBOX-5848
 Project: PDFBox
  Issue Type: Bug
Affects Versions: 3.0.2 PDFBox
Reporter: Joan Fisbein
 Attachments: cbc0018b-5659-4ae3-9887-0e0a2d9a62a7.pdf

I use PDFBox to split hundreds of PDFs per day. Generally everything works 
flawlessly, but I just received a PDF that triggers an infinite loop when I try 
to split it.

I used this Java code to reproduce it using PDFBox 3.0.2 (I haven't tried other 
versions):
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

private static void splitPdf(File fileToSplit) {
  try (PDDocument document = Loader.loadPDF(fileToSplit)) {
    int documentPages = document.getNumberOfPages();
    Splitter splitter = new Splitter();
    // By default, split() produces one document per page.
    List<PDDocument> pages = splitter.split(document);
    Iterator<PDDocument> iterator = pages.listIterator();
    while (iterator.hasNext()) {
      PDDocument pd = iterator.next();
      // Save each part as "<name>-<index>.pdf", then release it.
      pd.save(fileToSplit.getName() + "-" + pages.indexOf(pd) + ".pdf");
      pd.close();
    }
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}
{code}
The PDF file is attached to the issue.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862676#comment-17862676
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit aa0bed934f3c370969aee0126f7836a91fd5e3eb in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=aa0bed9 ]

PDFBOX-5660: update owasp plugin



> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for improving code quality, using the SonarQube 
> report, hints in different IDEs, the FindBugs tool and other code quality 
> tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862652#comment-17862652
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918866 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918866 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862651#comment-17862651
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918865 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918865 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862650#comment-17862650
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918864 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918864 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17862649#comment-17862649
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit b6ce7d70cc0a5c49246bae9c17db146d3c37ef5c in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=b6ce7d7 ]

PDFBOX-5660: update owasp plugin




[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862646#comment-17862646
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 5eb794c23ddad2c0d337635cffa1abcd243c52d0 in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=5eb794c ]

PDFBOX-5660: update owasp plugin

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
> Issue Type: Improvement
> Reporter: Tilman Hausherr
> Priority: Minor
> Attachments: AnnotationSample.Standard.pdf,
> DRY_refactoring_Typ2CharStringParser.patch,
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
> Simplify_string_conversion_in_PDFHighlighter.patch,
> Update_string_handling_and_regex_in_several_classes.patch,
> avoid_multiple_unboxing.patch,
> code_cleanup.patch,
> do_not_create_temporary_File_instance.patch,
> extract_common_code,_move_toUpperCase()_out_of_loop.patch,
> fix_HTML_error_in_Javadoc.patch,
> fix_javadoc_problems.patch,
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
> introduce_StringUtil_class_for_reusable_functionality.patch,
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
> make_inner_class_static.patch,
> refactor_isEndOfName.patch,
> remove_code_duplication_in_Type2CharStringParser.patch,
> remove_obsolete_class_NullOutputStream.patch,
> remove_unnecessary_calls_to_toString()_String_valueOf().patch,
> replace_System_getProperty()_calls.patch,
> screenshot-1.png,
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
> simplify_stream_operations.patch,
> use_Map_ofEntries().patch,
> use_Math_min()_to_make_code_more_readable.patch,
> use_Objects_equals().patch,
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
> use_String_join().patch,
> use_switch_for_readability.patch,
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task to improve code quality, by using the
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862445#comment-17862445
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918844 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918844 ]

PDFBOX-5660: fix logging exception parameter type







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862437#comment-17862437
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918837 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918837 ]

PDFBOX-5660: fix logging exception parameter type







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862433#comment-17862433
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918836 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918836 ]

PDFBOX-5660: fix logging exception parameter type







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862424#comment-17862424
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918830 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918830 ]

PDFBOX-5660: fix logging exception parameter type







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862422#comment-17862422
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918829 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918829 ]

PDFBOX-5660: fix logging exception parameter type







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861207#comment-17861207
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 7abf9d837f551b7dc133c1c57e5277b573a75dde in pdfbox-jbig2's branch 
refs/heads/master from Tilman Hausherr
[ https://gitbox.apache.org/repos/asf?p=pdfbox-jbig2.git;h=7abf9d8 ]

PDFBOX-5660: update owasp plugin







[jira] [Assigned] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr reassigned PDFBOX-5847:
---

Assignee: Tilman Hausherr

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Updated] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5847:

Fix Version/s: 3.0.3 PDFBox
   4.0.0

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Commented] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861177#comment-17861177
 ] 

ASF subversion and git services commented on PDFBOX-5847:
-

Commit 1918786 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918786 ]

PDFBOX-5847: fix log message

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
> Issue Type: Improvement
> Affects Versions: 2.0.31, 3.0.2 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Commented] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861174#comment-17861174
 ] 

ASF subversion and git services commented on PDFBOX-5847:
-

Commit 1918783 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918783 ]

PDFBOX-5847: fix log message







[jira] [Commented] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861175#comment-17861175
 ] 

ASF subversion and git services commented on PDFBOX-5847:
-

Commit 1918784 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918784 ]

PDFBOX-5847: fix log message







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861170#comment-17861170
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918781 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1918781 ]

PDFBOX-5660: update owasp plugin







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861169#comment-17861169
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918780 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918780 ]

PDFBOX-5660: update owasp plugin







[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861150#comment-17861150
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918779 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918779 ]

PDFBOX-5660: update owasp plugin

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1786#comment-1786
 ] 

ASF subversion and git services commented on PDFBOX-5847:
-

Commit 1918774 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918774 ]

PDFBOX-5847: Improve performance of FileSystemFontProvider.scanFonts() by 
introducing an "only headers" mode for the font parsers where each table reads 
as little information as possible, by Mykola Bohdiuk

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Priority: Major
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Commented] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861110#comment-17861110
 ] 

ASF subversion and git services commented on PDFBOX-5847:
-

Commit 1918773 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918773 ]

PDFBOX-5847: Improve performance of FileSystemFontProvider.scanFonts() by 
introducing an "only headers" mode for the font parsers where each table reads 
as little information as possible, by Mykola Bohdiuk

> Improve performance of FileSystemFontProvider.scanFonts()
> -
>
> Key: PDFBOX-5847
> URL: https://issues.apache.org/jira/browse/PDFBOX-5847
> Project: PDFBox
>  Issue Type: Improvement
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Priority: Major
>
> PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
> parsers where each table reads as little information as possible.






[jira] [Comment Edited] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-07-01 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17861103#comment-17861103
 ] 

Tilman Hausherr edited comment on PDFBOX-5225 at 7/1/24 9:28 AM:
-

No I'm not / yes please. I just clarified what it is about.


was (Author: tilman):
No I'm not. I just clarified what it is about.

> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.
>  !screenshot-1.png! 






[jira] [Created] (PDFBOX-5847) Improve performance of FileSystemFontProvider.scanFonts()

2024-07-01 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5847:
---

 Summary: Improve performance of FileSystemFontProvider.scanFonts()
 Key: PDFBOX-5847
 URL: https://issues.apache.org/jira/browse/PDFBOX-5847
 Project: PDFBox
  Issue Type: Improvement
Affects Versions: 3.0.2 PDFBox, 2.0.31
Reporter: Tilman Hausherr


PR by Mykola Bohdiuk which introduces an "only headers" mode for the font 
parsers where each table reads as little information as possible.
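
To illustrate the general idea of reading as little as possible, here is a minimal sketch that reads only the table directory of an sfnt (TrueType/OpenType) file and never touches the table bodies. It is not the code from the PR; the class and method names are hypothetical, and the actual "only headers" mode in the font parsers may work quite differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of FontBox. */
final class TableDirectorySketch {

    /** Reads only the sfnt table directory and returns tag -> table length. */
    static Map<String, Long> readTableDirectory(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font);   // big-endian by default, as in sfnt files
        buf.getInt();                             // sfnt version (ignored here)
        int numTables = buf.getShort() & 0xFFFF;  // unsigned 16-bit table count
        buf.position(buf.position() + 6);         // skip searchRange, entrySelector, rangeShift
        Map<String, Long> tables = new LinkedHashMap<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tag = new byte[4];
            buf.get(tag);                         // four-character table tag, e.g. "head"
            buf.getInt();                         // checksum (ignored)
            buf.getInt();                         // offset (ignored: the table body is never read)
            long length = buf.getInt() & 0xFFFFFFFFL;
            tables.put(new String(tag, StandardCharsets.US_ASCII), length);
        }
        return tables;
    }
}
```

A font scan built this way only pays for the first few hundred bytes of each file, which is the kind of saving the issue is after.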






[jira] [Closed] (PDFBOX-5383) JAVA program Crashes

2024-06-29 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5383.
---
Resolution: Not A Bug

Closing because this isn't "our" bug, it's in JDK8.

> JAVA program Crashes
> 
>
> Key: PDFBOX-5383
> URL: https://issues.apache.org/jira/browse/PDFBOX-5383
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.24, 2.0.25, 3.0.0 PDFBox
>Reporter: krishna prasad
>Priority: Major
>  Labels: crash, jdk8
> Attachments: crash.pdf
>
>
> I am trying to convert the PDF into images by rendering it. The program 
> hangs.






[jira] [Closed] (PDFBOX-5289) java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 13377272 (start offset: 13377272)

2024-06-29 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-5289.
---
Resolution: Won't Fix

Won't fix in 2.0, but works in 3.0 as long as you don't try to access the 
docinfo.

> java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at 
> offset 13377272 (start offset: 13377272)
> -
>
> Key: PDFBOX-5289
> URL: https://issues.apache.org/jira/browse/PDFBOX-5289
> Project: PDFBox
>  Issue Type: Bug
>  Components: Parsing
>Affects Versions: 2.0.24
>Reporter: Stephen
>Priority: Major
> Attachments: Diplomacy by Henry Kissinger (1).pdf
>
>
> {code:java}
> java.io.IOException: Unknown dir object c='>' cInt=62 peek='>' peekInt=62 at offset 13377272 (start offset: 13377272)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:913)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryValue(BaseParser.java:154)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionaryNameValuePair(BaseParser.java:288)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseCOSDictionary(BaseParser.java:218)
>     at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:857)
>     at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:907)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:876)
>     at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:796)
>     at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2858)
>     at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:175)
>     at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1228)
>     at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1128)
> {code}
> Please find the problematic PDF attached.






[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-06-28 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5225:

Attachment: screenshot-1.png

> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.
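
The behaviour the description asks for can be sketched as follows: when building the page-to-widgets map, skip every widget whose field is not in the list passed to flatten. This is a self-contained illustration with hypothetical stand-in types (`Widget`, `WidgetMapSketch`), not the PDFBox classes and not the actual fix.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: hypothetical stand-ins, not PDAcroForm internals. */
final class WidgetMapSketch {

    /** Minimal stand-in for a widget annotation: owning field name and page index. */
    static final class Widget {
        final String fieldName;
        final int pageIndex;

        Widget(String fieldName, int pageIndex) {
            this.fieldName = fieldName;
            this.pageIndex = pageIndex;
        }
    }

    /** Groups widgets by page, keeping only widgets of the fields to be flattened. */
    static Map<Integer, List<Widget>> buildPagesWidgetsMap(List<Widget> allWidgets,
                                                           Set<String> fieldsToFlatten) {
        Map<Integer, List<Widget>> map = new LinkedHashMap<>();
        for (Widget widget : allWidgets) {
            if (!fieldsToFlatten.contains(widget.fieldName)) {
                continue; // not requested: must not be flattened or removed
            }
            map.computeIfAbsent(widget.pageIndex, p -> new ArrayList<>()).add(widget);
        }
        return map;
    }
}
```

With the filter in place, flattening only "VN_NAME" leaves the widgets of all other fields out of the map, so they are not removed from their pages.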






[jira] [Updated] (PDFBOX-5225) Flattening removes all annotations when widget annotation has no page

2024-06-28 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5225:

Description: 
{code}
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
List<PDField> list = new ArrayList<>();
list.add(acroForm.getField("VN_NAME"));
acroForm.flatten(list, true); 
{code}
The code from buildPagesWidgetsMap that is run when there are widgets with 
missing page references does not consider the field list. So all widgets end up 
in the map instead of only those we care about.

 !screenshot-1.png! 

  was:
{code}
PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
List<PDField> list = new ArrayList<>();
list.add(acroForm.getField("VN_NAME"));
acroForm.flatten(list, true); 
{code}
The code from buildPagesWidgetsMap that is run when there are widgets with 
missing page references does not consider the field list. So all widgets end up 
in the map instead of only those we care about.


> Flattening removes all annotations when widget annotation has no page
> -
>
> Key: PDFBOX-5225
> URL: https://issues.apache.org/jira/browse/PDFBOX-5225
> Project: PDFBox
>  Issue Type: Bug
>  Components: AcroForm
>Affects Versions: 2.0.24
>Reporter: Tilman Hausherr
>Priority: Major
> Attachments: SourceFailure.pdf, screenshot-1.png
>
>
> {code}
> PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
> List<PDField> list = new ArrayList<>();
> list.add(acroForm.getField("VN_NAME"));
> acroForm.flatten(list, true); 
> {code}
> The code from buildPagesWidgetsMap that is run when there are widgets with 
> missing page references does not consider the field list. So all widgets end 
> up in the map instead of only those we care about.
>  !screenshot-1.png! 






[jira] [Updated] (PDFBOX-5846) A PDF with 5.3 million xref data, performance comparison between pdfbox3 and itextpdf.hope to optimize the memory usage and loading process time.

2024-06-28 Thread liu (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liu updated PDFBOX-5846:

Summary: A PDF with 5.3 million xref data, performance comparison between 
pdfbox3 and itextpdf.hope to optimize the memory usage and loading process 
time.  (was: A PDF with 5.3 million xref data, performance comparison between 
pdfbox3 and itextpdf.)

> A PDF with 5.3 million xref data, performance comparison between pdfbox3 and 
> itextpdf.hope to optimize the memory usage and loading process time.
> -
>
> Key: PDFBOX-5846
> URL: https://issues.apache.org/jira/browse/PDFBOX-5846
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Rendering
>Affects Versions: 3.0.2 PDFBox
>Reporter: liu
>Priority: Major
> Attachments: 66.7z, image-2024-06-28-15-49-33-885.png, 
> image-2024-06-28-15-50-36-240.png, image-2024-06-28-15-57-49-634.png, 
> image-2024-06-28-16-00-39-424.png
>
>
> There is a pdf in this compressed file.
> [^66.zip]
>  
> Test comparison code
>  
> {code:java}
> // code placeholder
> package net.qiyuesuo.common.pdf;
>
> import com.itextpdf.text.pdf.PdfReader;
> import org.apache.pdfbox.Loader;
> import org.apache.pdfbox.io.IOUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
>
> import java.io.File;
>
> /**
>  * @author :
>  * @description :
>  * @date :
>  */
> public class Test6 {
>
>     private static void pdfbox3() throws Throwable {
>         long l = System.currentTimeMillis();
>         File file = new File("C:\\Users\\LYCIT\\Downloads\\66.pdf");
>         PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
>         System.out.println("loadPDF time:" + (System.currentTimeMillis() - l));
>         Thread.sleep(3600);
>     }
>
>     public static void itextpdf() throws Throwable {
>         long l = System.currentTimeMillis();
>         String file = "C:\\Users\\LYCIT\\Downloads\\66.pdf";
>         final PdfReader pdfReader = new PdfReader(file, null, true);
>         System.out.println("loadPDF time:" + (System.currentTimeMillis() - l));
>         Thread.sleep(3600);
>     }
>
>     public static void main(String[] args) throws Throwable {
>         pdfbox3();
>         // itextpdf();
>     }
> }
> {code}
> Load time:
> pdfbox3:10233 ms.
>  
> itextpdf:925 ms.
> Memory usage:
> pdfbox3:790M.
> !image-2024-06-28-15-49-33-885.png|width=312,height=151!
> itextpdf:106M.
> !image-2024-06-28-15-50-36-240.png|width=307,height=166!
>  
> Detailed memory usage:
> pdfbox3:xrefTable and keyCache.
> !image-2024-06-28-15-57-49-634.png|width=307,height=200!
> itextpdf:xref and xrefObj.
> !image-2024-06-28-16-00-39-424.png|width=311,height=203!
>  






[jira] [Created] (PDFBOX-5846) A PDF with 5.3 million xref data, performance comparison between pdfbox3 and itextpdf.

2024-06-28 Thread liu (Jira)
liu created PDFBOX-5846:
---

 Summary: A PDF with 5.3 million xref data, performance comparison 
between pdfbox3 and itextpdf.
 Key: PDFBOX-5846
 URL: https://issues.apache.org/jira/browse/PDFBOX-5846
 Project: PDFBox
  Issue Type: Improvement
  Components: Rendering
Affects Versions: 3.0.2 PDFBox
Reporter: liu
 Attachments: 66.7z, image-2024-06-28-15-49-33-885.png, 
image-2024-06-28-15-50-36-240.png, image-2024-06-28-15-57-49-634.png, 
image-2024-06-28-16-00-39-424.png

There is a pdf in this compressed file.

[^66.zip]

 

Test comparison code

 
{code:java}
// code placeholder
package net.qiyuesuo.common.pdf;

import com.itextpdf.text.pdf.PdfReader;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;

/**
 * @author :
 * @description :
 * @date :
 */
public class Test6 {

    private static void pdfbox3() throws Throwable {
        long l = System.currentTimeMillis();
        File file = new File("C:\\Users\\LYCIT\\Downloads\\66.pdf");
        PDDocument pdf = Loader.loadPDF(file, IOUtils.createTempFileOnlyStreamCache());
        System.out.println("loadPDF time:" + (System.currentTimeMillis() - l));
        Thread.sleep(3600);
    }

    public static void itextpdf() throws Throwable {
        long l = System.currentTimeMillis();
        String file = "C:\\Users\\LYCIT\\Downloads\\66.pdf";
        final PdfReader pdfReader = new PdfReader(file, null, true);
        System.out.println("loadPDF time:" + (System.currentTimeMillis() - l));
        Thread.sleep(3600);
    }

    public static void main(String[] args) throws Throwable {
        pdfbox3();
        // itextpdf();
    }
}
{code}
Load time:
pdfbox3:10233 ms.

 

itextpdf:925 ms.

Memory usage:


pdfbox3:790M.

!image-2024-06-28-15-49-33-885.png|width=312,height=151!
itextpdf:106M.
!image-2024-06-28-15-50-36-240.png|width=307,height=166!


 

Detailed memory usage:

pdfbox3:xrefTable and keyCache.
!image-2024-06-28-15-57-49-634.png|width=307,height=200!

itextpdf:xref and xrefObj.

!image-2024-06-28-16-00-39-424.png|width=311,height=203!
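
For anyone who wants to repeat the comparison without a profiler, a small measurement helper along the following lines may be enough for a first impression. It is a sketch under assumptions: `LoadTimingSketch` and its methods are hypothetical names, and the heap figure is only a rough reading because it depends on when the garbage collector last ran.

```java
/** Illustrative only: a hypothetical helper for rough timing and heap readings. */
final class LoadTimingSketch {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime(); // monotonic clock, unaffected by system time changes
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Returns the heap currently in use by this JVM, in bytes (approximate). */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }
}
```

Taking `usedHeapBytes()` before and after the load call, and wrapping the load call in `elapsedMillis`, gives numbers comparable to the ones reported above, though a heap dump is still needed to see which structures hold the memory.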

 






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860693#comment-17860693
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918722 from Tilman Hausherr in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1918722 ]

PDFBOX-5660: update junit

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Commented] (PDFBOX-5660) Improve code quality (5)

2024-06-27 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860694#comment-17860694
 ] 

ASF subversion and git services commented on PDFBOX-5660:
-

Commit 1918723 from Tilman Hausherr in branch 'pdfbox/branches/3.0'
[ https://svn.apache.org/r1918723 ]

PDFBOX-5660: update junit

> Improve code quality (5)
> 
>
> Key: PDFBOX-5660
> URL: https://issues.apache.org/jira/browse/PDFBOX-5660
> Project: PDFBox
>  Issue Type: Improvement
>Reporter: Tilman Hausherr
>Priority: Minor
> Attachments: AnnotationSample.Standard.pdf, 
> DRY_refactoring_Typ2CharStringParser.patch, 
> Removed_the_readFully_method_in_the_PfbParser_class_and_replaced__with_calling_readAllByte.patch,
>  
> Simplify_list_and_map_operations,_use_known_size_when_creating_StringBuilder.patch,
>  Simplify_string_conversion_in_PDFHighlighter.patch, 
> Update_string_handling_and_regex_in_several_classes.patch, 
> avoid_multiple_unboxing.patch, code_cleanup.patch, 
> do_not_create_temporary_File_instance.patch, 
> extract_common_code,_move_toUpperCase()_out_of_loop.patch, 
> fix_HTML_error_in_Javadoc.patch, fix_javadoc_problems.patch, 
> introduce_COSArray_of(float___)_to_make_the_code_more_concise_and_avoid_creating_and_copyi.patch,
>  introduce_StringUtil_class_for_reusable_functionality.patch, 
> introduce_constants_COSFLOAT_ZERO_and_COSFloat_ONE_to_avoid_creating_unnecessary_instances.patch,
>  make_inner_class_static.patch, refactor_isEndOfName.patch, 
> remove_code_duplication_in_Type2CharStringParser.patch, 
> remove_obsolete_class_NullOutputStream.patch, 
> remove_unnecessary_calls_to_toString()_String_valueOf().patch, 
> replace_System_getProperty()_calls.patch, screenshot-1.png, 
> simplify_hashCode()_and_equals(),_test_name_first_because_Map_equals()_is_expensive.patch,
>  simplify_stream_operations.patch, use_Map_ofEntries().patch, 
> use_Math_min()_to_make_code_more_readable.patch, use_Objects_equals().patch, 
> use_String_isEmpty()_Collection_isEmpty()_instead_of_checking_length_size.patch,
>  use_String_join().patch, use_switch_for_readability.patch, 
> use_try-with-resources_(since_Java_9_the_variable_declaration_in_the_try_is_not_necessary_.patch
>
>
> This is a long-term issue for the task to improve code quality, by using the 
> SonarQube report, hints in different IDEs, the FindBugs tool and other code 
> quality tools.
> This is a follow-up of PDFBOX-4892, which was getting too long.






[jira] [Resolved] (PDFBOX-5842) IllegalArgumentException: Width (26) and height (0) must be non-zero

2024-06-27 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5842.
-
Resolution: Fixed

> IllegalArgumentException: Width (26) and height (0) must be non-zero
> 
>
> Key: PDFBOX-5842
> URL: https://issues.apache.org/jira/browse/PDFBOX-5842
> Project: PDFBox
>  Issue Type: Bug
>  Components: Rendering
>Affects Versions: 2.0.31, 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Major
> Fix For: 2.0.32, 3.0.3 PDFBox, 4.0.0
>
>
> reported by Patrycja Zaremba in the users mailing list
> https://lists.apache.org/thread/xnwcyhq2c16d9xfgqwgjs70k9qb1w8tp
> {quote}When the page which I try to convert have any element which is png with
> only 1px height (28x1, 54x1 etc.) it is scaled down to 0 and I got this{quote}
> IllegalArgumentException: Width (26) and height (0) must be non-zero
> org.apache.pdfbox.rendering.PageDrawer.drawImage(PageDrawer.java:1281)
> 
> org.apache.pdfbox.contentstream.operator.graphics.DrawObject.process(DrawObject.java:74)
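
The failure mode is plain integer rounding: a 1 px dimension multiplied by a scale below 0.5 rounds to 0. One common way to guard against it is to clamp the scaled size to at least one pixel, sketched below. This is an assumption about a possible guard, not the committed fix, and `ScaledSizeSketch` is a hypothetical name.

```java
/** Illustrative only: a hypothetical helper, not the code in PageDrawer. */
final class ScaledSizeSketch {

    /** Scales a pixel dimension and never returns less than 1. */
    static int scaledSize(int pixels, double scale) {
        // Math.round returns a long; the clamp keeps tiny images drawable.
        return Math.max(1, (int) Math.round(pixels * scale));
    }
}
```

For a 28x1 image drawn at a scale of 0.25, the width becomes 7 and the height stays at 1 instead of collapsing to 0.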






[jira] [Commented] (PDFBOX-5823) StringUtil.PATTERN_SPACE memory optmisation

2024-06-26 Thread Jonathan Prates (Jira)


[ 
https://issues.apache.org/jira/browse/PDFBOX-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17860122#comment-17860122
 ] 

Jonathan Prates commented on PDFBOX-5823:
-

Hi [~lehmi], is there an estimated date for 3.0.3 to go live?

> StringUtil.PATTERN_SPACE memory optmisation
> ---
>
> Key: PDFBOX-5823
> URL: https://issues.apache.org/jira/browse/PDFBOX-5823
> Project: PDFBox
>  Issue Type: Improvement
>  Components: PDModel
>Affects Versions: 3.0.3 PDFBox
>Reporter: Jonathan Prates
>Assignee: Andreas Lehmkühler
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
> Attachments: Main-1.java, Main.java, Screenshot 2024-05-19 at 
> 22.39.10.png, Screenshot 2024-05-19 at 22.40.17.png, Screenshot 2024-05-21 at 
> 20.21.43.png
>
>
> PDAbstractContentStream uses the StringUtil.PATTERN_SPACE regexp to check 
> whether a word contains a space 
> ([https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/PDAbstractContentStream.java#L1624]).
> For large documents (~800 pages) and short strings (such as a regular 
> word), this causes memory overhead (see attached) because of the many extra 
> allocations. I've replaced the regexp with word.contains checks for space and 
> \t; since contains is a plain linear scan that requires no extra allocations, 
> memory usage has been reduced.
> What would be the implications of replacing this block with contains()?
> Since \s is [ \t\n\x0B\f\r], I believe a simplified version would 
> allocate less memory.
>  
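
An allocation-free check covering the same six characters as \s could look like the sketch below. It is an illustration of the idea discussed in the description, not the change that was committed; `WhitespaceSketch` and `containsWhitespace` are hypothetical names.

```java
/** Illustrative only: a hypothetical helper, not StringUtil from PDFBox. */
final class WhitespaceSketch {

    /** Returns true if the string contains any character matched by the regex \s. */
    static boolean containsWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Same set as \s: space, tab, newline, vertical tab (0x0B), form feed, carriage return.
            if (c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '\f' || c == '\r') {
                return true;
            }
        }
        return false;
    }
}
```

The loop is linear in the length of the string, like a regex match, but it creates no Matcher or other temporary objects per call, which is where the savings for many short words would come from.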






[jira] [Resolved] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr resolved PDFBOX-5845.
-
Resolution: Fixed

fixed in 1918648 (3.0) and in 1918649 (trunk)

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5845:

Fix Version/s: 3.0.3 PDFBox
   4.0.0

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
> Fix For: 3.0.3 PDFBox, 4.0.0
>
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Updated] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)


 [ 
https://issues.apache.org/jira/browse/PDFBOX-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-5845:

Affects Version/s: 3.0.2 PDFBox

> potential memory leak in TrueTypeCollection.java
> 
>
> Key: PDFBOX-5845
> URL: https://issues.apache.org/jira/browse/PDFBOX-5845
> Project: PDFBox
>  Issue Type: Bug
>  Components: FontBox
>Affects Versions: 3.0.2 PDFBox
>Reporter: Tilman Hausherr
>Assignee: Tilman Hausherr
>Priority: Minor
>
> This is part of PR#189 (which will be done in a future ticket) and is done 
> separately to shorten / clarify the patch.






[jira] [Created] (PDFBOX-5845) potential memory leak in TrueTypeCollection.java

2024-06-26 Thread Tilman Hausherr (Jira)
Tilman Hausherr created PDFBOX-5845:
---

 Summary: potential memory leak in TrueTypeCollection.java
 Key: PDFBOX-5845
 URL: https://issues.apache.org/jira/browse/PDFBOX-5845
 Project: PDFBox
  Issue Type: Bug
  Components: FontBox
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr


This is part of PR#189 (which will be done in a future ticket) and is done 
separately to shorten / clarify the patch.





