[ https://issues.apache.org/jira/browse/PDFBOX-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864690#comment-17864690 ]
Leonard Wicke edited comment on PDFBOX-5849 at 7/10/24 2:32 PM:
----------------------------------------------------------------

[~tilman], I have reopened the issue. Yes, the ? is rendered correctly in the PDF, but if I copy and paste it into a text editor, the character is not a question mark but again (U+0B5B). So while the PDF looks correct, it is actually wrong. Something is not working correctly, even in 3.0.3-SNAPSHOT. This explains the weird behavior of the PDFTextStripper.

was (Author: JIRAUSER306146):
*I have reopened the issue because* yes, the ? is rendered correctly in the PDF, but if I copy and paste it into a text editor, the character is not a question mark but again (U+0B5B). So while the PDF looks correct, it is actually wrong. Something is not working correctly, even in 3.0.3-SNAPSHOT.

> ArrayIndexOutOfBoundsException in Apache PDFBox 3 in connection with Font
> -------------------------------------------------------------------------
>
>                 Key: PDFBOX-5849
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5849
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox, Rendering
>    Affects Versions: 3.0.2 PDFBox
>            Reporter: Leonard Wicke
>            Assignee: Tilman Hausherr
>            Priority: Major
>              Labels: fontbox, pdfbox
>         Attachments: screenshot-1.png
>
>
> *Affected Versions*
> PDFBox 2.0.30 is not affected, so it is likely that no other version of major release 2 is affected either.
> PDFBox 3.0.2 is affected.
> It appears to us that this bug is new in major release 3.
> *Description*
> We are using Apache PDFBox 3.0.2 in our software and have the following issue.
> We want to write a String using the font FreeSansBold.
> The font is loaded via PDType0Font#load from a TTF file.
> If we load the font with embedSubset=true, then the following exception occurs:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 2941 out of bounds for length 2912
> 	at org.apache.fontbox.ttf.TTFSubsetter.addCompoundReferences(TTFSubsetter.java:500)
> 	at org.apache.fontbox.ttf.TTFSubsetter.getGIDMap(TTFSubsetter.java:147)
> 	at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.subset(TrueTypeEmbedder.java:336)
> 	at org.apache.pdfbox.pdmodel.font.PDType0Font.subset(PDType0Font.java:304)
> 	at org.apache.pdfbox.pdmodel.PDDocument.subsetDesignatedFonts(PDDocument.java:1046)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1034)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:988)
> 	at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:944){code}
> The reason is the question mark character "?". The character "!" also causes an exception.
> Letters like a-zA-Z don't.
> This character is first correctly identified as glyph ID 34 but is then converted to 2914 by GsubWorkerForDevanagari in PDAbstractContentStream#encodeForGsub.
> This glyph does not exist in this font and causes the exception later, when the fonts are subsetted while saving the document.
> The exception does not occur when writing the text in the PDPageContentStream.
> If we load the font with embedSubset=false, then no exception occurs, but the character is not visible (it is skipped) in the PDF.
> I have only found old, fixed issues with ArrayIndexOutOfBoundsExceptions (https://issues.apache.org/jira/browse/PDFBOX-4946).
> *Code that causes the exception*
> With Apache PDFBox 3, a new step, encodeForGsub, was added to showTextInternal in PDPageContentStream.
> This causes the glyph of the character to be replaced by a glyph that does not exist.
> *Code to reproduce*
> You need the font FreeSansBold or another font that causes this problem.
> {code:java}
> import java.io.File;
> import org.apache.fontbox.ttf.TTFParser;
> import org.apache.fontbox.ttf.TrueTypeFont;
> import org.apache.pdfbox.io.RandomAccessReadBufferedFile;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.pdmodel.PDPageContentStream;
> import org.apache.pdfbox.pdmodel.common.PDRectangle;
> import org.apache.pdfbox.pdmodel.font.PDFont;
> import org.apache.pdfbox.pdmodel.font.PDType0Font;
>
> PDDocument document = new PDDocument();
> File boldF = new File("src/test/resources/fonts", "FreeSansBold.ttf");
> TrueTypeFont boldT = new TTFParser().parse(new RandomAccessReadBufferedFile(boldF));
> // third argument is embedSubset; true triggers the exception on save
> PDFont bold = PDType0Font.load(document, boldT, true);
> PDPage page = new PDPage(PDRectangle.A4);
> PDPageContentStream contentStream = new PDPageContentStream(document, page, PDPageContentStream.AppendMode.APPEND, true, true);
> contentStream.setFont(bold, 11);
> contentStream.beginText();
> contentStream.newLineAtOffset(50, 50);
> contentStream.showText("?"); // "!" fails as well; a-zA-Z do not
> contentStream.endText();
> contentStream.close();
> document.addPage(page);
> document.save(new File("Test.pdf")); // subsetting happens here and throws{code}
> *Questions*
> Is this a bug in Apache PDFBox / FontBox?
> It appears that the code in question is only executed for fonts of class PDType0Font. Is it possible to load a font using another class to avoid this bug?
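
The failure mode described in the report (a glyph ID that, after GSUB substitution, is larger than a table indexed by glyph ID) can be modeled without PDFBox. The following is a minimal sketch in plain Java, not PDFBox or FontBox code. The class `GlyphIdGuard`, its methods, and the stand-in `table` array are hypothetical names introduced for illustration; only the numbers (34, 2914, 2941, 2912) come from the report. It assumes that the array in the stack trace is indexed directly by glyph ID.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of this is PDFBox or FontBox code.
final class GlyphIdGuard {

    // True if gid can safely index a table with tableLength entries.
    static boolean isValidGid(int gid, int tableLength) {
        return gid >= 0 && gid < tableLength;
    }

    // Returns the glyph IDs that would overflow the table, in input order.
    static List<Integer> findInvalidGids(int[] gids, int tableLength) {
        List<Integer> invalid = new ArrayList<>();
        for (int gid : gids) {
            if (!isValidGid(gid, tableLength)) {
                invalid.add(gid);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        int tableLength = 2912;               // array length from the stack trace
        int[] gids = {34, 2914, 2941};        // glyph IDs mentioned in the report
        long[] table = new long[tableLength]; // hypothetical stand-in for the real table

        // Checked path: report the offending glyph IDs up front.
        System.out.println("invalid GIDs: " + findInvalidGids(gids, tableLength));

        // Unchecked path: the same kind of lookup that fails in the stack trace.
        try {
            long unused = table[2941];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup fails: " + e.getMessage());
        }
    }
}
```

A range check of this kind before indexing would turn the late ArrayIndexOutOfBoundsException into an early, explicit report of the offending glyph IDs. Whether such a check belongs in the GSUB path or in the subsetter is a question for the PDFBox maintainers and is left open here.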