[ https://issues.apache.org/jira/browse/PDFBOX-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-5387.
----------------------------------------
    Resolution: Fixed

> ToUnicodeWriter.writeTo allows byte overflow in bfrange operator
> ----------------------------------------------------------------
>
>                 Key: PDFBOX-5387
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5387
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.25
>            Reporter: Ryan Jackson
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.26, 3.0.0 PDFBox
>
>
> The {{writeTo}} method of {{ToUnicodeWriter}} allows overflow in the
> low-order byte when writing the {{(begin/end)bfrange}} operator.
> As far as I can tell, it is used only with the {{PDCIDFontType2Embedder}}
> class. I believe the bug exists in both the main trunk and the 2.x branch.
> The code in question may be found
> [here|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/ToUnicodeWriter.java#L133-L136].
> The portion of the PDF specification (version 1.7) that bears upon this code
> is Section 5.9, Example 5.16.
>
> The existing code attempts to limit the range logic to changes of at most
> 255 code points, but it fails to account for at least the following
> situation, which it allows (for example):
>
> [srcCode1 srcCode2 dstString]
> 03FF 0400 0036
>
> The overflow between srcCode1 and srcCode2 is not allowed by the
> specification, and any text extraction will fail. The glyphs themselves
> render fine, so it is not immediately obvious there is a problem until one
> tries to examine the text by using the Content Panel or by copying and
> pasting from Acrobat (Pro) into some other document. By contrast, the
> following bfrange operator does allow text extraction to work as intended:
>
> [srcCode1 srcCode2 dstString]
> 03FE 03FF 0035
>
> Notice that no overflow exists, and as such the requirements of the
> specification are met.
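A minimal Java sketch of the grouping rule described above, assuming two-byte source codes and destination strings that are single BMP code points. The names BfRangeSketch, Range and buildRanges are hypothetical; this is neither PDFBox API nor the fix that was committed. It only shows that a run of consecutive mappings has to be split whenever the low-order byte of srcCode would wrap past 0xFF (the 03FF/0400 case), and it also keeps the low byte of dstString from wrapping, which is the 255 limit the existing code already tries to enforce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BfRangeSketch {

    /** One bfrange entry: srcFrom..srcTo maps to dstFrom, dstFrom + 1, ... */
    static final class Range {
        final int srcFrom;
        int srcTo;
        final int dstFrom;

        Range(int srcFrom, int dstFrom) {
            this.srcFrom = srcFrom;
            this.srcTo = srcFrom;
            this.dstFrom = dstFrom;
        }

        @Override
        public String toString() {
            // Hex-string form used inside beginbfrange/endbfrange
            return String.format("<%04X> <%04X> <%04X>", srcFrom, srcTo, dstFrom);
        }
    }

    /**
     * Hypothetical helper: groups CID -> code point mappings (sorted by CID)
     * into bfrange entries. A run is extended only if the CID and the code
     * point both increase by one and neither low-order byte wraps past 0xFF.
     */
    static List<Range> buildRanges(TreeMap<Integer, Integer> cidToCodePoint) {
        List<Range> ranges = new ArrayList<>();
        Range current = null;
        for (Map.Entry<Integer, Integer> e : cidToCodePoint.entrySet()) {
            int cid = e.getKey();
            int cp = e.getValue();
            boolean extend = current != null
                    && cid == current.srcTo + 1
                    && cp == current.dstFrom + (cid - current.srcFrom)
                    // srcCode1 and srcCode2 must share the high-order byte
                    && (cid >> 8) == (current.srcFrom >> 8)
                    // the last byte of dstString must not wrap either
                    && (cp >> 8) == (current.dstFrom >> 8);
            if (extend) {
                current.srcTo = cid;
            } else {
                current = new Range(cid, cp);
                ranges.add(current);
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Mappings taken from the two examples in the report
        TreeMap<Integer, Integer> map = new TreeMap<>();
        map.put(0x03FE, 0x0035);
        map.put(0x03FF, 0x0036);
        map.put(0x0400, 0x0037);
        for (Range r : buildRanges(map)) {
            System.out.println(r);
        }
    }
}
```

Run as-is, main prints "<03FE> <03FF> <0035>" followed by "<0400> <0400> <0037>": the valid 03FE..03FF range from the report is kept, and 0400 starts a new entry instead of producing the overflowing 03FF 0400 pair.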
> I have put together a proposed solution
> [here|https://github.com/ryanjackson-wf/pdfbox/pull/1] in my fork of the
> PDFBox GH mirror.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)