[ https://issues.apache.org/jira/browse/PDFBOX-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-5387.
----------------------------------------
Resolution: Fixed
> ToUnicodeWriter.writeTo allows byte overflow in bfrange operator
> ----------------------------------------------------------------
>
> Key: PDFBOX-5387
> URL: https://issues.apache.org/jira/browse/PDFBOX-5387
> Project: PDFBox
> Issue Type: Bug
> Components: PDModel
> Affects Versions: 2.0.25
> Reporter: Ryan Jackson
> Assignee: Andreas Lehmkühler
> Priority: Major
> Fix For: 2.0.26, 3.0.0 PDFBox
>
>
> The {{writeTo}} method of {{ToUnicodeWriter}} allows overflow in the
> low-order byte when writing the {{(begin/end)bfrange}} operator.
> As far as I can tell, {{ToUnicodeWriter}} is used only by the
> {{PDCIDFontType2Embedder}} class. I believe the bug exists in both trunk and
> the 2.x branch.
> The code in question may be found
> [here|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/ToUnicodeWriter.java#L133-L136].
> The portion of the PDF specification (version 1.7) that bears upon this code
> is Section 5.9, Example 5.16.
> The existing code attempts to limit each range to a span of at most 255
> code points, but it does not check whether the range crosses a high-order
> byte boundary, so it can emit a range like this one:
> [srcCode1 srcCode2 dstString]
> 03FF 0400 0036
> Here the low-order byte overflows between srcCode1 (03FF) and srcCode2 (0400),
> so the two codes no longer share a high-order byte. The specification does
> not allow this, and any text extraction will fail. The glyphs themselves render
> fine so it is not immediately obvious there is a problem until one tries to
> examine the text by using the Content Panel or by copy/pasting from Acrobat
> (Pro) to some other document. By contrast the following bfrange operator does
> allow the text extraction to work as intended:
> [srcCode1 srcCode2 dstString]
> 03FE 03FF 0035
> Notice that no overflow exists, and as such the requirements of the
> specification are met.
> I have put together a proposed solution
> [here|https://github.com/ryanjackson-wf/pdfbox/pull/1] in my fork of the
> PDFBox GitHub mirror.
>
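For readers following the byte arithmetic in the report, the condition it describes can be sketched as a small standalone check. This is only an illustration, not the fix that was committed to PDFBox: the class name BfRangeSketch and the method canExtendRange are hypothetical, and the sketch assumes 2-byte source codes and destinations that are a single UTF-16 code unit. It treats a mapping as safe to append to a range only if source and destination each advance by one and neither crosses a high-order byte boundary.

```java
/** Hypothetical helper, not part of PDFBox. */
final class BfRangeSketch
{
    private BfRangeSketch()
    {
    }

    /**
     * Returns true if the mapping (nextSrc -> nextDst) may be appended to a
     * bfrange that currently ends at (prevSrc -> prevDst).
     * Assumption: 2-byte source codes, single UTF-16 code unit destinations.
     */
    static boolean canExtendRange(int prevSrc, int nextSrc, int prevDst, int nextDst)
    {
        // Source code and destination must both advance by exactly one.
        boolean consecutive = nextSrc == prevSrc + 1 && nextDst == prevDst + 1;
        // The source codes must share their high-order byte: 03FF -> 0400 fails here.
        boolean sameSrcHighByte = (prevSrc >> 8) == (nextSrc >> 8);
        // The low-order byte of the destination must not wrap around either.
        boolean sameDstHighByte = (prevDst >> 8) == (nextDst >> 8);
        return consecutive && sameSrcHighByte && sameDstHighByte;
    }

    public static void main(String[] args)
    {
        // 03FE -> 0035 followed by 03FF -> 0036: same high byte, range may grow.
        System.out.println(canExtendRange(0x03FE, 0x03FF, 0x0035, 0x0036)); // true
        // 03FF -> 0036 followed by 0400 -> 0037: high byte changes, start a new range.
        System.out.println(canExtendRange(0x03FF, 0x0400, 0x0036, 0x0037)); // false
    }
}
```

With a check of this kind, the failing example from the report would be written as two ranges, one ending at 03FF and a new one starting at 0400, rather than as a single range spanning both.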
--
This message was sent by Atlassian Jira
(v8.20.1#820001)