[ 
https://issues.apache.org/jira/browse/PDFBOX-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-5387.
----------------------------------------
    Resolution: Fixed

> ToUnicodeWriter.writeTo allows byte overflow in bfrange operator
> ----------------------------------------------------------------
>
>                 Key: PDFBOX-5387
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5387
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 2.0.25
>            Reporter: Ryan Jackson
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.26, 3.0.0 PDFBox
>
>
> The {{writeTo}} method of {{ToUnicodeWriter}} allows overflow in the 
> low-order byte when writing the {{(begin/end)bfrange}} operator.
> As far as I can tell it is used only with the {{PDCIDFontType2Embedder}} 
> class. I believe the bug exists in both the main trunk and in the 2.x branch. 
> The code in question may be found 
> [here|https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdmodel/font/ToUnicodeWriter.java#L133-L136].
> The portion of the PDF specification (version 1.7) that bears upon this code 
> is Section 5.9, Example 5.16.
> The existing code attempts to limit each range to a span of at most 255 
> code points, but it does not prevent the low-order byte from overflowing 
> within a range. For example, it allows the following:
> [srcCode1 srcCode2 dstString]
> 03FF 0400 0036
> The overflow between srcCode1 and srcCode2 (low-order byte FF to 00) is not 
> allowed by the specification, and text extraction will fail. The glyphs 
> themselves render fine, so the problem is not obvious until one examines the 
> text in the Content Panel or copies and pastes from Acrobat (Pro) into 
> another document. By contrast, the following bfrange operator allows text 
> extraction to work as intended:
> [srcCode1 srcCode2 dstString]
> 03FE 03FF 0035
> Notice that no overflow exists, and as such the requirements of the 
> specification are met.
> I have put together a proposed solution 
> [here|https://github.com/ryanjackson-wf/pdfbox/pull/1] in my fork of the 
> PDFBox GH mirror.
>  
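
To illustrate the constraint described in the report, here is a minimal sketch in Java. It is not the PDFBox code and not necessarily the fix that was committed: the class and method names (BfRangeCheck, isValidSrcRange, splitAtByteBoundary) are hypothetical, and the sketch assumes two-byte source codes. It checks that srcCode1 and srcCode2 share the same high-order byte, and shows one way a run of consecutive codes could be split so that no range crosses a low-byte boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names do not exist in PDFBox.
class BfRangeCheck {

    // Returns true when the source range [srcCode1, srcCode2] keeps the same
    // high-order byte, so only the low-order byte varies within the range.
    static boolean isValidSrcRange(int srcCode1, int srcCode2) {
        return srcCode1 <= srcCode2 && (srcCode1 >> 8) == (srcCode2 >> 8);
    }

    // Splits the inclusive run first..last of consecutive codes into
    // sub-ranges that never cross a low-byte boundary (xxFF to xx00).
    static List<int[]> splitAtByteBoundary(int first, int last) {
        List<int[]> ranges = new ArrayList<>();
        int start = first;
        while (start <= last) {
            // start | 0xFF is the last code sharing start's high-order byte.
            int end = Math.min(last, start | 0xFF);
            ranges.add(new int[] { start, end });
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(isValidSrcRange(0x03FF, 0x0400)); // false
        System.out.println(isValidSrcRange(0x03FE, 0x03FF)); // true
        for (int[] r : splitAtByteBoundary(0x03FE, 0x0401)) {
            System.out.printf("%04X %04X%n", r[0], r[1]);
        }
    }
}
```

With this sketch, the run 03FE..0401 is emitted as two ranges, 03FE 03FF and 0400 0401, instead of a single range that overflows the low-order byte.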



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
