[ https://issues.apache.org/jira/browse/PDFBOX-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045708#comment-17045708 ]

Tilman Hausherr commented on PDFBOX-4785:
-----------------------------------------

Yes, it is indeed related; see the /ToUnicode stream of the first font:
{code:java}
<1b70> <1b71> <66FF>
{code}
See the comment by [~mkl] in PDFBOX-4661. This is an incorrect PDF: the destination <66FF> ends in FF, but the range covers two codes, so for the second code the last byte would have to be incremented past FF.
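For illustration, here is a hypothetical sketch (not PDFBox's CMapParser code) of how a bfrange entry <srcLo> <srcHi> <dst> with a hex-string destination is expanded: each further source code maps to dst with its last byte incremented, and ISO 32000-1 (9.10.3) requires that this byte not go past 255. The class and method names are made up for this example. The strict mode rejects the entry above; the lenient mode is only an assumed recovery that carries the overflow into the preceding byte. Note that 0x1B71 is 7025 decimal, the code from the warning in the description, and that carrying turns 0x66FF + 1 into U+6700 ("最").
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not PDFBox's CMapParser: expands one ToUnicode
// bfrange entry "<srcLo> <srcHi> <dst>" whose destination is a hex string.
public class BfRangeSketch {

    // Returns the UTF-16BE destination for the code at srcLo + offset.
    // strict = true follows ISO 32000-1, 9.10.3: only the last byte is
    // incremented and it must not pass 0xFF. strict = false is an assumed
    // lenient reading that carries the overflow into the preceding byte.
    static String destinationFor(byte[] dst, int offset, boolean strict) {
        byte[] out = dst.clone();
        int last = out.length - 1;
        if (strict) {
            int sum = (out[last] & 0xFF) + offset;
            if (sum > 0xFF) {
                throw new IllegalArgumentException(String.format(
                        "last destination byte %02X + %d overflows 0xFF", out[last] & 0xFF, offset));
            }
            out[last] = (byte) sum;
        } else {
            int add = offset;
            for (int pos = last; pos >= 0 && add > 0; pos--) {
                int sum = (out[pos] & 0xFF) + add;
                out[pos] = (byte) sum; // keep the low 8 bits
                add = sum >> 8;        // carry into the byte to the left
            }
        }
        return new String(out, StandardCharsets.UTF_16BE);
    }

    // Expands the whole range srcLo..srcHi into one Unicode string per code.
    static List<String> expand(int srcLo, int srcHi, byte[] dst, boolean strict) {
        List<String> result = new ArrayList<>();
        for (int code = srcLo; code <= srcHi; code++) {
            result.add(destinationFor(dst, code - srcLo, strict));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] dst = {0x66, (byte) 0xFF}; // <66FF> from the first font's /ToUnicode
        // 0x1B71 == 7025, the code from "No Unicode mapping for CID+7025"
        System.out.println("0x1B71 = " + 0x1B71);
        try {
            expand(0x1b70, 0x1b71, dst, true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
        for (String s : expand(0x1b70, 0x1b71, dst, false)) {
            System.out.printf("lenient: U+%04X%n", s.codePointAt(0));
        }
    }
}
{code}
Running main prints that 0x1B71 is 7025, a strict-mode overflow message, and then U+66FF and U+6700 for the lenient mode. The lenient branch is just one conceivable recovery for such broken files and does not describe what any PDFBox version actually does.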

> No Unicode mapping with MS-Mincho
> ---------------------------------
>
>                 Key: PDFBOX-4785
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4785
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.18, 2.0.19
>            Reporter: Ryosuke Fujita
>            Priority: Major
>         Attachments: E02779_convocation_notice_p14.pdf
>
>
> ExtractText on the attached PDF fails starting with v2.0.18, while v2.0.17 succeeds.
> The error message is as follows, and the character "最" (CID+7025) cannot be extracted:
> FEB 26, 2020 10:32:29 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+7025 (7025) in font NAEGKL+MS-Mincho
> Could this be related to PDFBOX-4661?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
