[ https://issues.apache.org/jira/browse/PDFBOX-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045708#comment-17045708 ]
Tilman Hausherr commented on PDFBOX-4785:
-----------------------------------------

Yes, it is indeed related; see the /ToUnicode stream of the first font:
{code:java}
<1b70> <1b71> <66FF>
{code}
See the comment by [~mkl] in PDFBOX-4661. This is an incorrect PDF: the third token ends in FF, but the range has two elements, so the last byte cannot be incremented for the second one.

> No Unicode mapping with MS-Mincho
> ---------------------------------
>
>                 Key: PDFBOX-4785
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4785
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.18, 2.0.19
>            Reporter: Ryosuke Fujita
>            Priority: Major
>         Attachments: E02779_convocation_notice_p14.pdf
>
>
> ExtractText on the attached PDF fails since v2.0.18, while v2.0.17 succeeds.
> The error message is as follows, and the character "最" (CID+7025) cannot be extracted.
> FEB 26, 2020 10:32:29 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+7025 (7025) in font NAEGKL+MS-Mincho
> This may be related to PDFBOX-4661?
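
To make the arithmetic behind the range in the comment concrete, here is a minimal sketch of how a two-element range starting at <66FF> could be expanded. It is not PDFBox code: the class `BfRangeSketch`, the method `expand`, and the `strict` flag are hypothetical names introduced only for illustration. The "strict" branch assumes a parser that refuses to increment the last byte past FF, which is one plausible reading of the stricter handling discussed in PDFBOX-4661. The "lenient" branch simply increments the whole value, which maps code 0x1B71 (decimal 7025) to U+6700, the character "最" from the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BfRangeSketch {

    /**
     * Hypothetical helper (not part of PDFBox): expands a range of the form
     * <startCode> <endCode> <startUnicode> into code -> Unicode string entries.
     * When strict is true, expansion stops once the low byte of the
     * destination value is 0xFF, because that byte cannot be incremented.
     */
    static Map<Integer, String> expand(int startCode, int endCode,
                                       int startUnicode, boolean strict) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        int value = startUnicode;
        for (int code = startCode; code <= endCode; code++) {
            mapping.put(code, new String(Character.toChars(value)));
            if (strict && (value & 0xFF) == 0xFF) {
                // Assumption: a strict parser drops the remaining codes here.
                break;
            }
            value++;
        }
        return mapping;
    }

    public static void main(String[] args) {
        // The range from the comment: <1b70> <1b71> <66FF>
        Map<Integer, String> lenient = expand(0x1B70, 0x1B71, 0x66FF, false);
        Map<Integer, String> strict = expand(0x1B70, 0x1B71, 0x66FF, true);

        // 0x1B71 is 7025 in decimal, the CID named in the warning.
        System.out.println(lenient.containsKey(7025)); // true
        System.out.println("\u6700".equals(lenient.get(7025))); // true
        System.out.println(strict.containsKey(7025)); // false
    }
}
```

Under these assumptions, the strict expansion leaves CID 7025 without an entry, which would be consistent with the "No Unicode mapping for CID+7025" warning, while the lenient expansion recovers "最".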