[ https://issues.apache.org/jira/browse/PDFBOX-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118855#comment-13118855 ]
Kevin Clark commented on PDFBOX-941:
------------------------------------

I'm seeing this with the Tika 0.10 release, which uses PDFBox 1.6.0:

2011-10-01 16:15:43,516 (53344917) [Parser-thread-1] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not parse predefined CMAP file for 'Adobe-Japan1-UCS2'

> extracting Japanese characters gives garbage
> --------------------------------------------
>
>                 Key: PDFBOX-941
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-941
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>         Environment: java 1.6 on CentOS 64bit Linux and MacOSX 10.6
>            Reporter: Liang Qu
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.5.0
>
>         Attachments: 1010gaiyou.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When extracting text from this PDF file, I got this exception, and the
> extracted text was gibberish.
> 44 [main] ERROR org.apache.pdfbox.pdmodel.font.PDFont - Error: Could not
> parse predefined CMAP file for 'Adobe-Japan1-UCS2'
> PDFBox 1.2.1 worked fine with the same file; I wonder why 1.4.0 could not.