[ 
https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689132#action_12689132
 ] 

Andreas Lehmkühler commented on PDFBOX-420:
-------------------------------------------

As far as I understand the encoding handling, the issue comes up every time 
TrueType CID fonts are used. Whenever this kind of font is used, "Identity-H" 
is the encoding. The patch maps this encoding to the character set "JIS", 
which stands for ISO-2022-JP, a Japanese mapping (see 
org.apache.pdfbox.encoding.conversion.CJKEncodings.java).
So in the end I don't know where the right solution lies. Is it wrong to simply 
map "Identity-H" to "JIS", or is the real cause of this problem the missing 
support for CID fonts?
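
To make the question concrete, here is a minimal sketch of what "Identity-H" 
means according to the PDF specification: every character code is two bytes, 
high-order byte first, and is used unchanged as the CID. It is not a character 
code in any national character set, so Unicode text can only be recovered 
through a table such as the font's ToUnicode CMap. The class name, the method 
names and the table contents below are hypothetical and chosen only for 
illustration; none of this is existing PDFBox code.

```java
import java.util.HashMap;
import java.util.Map;

public class IdentityHSketch {

    // Identity-H: split the shown string into 2-byte codes, high-order byte
    // first. Each code is used as the CID without any further mapping.
    public static int[] toCids(byte[] shown) {
        int[] cids = new int[shown.length / 2];
        for (int i = 0; i < cids.length; i++) {
            cids[i] = ((shown[2 * i] & 0xFF) << 8) | (shown[2 * i + 1] & 0xFF);
        }
        return cids;
    }

    // Look each CID up in a CID-to-Unicode table. In a real document this
    // table would come from the font's ToUnicode CMap; here it is passed in.
    // CIDs without an entry become U+FFFD (the replacement character).
    public static String toText(int[] cids, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        for (int cid : cids) {
            text.append(toUnicode.getOrDefault(cid, "\uFFFD"));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical table: these CID assignments are made up for the demo.
        Map<Integer, String> toUnicode = new HashMap<>();
        toUnicode.put(0x0003, "\u65E5"); // assumed: CID 3 is the glyph for U+65E5
        toUnicode.put(0x0004, "\u672C"); // assumed: CID 4 is the glyph for U+672C

        byte[] shown = {0x00, 0x03, 0x00, 0x04};
        System.out.println(toText(toCids(shown), toUnicode));
    }
}
```

If this reading is right, decoding the same bytes as ISO-2022-JP would treat 
CIDs as if they were JIS character codes, which might explain the garbled 
output for fonts whose CIDs are unrelated to JIS.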

Any suggestions or hints for solving this issue? 

> Japanese Characters are garbled.
> --------------------------------
>
>                 Key: PDFBOX-420
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-420
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 0.8.0-incubator
>            Reporter: Takashi Komatsubara
>            Priority: Critical
>         Attachments: supportJapanese-fontbox.patch, supportJapanese.patch, 
> TestFilesForJapaneseGarbledIssue.zip
>
>
> The extracted Japanese characters are completely garbled.
> This issue is very critical for Japanese users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
