Re: [jira] Commented: (PDFBOX-420) Japanese Characters are garbled.

Brian Carrier Mon, 30 Mar 2009 07:45:01 -0700

Hello,

The regression tests have been broken for over 2 weeks and this is greatly impacting our ability to check in some bug fixes because we need to regenerate new regression test files (for example, we fixed some spacing issues that can be seen in the regression tests). I think this patch needs to be reverted until it either passes the regression tests or only has failures that are because the patch fixed some bugs that existed in the regression tests (in which case we can simply fix the regression tests).

Unless I hear otherwise, I will plan to revert the patch tomorrow (Tuesday).


thanks,
brian



On Mar 26, 2009, at 3:03 AM, Takashi Komatsubara (JIRA) wrote:

[ https://issues.apache.org/jira/browse/PDFBOX-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689376#action_12689376 ]
Takashi Komatsubara commented on PDFBOX-420:
--------------------------------------------

Andreas,

The mapping of "Identity-H" to "JIS" is not itself a problem.
However, I've confirmed that there are wrongly extracted characters in the output txt file that is created dynamically when the "ant testextract" command is run.
One such character is ™, for example.
(This "™" character cannot be typed on a Japanese keyboard in normal operation, but it can be copied and pasted.)
Let me look into this further.
Japanese Characters are garbled.
--------------------------------

                Key: PDFBOX-420
                URL: https://issues.apache.org/jira/browse/PDFBOX-420
            Project: PDFBox
         Issue Type: Bug
         Components: Text extraction
   Affects Versions: 0.8.0-incubator
           Reporter: Takashi Komatsubara
           Priority: Critical
        Attachments: supportJapanese-fontbox.patch, supportJapanese.patch, TestFilesForJapaneseGarbledIssue.zip, textextract._20090326_01.zip
The extracted Japanese characters are completely garbled.
This issue is very critical for Japanese users.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
