[jira] [Commented] (TIKA-683) RTF Parser issues with non european characters

Jukka Zitting (JIRA) Fri, 19 Aug 2011 01:39:03 -0700

    [ https://issues.apache.org/jira/browse/TIKA-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087610#comment-13087610 ]


Jukka Zitting commented on TIKA-683:
------------------------------------

> Just in case it can't be done with subclassing, does anybody know what the
> licensing restrictions on the JDK classes are? (mainly RTFEditorKit, RTFReader)

They should be available under GPLv2 from the OpenJDK project.

And it actually looks like Apache Harmony added an initial ALv2-licensed RTF
parser in HARMONY-5903. I haven't tried that code yet.


> RTF Parser issues with non european characters
> ----------------------------------------------
>
>                 Key: TIKA-683
>                 URL: https://issues.apache.org/jira/browse/TIKA-683
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Chris A. Mattmann
>         Attachments: TIKA-683-unicode-testcase.patch, TIKA-683.patch, testRTFJapanese.rtf, testUnicodeUCNControlWordCharacterDoubling.rtf
>
>
> As reported on user@ in "non-West European languages support":
>   
> http://mail-archives.apache.org/mod_mbox/tika-user/201107.mbox/%3cof0c0a3275.da7810e9-onc22578cc.0051eede-c22578cc.00525...@il.ibm.com%3E
> The RTF Parser seems to be doubling up some non-European characters.
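Some background on how this kind of doubling can arise: in RTF, a character outside the document's ANSI code page is normally written as a \uN control word (N is a signed 16-bit decimal value) followed by one or more ANSI fallback characters, and a preceding \ucN control word tells a Unicode-aware reader how many fallback characters to skip. A reader that emits both the \uN character and its fallback produces doubled output. The attachment name testUnicodeUCNControlWordCharacterDoubling.rtf hints at this mechanism, although the thread itself does not confirm the root cause. The Java sketch below is hypothetical (UnicodeRtfSketch and decode are illustrative names, not Tika or JDK API) and deliberately simplified: it ignores group scoping of \ucN, drops control words other than \u and \uc, and decodes \'hh escapes as Latin-1 instead of using the document's \ansicpg code page.

```java
/**
 * Hypothetical sketch (not Tika or JDK API): decode RTF Unicode control
 * words while honouring the uc fallback count, so that the ANSI fallback
 * characters written after each Unicode character are skipped instead of
 * being emitted a second time.
 */
public class UnicodeRtfSketch {
    private final StringBuilder out = new StringBuilder();
    private int uc = 1;   // current ucN value; the RTF default is 1
    private int skip = 0; // fallback characters still to be skipped

    /** Appends c unless it is a fallback character that must be skipped. */
    private void emit(char c) {
        if (skip > 0) {
            skip--;
        } else {
            out.append(c);
        }
    }

    public static String decode(String rtf) {
        UnicodeRtfSketch s = new UnicodeRtfSketch();
        int n = rtf.length();
        int i = 0;
        while (i < n) {
            char c = rtf.charAt(i);
            if (c == '{' || c == '}') {
                i++; // group delimiters are ignored (uc is not group-scoped here)
            } else if (c != '\\' || i + 1 >= n) {
                s.emit(c); // ordinary text character
                i++;
            } else {
                char next = rtf.charAt(i + 1);
                if (next == '\'' && i + 3 < n
                        && Character.digit(rtf.charAt(i + 2), 16) >= 0
                        && Character.digit(rtf.charAt(i + 3), 16) >= 0) {
                    // hex escape: one byte, decoded as Latin-1 for simplicity
                    s.emit((char) Integer.parseInt(rtf.substring(i + 2, i + 4), 16));
                    i += 4;
                } else if (Character.isLetter(next)) {
                    // control word: letters, optional signed number, optional space
                    int j = i + 1;
                    while (j < n && Character.isLetter(rtf.charAt(j))) j++;
                    String word = rtf.substring(i + 1, j);
                    int k = j;
                    if (k + 1 < n && rtf.charAt(k) == '-' && Character.isDigit(rtf.charAt(k + 1))) k++;
                    while (k < n && Character.isDigit(rtf.charAt(k))) k++;
                    String param = rtf.substring(j, k);
                    if (k < n && rtf.charAt(k) == ' ') k++; // delimiter space is consumed
                    i = k;
                    if (word.equals("uc") && !param.isEmpty()) {
                        s.uc = Integer.parseInt(param);
                    } else if (word.equals("u") && !param.isEmpty()) {
                        int code = Integer.parseInt(param);
                        if (code < 0) code += 65536; // N is a signed 16-bit value
                        if (s.skip > 0) {
                            s.skip--; // counts as one fallback character
                        } else {
                            s.out.append((char) code);
                            s.skip = s.uc; // skip the next uc fallback characters
                        }
                    } else if (s.skip > 0) {
                        s.skip--; // other control words count as one fallback character
                    }
                } else {
                    s.emit(next); // control symbol such as \\, \{ or \}
                    i += 2;
                }
            }
        }
        return s.out.toString();
    }

    public static void main(String[] args) {
        // Two Japanese characters (U+65E5, U+672C), each followed by fallback text.
        String one = decode("{\\uc1\\u26085?\\u26412?}");
        String two = decode("{\\uc2\\u26085\\'3f\\'3f\\u26412\\'3f\\'3f}");
        System.out.println(one.length() + " " + one.equals(two)); // prints "2 true"
    }
}
```

With \uc1 a single '?' follows each character, and with \uc2 two hex-escaped bytes do; in both cases the sketch returns each of the two characters once, whereas a reader that ignored \ucN would also emit every fallback character.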

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
