Re: [jira] [Commented] (TIKA-855) Language Detection not working for Japanese and Chinese.

Oleg Tikhonov Wed, 01 Feb 2012 20:18:43 -0800

For Chinese, we need to create (or obtain) two profiles: Traditional Chinese
and Simplified Chinese.

Oleg

On Thu, Feb 2, 2012 at 6:13 AM, James Sullivan (Commented) (JIRA)
<j...@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198521#comment-13198521]
>
> James Sullivan commented on TIKA-855:
> -------------------------------------
>
> If it is just a missing language profile issue, let me know what is needed.
> At least for Japanese, I am aware of a number of large, publicly available
> corpora that might be suitable, and I may be able to help generate the
> profiles. However, it sounds like there might be more to it than just
> generating the profile... I have added this as feature request TIKA-856.
>
> > Language Detection not working for Japanese and Chinese.
> > --------------------------------------------------------
> >
> >                 Key: TIKA-855
> >                 URL: https://issues.apache.org/jira/browse/TIKA-855
> >             Project: Tika
> >          Issue Type: Bug
> >          Components: languageidentifier
> >    Affects Versions: 1.0
> >         Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun
> >                      Java 6 and Oracle Java 7
> >            Reporter: James Sullivan
> >            Assignee: Ken Krugler
> >            Priority: Minor
> >              Labels: Chinese, Japanese
> >
> > I have tried Tika 1.0 language detection (java -jar tika.jar -l
> > .\Japanese.txt) on several Japanese files (both PDF and text files), and it
> > consistently returns lt (Lithuanian???) instead of ja. I also tried it on a
> > Chinese file, which likewise incorrectly returned lt. Language detection
> > worked correctly for both English and French.
>
