[ 
https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198521#comment-13198521
 ] 

James Sullivan commented on TIKA-855:
-------------------------------------

If this is just a missing language profile issue, let me know what is needed.
For Japanese at least, I am aware of a number of large, publicly available
corpora that might be suitable, and I may be able to help generate the
profiles. However, it sounds like there might be more to it than just
generating the profile. I have added this as feature request TIKA-856.
                
> Language Detection not working for Japanese and Chinese.
> --------------------------------------------------------
>
>                 Key: TIKA-855
>                 URL: https://issues.apache.org/jira/browse/TIKA-855
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier
>    Affects Versions: 1.0
>         Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 
> 6 and Oracle Java 7
>            Reporter: James Sullivan
>            Assignee: Ken Krugler
>            Priority: Minor
>              Labels: Chinese, Japanese
>
> I have tried Tika 1.0 language detection (java -jar tika.jar -l 
> .\Japanese.txt) on several Japanese files (both PDF and text files) and it 
> consistently returns lt (Lithuanian???) instead of ja. I also tried on a 
> Chinese file which similarly incorrectly returned lt. Both English language 
> and French language detection worked correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
