[ https://issues.apache.org/jira/browse/TIKA-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211760#comment-13211760 ]

Christian Moen commented on TIKA-855:
-------------------------------------

Thanks, James. I've linked the issues. Perhaps we can track this in TIKA-856.

> Language Detection not working for Japanese and Chinese.
> --------------------------------------------------------
>
> Key: TIKA-855
> URL: https://issues.apache.org/jira/browse/TIKA-855
> Project: Tika
> Issue Type: Bug
> Components: languageidentifier
> Affects Versions: 1.0
> Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 6 and Oracle Java 7
> Reporter: James Sullivan
> Assignee: Ken Krugler
> Priority: Minor
> Labels: Chinese, Japanese
>
> I have tried Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) on several Japanese files (both PDF and text files) and it consistently returns lt (Lithuanian???) instead of ja. I also tried on a Chinese file which similarly incorrectly returned lt. Both English language and French language detection worked correctly.