[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130360#comment-15130360 ]
Tim Allison commented on TIKA-1723:
-----------------------------------

Come on over to the 2.x branch, the water is fine. :) Plenty of freedom to break things there. I still don't have a good solution, though, to the complexity you raise above.

> Integrate language-detector into Tika
> -------------------------------------
>
>                 Key: TIKA-1723
>                 URL: https://issues.apache.org/jira/browse/TIKA-1723
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 1.11
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-1723-2.patch, TIKA-1723-3.patch, TIKA-1723.patch, TIKA-1723v2.patch
>
>
> The language-detector project at
> https://github.com/optimaize/language-detector is faster, has more languages
> (70 vs 13) and better accuracy than the built-in language detector.
> This is a stab at integrating it, with some initial findings. There are a
> number of issues this raises, especially if [~chrismattmann] moves forward
> with turning language detection into a pluggable extension point.
> I'll add comments with results below.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)