Paul Ramirez created TIKA-1696:
----------------------------------

             Summary: Language Identification with Text Processing Toolkit from MITLL
                 Key: TIKA-1696
                 URL: https://issues.apache.org/jira/browse/TIKA-1696
             Project: Tika
          Issue Type: New Feature
          Components: languageidentifier
            Reporter: Paul Ramirez
             Fix For: 1.10


The aim here is to extend the methods available for language identification 
within text. MIT Lincoln Laboratory (MITLL) has an open source library [1] 
written in Julia. Having spoken with the MITLL developers, I understand there 
may also be a Scala version of this library, which would make it easier to 
package with Tika.

At this point I'm not sure how many languages the library supports by default, 
but it can be extended to additional languages when given training data.

[1] https://github.com/mit-nlp/Text.jl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
