It's fine to discuss this on TIKA-1696.
> On Jul 23, 2015, at 2:19 PM, Paul Ramirez (JIRA) <j...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639522#comment-14639522 ]
>
> Paul Ramirez commented on TIKA-1696:
> ------------------------------------
>
> Ken, thanks for the fast feedback and references. I've not dug into this much
> so it may take a couple of weeks to get something up here to test. As I dig
> into this I'll update the Jira issue with more details to help drive
> discussion. Also I'll look to get the MITLL guys posting here too as they
> would be better able to describe the details.
>
> What wasn't clear on TIKA-369 is whether yalder was going to come back into
> Tika. Intent here is to get to a patch integrating their code so it could be
> tested in the same way that Tika's current approach was tested. Hopefully
> that patch would help answer the questions above.
>
> They are forwarding me some research papers so I can come up to speed on this
> too so as I gain knowledge I'll flush out here.
>
> Do you think this should instead happen on TIKA-369?
>
>> Language Identification with Text Processing Toolkit from MITLL
>> ---------------------------------------------------------------
>>
>>                 Key: TIKA-1696
>>                 URL: https://issues.apache.org/jira/browse/TIKA-1696
>>             Project: Tika
>>          Issue Type: New Feature
>>          Components: languageidentifier
>>            Reporter: Paul Ramirez
>>             Fix For: 1.10
>>
>>
>> The aim here is to extend the methods for language identification within
>> text. MIT Lincoln Labs has an open source library [1] written in Julia.
>> Having spoken with the MITLL guys there is a possibility that there is a
>> scala version of this library which would make it easier to package in with
>> Tika.
>> At this point I'm not quite sure how many languages this library supports by
>> default but it can be extended when provided some training data.
>> [1] https://github.com/mit-nlp/Text.jl