Hi all,

I am Trevor, a grad student at USC currently working with Prof. Chris
Mattmann and Paul Ramirez on integrating Tika with MIT Lincoln Lab’s
Text.jl library for language detection.
https://issues.apache.org/jira/browse/TIKA-1696

Since Text.jl is written in Julia, I have created a Julia HTTP server that
accepts text in the body of a PUT request and returns the detected language
as a JSON string.
https://github.com/trevorlewis/csci572dr.git
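
A client call against such a server might look roughly like the Python
sketch below. The URL, port, and the "language" key in the JSON reply are
assumptions for illustration only; they are not taken from the repository,
so adjust them to whatever the server actually exposes.

```python
import json
import urllib.request

# Hypothetical endpoint: the real host, port, and path of the Julia server
# are not stated in this message.
SERVER_URL = "http://localhost:8000/"


def build_request(text, url=SERVER_URL):
    """Build a PUT request carrying the raw text as UTF-8 bytes."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def parse_language(body):
    """Extract the language from the JSON reply.

    The "language" key is an assumed field name, not the confirmed schema.
    """
    return json.loads(body.decode("utf-8"))["language"]


def detect_language(text, url=SERVER_URL, timeout=10):
    """Send text to the server and return the detected language."""
    request = build_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_language(response.read())
```

With the server running, detect_language("Bonjour le monde") would return
whatever language value the server reports for that text.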

I have also benchmarked the language identification results of the Julia
HTTP server against the Tika 1.11 language detector.
https://docs.google.com/spreadsheets/d/1cW6S2WpiN08pZ3UMVGMyQkO-fotUiUyGRemCrbC1miY/edit?usp=sharing

I have also been looking at Ken Krugler's language detection work on Tika's
2.x branch, and I plan to fork that project and add the Text.jl
implementation.
https://issues.apache.org/jira/browse/TIKA-1723

I would appreciate any input and feedback on this project.


Thanks,

Trevor Lewis
lewis...@usc.edu
