Thanks Ken. 

We are working on bringing in Text.jl and, at this point, prefer
to work on the 1.x branch (aka master). I’ve asked Trevor to look
at the 1.x branch, pull your tika-detect module code from 2.x into
1.x, and then look at adding Text.jl from MIT-LL as a corresponding
implementation there. Trevor has wrapped Text.jl in a REST-based
server, written in Julia, that accepts PUT requests. We should be
able to start with Text.jl and later generalize to any REST service
that performs language identification.

You can see the issue from before here:

https://issues.apache.org/jira/browse/TIKA-1696


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Ken Krugler <kkrugler_li...@transpac.com>
Date: Tuesday, February 23, 2016 at 11:14 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Cc: jpluser <chris.a.mattm...@jpl.nasa.gov>, "Ramirez, Paul M (398M)"
<paul.m.rami...@jpl.nasa.gov>
Subject: RE: Integrating Tika with MITLL Text.jl library for language
detection

>
>Hi Trevor,
>
>1. I assume the benchmark was using a pre-2.0 version of Tika, yes?
>
>It would be great to try out the current support in the 2.0 branch, as a
>comparison with what we had previously.
>
>Also, details on the test corpus used would be useful.
>
>2. I started using the ServiceLoader pattern to support dynamic loading
>of language detectors.
>
>There's a bit more work to do to move the common support classes
>(LanguageWriter, etc.) from the specific implementation sub-project into
>core.
>
>Once that's done, you should be able to try adding your Text.jl
>integration directly.
>
>-- Ken
>
>________________________________________
>From: Trevor Claude Lewis
>Sent: February 23, 2016 10:55:46am PST
>To: dev@tika.apache.org
>Cc: Mattmann, Chris A (3980); Ramirez, Paul M (398M);
>kkrugler_li...@transpac.com
>Subject: Integrating Tika with MITLL Text.jl library for language
>detection
>
>Hi all,
>
>I am Trevor, a grad student at USC currently working with Prof.
>Chris Mattmann and Paul Ramirez on integrating Tika with MIT Lincoln
>Lab’s Text.jl library for language detection.
>https://issues.apache.org/jira/browse/TIKA-1696
>
>Since Text.jl is written in Julia, I have created a Julia HTTP server
>that accepts data via PUT requests and returns the language of the data
>as a JSON string.
>https://github.com/trevorlewis/csci572dr.git
>
>I have also benchmarked the Julia HTTP server's language identification
>against the Tika 1.11 language detector.
>https://docs.google.com/spreadsheets/d/1cW6S2WpiN08pZ3UMVGMyQkO-fotUiUyGRemCrbC1miY/edit?usp=sharing
>
>In addition, I have been looking at Ken Krugler's work on language
>detection in Tika's 2.x branch, and I was planning to fork that project
>and add the Text.jl implementation.
>https://issues.apache.org/jira/browse/TIKA-1723
>
>I would welcome any input and feedback on this project.
>
>Thanks,
>
>Trevor Lewis
>lewis...@usc.edu
>
>--------------------------
>Ken Krugler
>+1 530-210-6378
>http://www.scaleunlimited.com
>custom big data solutions & training
>Hadoop, Cascading, Cassandra & Solr