I haven't filed an issue on the Tika list yet, but in the near future
MIT-LL will also have language detection for video and audio streams. Chris, if
you're already going to make this pluggable, this may be something to consider.

--Paul

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Paul Ramirez, M.S.
Technical Group Supervisor
Computer Science for Data Intensive Applications (398M)
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 158-264, Mailstop: 158-242
Email: [email protected]
Office: 818-354-1015
Cell: 818-395-8194
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Jul 28, 2015, at 5:59 PM, "Mattmann, Chris A (3980)" <[email protected]> wrote:

Cool. Well, between this one I found, language-detector, Ramirez and
the work with Joe Campbell’s group at MIT-LL, and the Julia stuff,
I for one am going to take the step to make it pluggable.

I’ll try and take this on over the next week. I’ll use a ServiceLoader
approach similar to Translators, Detectors, Parsers, etc.
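
For readers unfamiliar with the pattern, here is a minimal sketch of what a ServiceLoader-based plugin lookup could look like. It is not Tika's actual API: the interface `LanguageGuesser`, the class `FallbackGuesser`, and the method `loadGuessers` are hypothetical names chosen for illustration. Only `java.util.ServiceLoader` is a real JDK class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluggableLanguageDetection {

    /** Hypothetical plugin interface (an assumption, not Tika's API). */
    public interface LanguageGuesser {
        String name();

        /** Returns a language code for the text, or "unknown". */
        String guess(String text);
    }

    /** Hypothetical built-in fallback used when no provider is registered. */
    public static class FallbackGuesser implements LanguageGuesser {
        @Override
        public String name() {
            return "fallback";
        }

        @Override
        public String guess(String text) {
            return "unknown";
        }
    }

    /**
     * Collects every provider that ServiceLoader can find on the classpath.
     * Providers are registered by listing their class names in a file named
     * META-INF/services/PluggableLanguageDetection$LanguageGuesser.
     * If none are registered, the fallback is returned instead.
     */
    public static List<LanguageGuesser> loadGuessers() {
        List<LanguageGuesser> found = new ArrayList<>();
        for (LanguageGuesser guesser : ServiceLoader.load(LanguageGuesser.class)) {
            found.add(guesser);
        }
        if (found.isEmpty()) {
            found.add(new FallbackGuesser());
        }
        return found;
    }

    public static void main(String[] args) {
        for (LanguageGuesser guesser : loadGuessers()) {
            System.out.println(guesser.name() + " -> " + guesser.guess("Hello, world"));
        }
    }
}
```

With this shape, a third-party jar could contribute its own detector just by shipping an implementation and the matching META-INF/services entry, without any change to the calling code.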

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: Ken Krugler <[email protected]>
Reply-To: [email protected]
Date: Tuesday, July 28, 2015 at 5:39 PM
To: [email protected]
Subject: RE: Bayesian N-Gram Language Detection

I think switching to language-detector is a reasonable first step (more
languages, faster, better accuracy), after which we can evaluate the need
to make it pluggable.

There were some code & resource packaging issues with the original
project, but the fork I've been trying out seems much better.

See https://github.com/optimaize/language-detector

Still ALv2, and already in the Maven central repo.

-- Ken

From: Mattmann, Chris A (3980)
Sent: July 28, 2015 5:30:00pm PDT
To: [email protected]
Subject: Bayesian N-Gram Language Detection

FYI the code is ALv2:

https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md


I’m going to test this out and see how it compares with our own.
Maybe we need to make the Language Detector pluggable too.




--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






