I think switching to language-detector is a reasonable first step (more 
languages, faster, better accuracy), after which we can evaluate the need to 
make it pluggable.

There were some code & resource packaging issues with the original project, but 
the fork I've been trying out seems much better.

See https://github.com/optimaize/language-detector

It's still ALv2, and it's already in the Maven Central repo.

-- Ken

> From: Mattmann, Chris A (3980)
> Sent: July 28, 2015 5:30:00pm PDT
> To: [email protected]
> Subject: Bayesian N-Gram Language Detection
> 
> FYI the code is ALv2:
> 
> https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md
> 
> 
> I’m going to test this out and see how it compares with our own.
> Maybe we need to make the Language Detector pluggable too.
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr