I think switching to language-detector is a reasonable first step (more languages, faster, better accuracy), after which we can evaluate the need to make it pluggable.
There were some code & resource packaging issues with the original project, but the fork I've been trying out seems much better. See https://github.com/optimaize/language-detector

Still ALv2, and already in the Maven central repo.

-- Ken

> From: Mattmann, Chris A (3980)
> Sent: July 28, 2015 5:30:00pm PDT
> To: [email protected]
> Subject: Bayesian N-Gram Language Detection
>
> FYI the code is ALv2:
>
> https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md
>
>
> I’m going to test this out and see how it compares with our own.
> Maybe we need to make the Language Detector pluggable too.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
