Thanks Oleg

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Oleg Tikhonov <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, July 29, 2015 at 12:01 AM
To: "[email protected]" <[email protected]>
Subject: Re: Bayesian N-Gram Language Detection

>+1 !!!
>My two cents.
>Please also add ability to change/retrain/tote language profiles.
>
>Thanks !!!
>BR,
>Oleg
>
>On Wed, Jul 29, 2015 at 3:59 AM, Mattmann, Chris A (3980) <
>[email protected]> wrote:
>
>> Cool. Well with this one I found, along with language-detector,
>> along with Ramirez and the work with Joe Campbell’s group at MIT-LL
>> and the Julia stuff, I for one am going to take the step to make it
>> pluggable.
>>
>> I’ll try and take this on over the next week. I’ll use a ServiceLoader
>> approach similar to Translators, Detectors, Parsers, etc.
>>
>> Cheers,
>> Chris
>>
>> -----Original Message-----
>> From: Ken Krugler <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, July 28, 2015 at 5:39 PM
>> To: "[email protected]" <[email protected]>
>> Subject: RE: Bayesian N-Gram Language Detection
>>
>> >I think switching to language-detector is a reasonable first step (more
>> >languages, faster, better accuracy), after which we can evaluate the
>> >need to make it pluggable.
>> >
>> >There were some code & resource packaging issues with the original
>> >project, but the fork I've been trying out seems much better.
>> >
>> >See https://github.com/optimaize/language-detector
>> >
>> >Still ALv2, and already in the Maven central repo.
>> >
>> >-- Ken
>> >
>> >> From: Mattmann, Chris A (3980)
>> >> Sent: July 28, 2015 5:30:00pm PDT
>> >> To: [email protected]
>> >> Subject: Bayesian N-Gram Language Detection
>> >>
>> >> FYI the code is ALv2:
>> >>
>> >> https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md
>> >>
>> >> I’m going to test this out and see how it compares with our own.
>> >> Maybe we need to make the Language Detector pluggable too.
>> >
>> >--------------------------
>> >Ken Krugler
>> >+1 530-210-6378
>> >http://www.scaleunlimited.com
>> >custom big data solutions & training
>> >Hadoop, Cascading, Cassandra & Solr

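For reference, a minimal sketch of what a ServiceLoader-based pluggable detector could look like. This is not the project's actual API: the interface `LanguageDetectorPlugin`, the class `UnknownLanguageDetector`, and the method `loadDetectors` are hypothetical names made up for illustration. Only `java.util.ServiceLoader` itself is a real JDK class. Providers would be registered in a `META-INF/services` file named after the interface's binary name; with none registered, the sketch falls back to a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluggableDetectorSketch {

    /** Hypothetical service interface; an assumption, not an existing API. */
    public interface LanguageDetectorPlugin {
        String name();

        /** Returns a language code for the text, or "unknown". */
        String detect(String text);
    }

    /** Placeholder used only when no provider is registered. */
    public static class UnknownLanguageDetector implements LanguageDetectorPlugin {
        @Override
        public String name() {
            return "fallback";
        }

        @Override
        public String detect(String text) {
            return "unknown";
        }
    }

    /** Collects every registered provider, or the placeholder if none is found. */
    public static List<LanguageDetectorPlugin> loadDetectors() {
        List<LanguageDetectorPlugin> found = new ArrayList<>();
        // ServiceLoader scans META-INF/services entries on the classpath.
        for (LanguageDetectorPlugin detector : ServiceLoader.load(LanguageDetectorPlugin.class)) {
            found.add(detector);
        }
        if (found.isEmpty()) {
            found.add(new UnknownLanguageDetector());
        }
        return found;
    }

    public static void main(String[] args) {
        for (LanguageDetectorPlugin detector : loadDetectors()) {
            System.out.println(detector.name() + " -> " + detector.detect("Bonjour tout le monde"));
        }
    }
}
```

A wrapper around language-detector or any other implementation would then only need to implement the interface and ship its own `META-INF/services` entry, with no change to the calling code.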