Just so I get this right, is it then a one-to-one mapping between LanguageProfile and training data? The code I'm looking at now allows one to train on multiple languages.
Thanks,
Paul

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Paul Ramirez, M.S.
Technical Group Supervisor
Computer Science for Data Intensive Applications (398M)
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 158-264, Mailstop: 158-242
Email: [email protected]
Office: 818-354-1015
Cell: 818-395-8194
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Aug 3, 2015, at 7:37 PM, "Mattmann, Chris A (3980)" <[email protected]> wrote:

Thanks Oleg

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Oleg Tikhonov <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, July 29, 2015 at 12:01 AM
To: "[email protected]" <[email protected]>
Subject: Re: Bayesian N-Gram Language Detection

+1 !!!
My two cents: please also add the ability to change/retrain/tote language profiles.

Thanks !!!
BR,
Oleg

On Wed, Jul 29, 2015 at 3:59 AM, Mattmann, Chris A (3980) <[email protected]> wrote:

Cool. Well, with this one I found, along with language-detector, along with Ramirez and the work with Joe Campbell’s group at MIT-LL and the Julia stuff, I for one am going to take the step to make it pluggable.
I’ll try and take this on over the next week. I’ll use a ServiceLoader approach similar to Translators, Detectors, Parsers, etc.

Cheers,
Chris

-----Original Message-----
From: Ken Krugler <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, July 28, 2015 at 5:39 PM
To: "[email protected]" <[email protected]>
Subject: RE: Bayesian N-Gram Language Detection

I think switching to language-detector is a reasonable first step (more languages, faster, better accuracy), after which we can evaluate the need to make it pluggable.

There were some code & resource packaging issues with the original project, but the fork I've been trying out seems much better. See https://github.com/optimaize/language-detector

Still ALv2, and already in the Maven central repo.

-- Ken

From: Mattmann, Chris A (3980)
Sent: July 28, 2015 5:30:00pm PDT
To: [email protected]
Subject: Bayesian N-Gram Language Detection

FYI the code is ALv2: https://github.com/shuyo/language-detection/blob/wiki/ProjectHome.md

I’m going to test this out and see how it compares with our own. Maybe we need to make the Language Detector pluggable too.

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
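For readers following the "make it pluggable" idea: a ServiceLoader-based detector lookup, in the spirit of how Translators, Detectors and Parsers are discovered, might look roughly like the sketch below. Only `java.util.ServiceLoader` is a real JDK class here; `PluggableLanguageDetector`, `DetectorRegistry` and the `UndeterminedDetector` fallback are hypothetical names for illustration, not Tika's actual API. Wrappers around language-detector or the shuyo code would each implement the interface and be listed in a `META-INF/services/` file named after the interface's fully qualified name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface: each pluggable detector implements it and
// is registered in META-INF/services/<fully.qualified.InterfaceName>.
interface PluggableLanguageDetector {
    /** Short name used to select an implementation, e.g. "optimaize". */
    String name();

    /** Returns a language code such as "en", or "und" if undetermined. */
    String detect(String text);
}

// Hypothetical fallback used when no provider is found on the classpath.
class UndeterminedDetector implements PluggableLanguageDetector {
    public String name() { return "undetermined"; }
    public String detect(String text) { return "und"; }
}

class DetectorRegistry {
    private final List<PluggableLanguageDetector> detectors = new ArrayList<>();

    DetectorRegistry(ClassLoader loader) {
        // ServiceLoader instantiates every provider class listed in the
        // META-INF/services file for this interface, in classpath order.
        for (PluggableLanguageDetector d :
                ServiceLoader.load(PluggableLanguageDetector.class, loader)) {
            detectors.add(d);
        }
        // Assumption: fall back instead of failing when nothing is installed.
        if (detectors.isEmpty()) {
            detectors.add(new UndeterminedDetector());
        }
    }

    List<PluggableLanguageDetector> all() { return detectors; }

    /** Returns the provider with the given name, or the first one found. */
    PluggableLanguageDetector get(String name) {
        for (PluggableLanguageDetector d : detectors) {
            if (d.name().equals(name)) {
                return d;
            }
        }
        return detectors.get(0);
    }
}
```

With this layout, making a new detector available would amount to putting its jar (with the services file) on the classpath; the fallback keeps callers from failing outright when no implementation is present, which is a design choice assumed here rather than something decided in the thread.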

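On the question of whether a language profile maps one-to-one to training data, and on the request to retrain profiles, a small naive Bayes character n-gram model shows one possible arrangement. In this hypothetical sketch (`NgramProfile` and `NaiveBayesDetector` are illustrative names, not classes from Tika, shuyo's project or language-detector), each profile holds trigram counts for exactly one language, but `train` can be called repeatedly, so one profile can absorb several corpora of that language; "training on multiple languages" then just means filling several profiles. Scoring sums add-one smoothed log probabilities under a uniform prior, which is the general idea behind Bayesian n-gram detection; the projects linked in the thread add refinements not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical profile: character trigram counts for ONE language.
class NgramProfile {
    static final int N = 3;
    final String language;
    final Map<String, Integer> counts = new HashMap<>();
    long total = 0;

    NgramProfile(String language) { this.language = language; }

    // May be called many times to add more training text for this language.
    void train(String text) {
        for (String gram : ngrams(text)) {
            counts.merge(gram, 1, Integer::sum);
            total++;
        }
    }

    // Add-one (Laplace) smoothed log probability of one n-gram in this language.
    double logProb(String gram, int vocabularySize) {
        int c = counts.getOrDefault(gram, 0);
        return Math.log((c + 1.0) / (total + vocabularySize));
    }

    // Lowercases, pads with spaces so word boundaries form n-grams, and slides a window.
    static List<String> ngrams(String text) {
        String padded = " " + text.toLowerCase() + " ";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + N <= padded.length(); i++) {
            grams.add(padded.substring(i, i + N));
        }
        return grams;
    }
}

class NaiveBayesDetector {
    private final Map<String, NgramProfile> profiles = new HashMap<>();

    // Text for each language goes into that language's own profile.
    void train(String language, String text) {
        profiles.computeIfAbsent(language, NgramProfile::new).train(text);
    }

    // Discards one language's counts so it can be retrained from scratch.
    void reset(String language) { profiles.remove(language); }

    String detect(String text) {
        // Shared vocabulary size so every profile is smoothed the same way.
        Set<String> vocab = new HashSet<>();
        for (NgramProfile p : profiles.values()) vocab.addAll(p.counts.keySet());
        int v = Math.max(vocab.size(), 1);

        String best = "und";
        double bestScore = Double.NEGATIVE_INFINITY;
        List<String> grams = NgramProfile.ngrams(text);
        for (NgramProfile p : profiles.values()) {
            double score = 0; // uniform prior over languages, so it is omitted
            for (String gram : grams) score += p.logProb(gram, v);
            if (score > bestScore) {
                bestScore = score;
                best = p.language;
            }
        }
        return best;
    }
}
```

For example, after `train("en", ...)` and `train("de", ...)` with a sentence or two each, short English and German phrases should be assigned to their own profiles, and calling `reset("de")` followed by new `train("de", ...)` calls is one way the retraining Oleg asked for could work.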