Re: cld 0.1.0 - Clojure Language Detection
Apologies for hijacking this thread, but this is probably the most relevant place to point you to a simple web frontend to cld that I cooked up in a couple of hours:

    http://detector-de-idioma.herokuapp.com/index.html

Additionally, you can call it as a service, like so:

    $ curl 'http://detector-de-idioma.herokuapp.com/detect/tally%20ho%20he%20said!'
    {"language":"en","probs":{"en":0.971950892855},"probs-seq":[["en",0.971950892855]]}
    $

See how you like it. I'd love to hear about it if you have a use case for such a monstrosity.

U

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
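For anyone who would rather call the service from Clojure than from the shell, here is a minimal sketch. It assumes only the /detect/<text> URL shape shown in the message above; the namespace and the helper name `detect-url` are made up for illustration, and the actual HTTP request is left as a comment because the Heroku app may not be reachable.

```clojure
(ns example.detect-url
  (:require [clojure.string :as str])
  (:import (java.net URLEncoder)))

(def base-url
  "Endpoint copied from the message above."
  "http://detector-de-idioma.herokuapp.com/detect/")

(defn detect-url
  "Build the request URL for `text` (hypothetical helper).
  URLEncoder targets form encoding and emits `+` for spaces, so those are
  swapped for `%20` to keep the text safe inside a URL path."
  [text]
  (str base-url
       (str/replace (URLEncoder/encode text "UTF-8") "+" "%20")))

;; Performing the request would then be something like:
;; (slurp (detect-url "tally ho he said!"))
```

Note that URLEncoder also escapes the `!`, which the curl example sends unescaped; both forms should reach the server as the same text.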
Re: cld 0.1.0 - Clojure Language Detection
Awesome! I'm seeing some inconsistency, though. Does anyone know why a Bayesian classifier would produce such different results? Could it be because of the short input text?

    (lang/detect "My name is joe")
    ["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]
    (lang/detect "My name is joe")
    ["af" {"af" 0.857138268895934, "fi" 0.14285706394982348}]
    (lang/detect "My name is joe")
    ["af" {"af" 0.7142834459768219, "hr" 0.14285657945126254, "lt" 0.14285645678764042}]

On Feb 28, 2:24 am, Lee Hinman matthew.hin...@gmail.com wrote:
> Hi all,
>
> I'm pleased to announce the initial 0.1.0 release of cld (Clojure Language Detection). CLD is a tiny library wrapping language-detect[1] that can be used to determine the language of a particular piece of text very quickly.
>
> You should be able to use it from Clojars[2] with the following:
>
>     [cld "0.1.0"]
>
> Please give it a try and open any issues that you find on the github repo[3]. Check out the readme for full information and usage.
>
> Also soliciting better names for the project than 'cld' :)
>
> thanks,
> Lee Hinman
>
> [1]: https://code.google.com/p/language-detection/
> [2]: http://clojars.org/cld
> [3]: https://github.com/dakrone/cld
Re: cld 0.1.0 - Clojure Language Detection
On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
> Awesome! I'm seeing some inconsistency, though. Does anyone know why a Bayesian classifier would produce such different results? Could it be because of the short input text?
>
>     (lang/detect "My name is joe")
>     ["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]
>     (lang/detect "My name is joe")
>     ["af" {"af" 0.857138268895934, "fi" 0.14285706394982348}]
>     (lang/detect "My name is joe")
>     ["af" {"af" 0.7142834459768219, "hr" 0.14285657945126254, "lt" 0.14285645678764042}]

Yes, language-detect uses a fuzzy matching-letter-frequency-count (the non-scientific name for it) sort of algorithm in an attempt to quickly determine the language, so shorter text has a higher chance of being classified incorrectly (because there is less letter-frequency data to analyze). Give it a try with a longer input string.

Additionally, you could adjust the :smoothing option for the string, or pass in a map of probabilities as a :prior-map to coerce it one way or the other manually.

- Lee
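To make the point about priors concrete, here is a toy sketch of Bayes' rule over two languages. It is not cld's or language-detect's implementation; the function names and the likelihood numbers are invented for illustration. It only shows why, when a short input separates the candidates weakly, a supplied map of prior probabilities can decide the outcome.

```clojure
(ns example.prior-sketch)

(defn normalize
  "Scale the values of map `m` so that they sum to 1."
  [m]
  (let [total (reduce + (vals m))]
    (into {} (map (fn [[k v]] [k (/ v total)])) m)))

(defn posterior
  "Bayes' rule: the posterior for each language is proportional to
  prior * likelihood. Languages missing from `prior` get weight 0."
  [prior likelihood]
  (normalize
   (into {}
         (map (fn [[lang l]] [lang (* l (get prior lang 0.0))]))
         likelihood)))

;; Made-up likelihoods for a short input that barely separates two languages.
(def likelihood {"af" 0.6 "en" 0.4})

;; Flat prior: the weak evidence alone decides, and "af" stays ahead.
(posterior {"af" 0.5 "en" 0.5} likelihood)

;; A prior that favours "en" outweighs the weak evidence and flips the result.
(posterior {"af" 0.1 "en" 0.9} likelihood)
```

With a longer input the likelihoods would usually differ by far more, so the same prior would matter much less.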
Re: cld 0.1.0 - Clojure Language Detection
On 29 February 2012 17:48, Lee Hinman matthew.hin...@gmail.com wrote:
> On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>> Awesome! I'm seeing some inconsistency, though. Does anyone know why a Bayesian classifier would produce such different results? Could it be because of the short input text?
>>
>>     (lang/detect "My name is joe")
>>     ["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]

As an aside, the above is indeed almost valid Afrikaans :) and would translate to "My names are joe".

--
Michael Wood esiot...@gmail.com
Re: cld 0.1.0 - Clojure Language Detection
I'm more curious about why the output isn't even deterministic. The same input string produced three different results.
Re: cld 0.1.0 - Clojure Language Detection
> I'm more curious about why the output isn't even deterministic. The same input string produced three different results.

You can see in the FAQ (http://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion) that:

    "Langdetect uses random sampling for avoiding local noises(person name, place name and so on), so the language detections of the same document might differ for every time."

U
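The effect described in the FAQ can be illustrated with a small sketch. This is not language-detect's actual sampling code; `sample-ngrams` and the n-gram list are hypothetical. It only shows that drawing a random subset of features with an unseeded generator can give a different subset on every call, while a fixed seed makes the draw reproducible.

```clojure
(ns example.sampling-sketch
  (:import (java.util Random)))

(defn sample-ngrams
  "Pick `n` items from `ngrams` (with replacement) using the given Random.
  Hypothetical helper, not part of cld or language-detect."
  [^Random rng ngrams n]
  (let [v (vec ngrams)]
    (vec (repeatedly n #(nth v (.nextInt rng (count v)))))))

;; Made-up character bigrams standing in for the features of a short text.
(def ngrams ["my" "na" "am" "me" "is" "jo" "oe"])

;; Same seed, same sample: the draw is reproducible.
(= (sample-ngrams (Random. 42) ngrams 5)
   (sample-ngrams (Random. 42) ngrams 5))

;; No seed: each call may draw a different sample, so any score computed
;; from the sample can differ between runs on the same input.
(sample-ngrams (Random.) ngrams 5)
```

On a short input there are few features to draw from, so which ones happen to be sampled has a larger influence on the result.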