Re: cld 0.1.0 - Clojure Language Detection
Apologies for hijacking this thread, but this is probably the most relevant place to point you to a simple web frontend to cld that I cooked up in a couple of hours:

http://detector-de-idioma.herokuapp.com/index.html

Additionally, you can call it as a service, like so:

$ curl 'http://detector-de-idioma.herokuapp.com/detect/tally%20ho%20he%20said!'
{"language":"en","probs":{"en":0.971950892855},"probs-seq":[["en",0.971950892855]]}
$

See how you like it. I'd love to hear about it if you have a use case for such a monstrosity.

U

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
Re: cld 0.1.0 - Clojure Language Detection
> I'm more curious about why the output isn't even deterministic. The
> same input string produced three different results.

You can see in the FAQ (http://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion) that:

"Langdetect uses random sampling for avoiding local noises(person name, place name and so on), so the language detections of the same document might differ for every time."

U
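To make the FAQ's point concrete, here is a toy sketch of why any estimate built on random sampling can differ between runs on the same input, and why fixing the seed makes it repeatable. It has nothing to do with langdetect's actual internals: `sample-estimate`, `vowel?` and the choice of counting vowels are all made up for illustration.

```clojure
;; Toy illustration only: this is NOT langdetect's algorithm.
(defn sample-estimate
  "Estimate the share of items satisfying `pred` from `k` random draws
  (with replacement), using the supplied java.util.Random."
  [^java.util.Random rng pred items k]
  (let [v     (vec items)
        draws (repeatedly k #(nth v (.nextInt rng (count v))))]
    (/ (double (count (filter pred draws))) k)))

(def text "My name is joe")
(def vowel? (set "aeiou"))

;; Same seed, same draws, same estimate.
(def run-a (sample-estimate (java.util.Random. 42) vowel? text 7))
(def run-b (sample-estimate (java.util.Random. 42) vowel? text 7))

;; Different seeds (or an unseeded Random) may give different estimates
;; for the very same input string.
(def runs (for [seed (range 20)]
            (sample-estimate (java.util.Random. seed) vowel? text 7)))
```

Evaluating `(distinct runs)` at the REPL shows how much the estimate moves around when only a handful of characters is sampled from a short string.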
Re: cld 0.1.0 - Clojure Language Detection
I'm more curious about why the output isn't even deterministic. The same input string produced three different results.
Re: cld 0.1.0 - Clojure Language Detection
On 29 February 2012 17:48, Lee Hinman wrote:
> On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>>
>> Awesome! I'm seeing some inconsistency though. Does anyone know why a
>> Bayesian classifier would produce such different results? Could it be
>> because of the short input text?
>>
>> (lang/detect "My name is joe")
>> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]

As an aside, the above is indeed almost valid Afrikaans :) and would translate to "My names are joe".

--
Michael Wood
Re: cld 0.1.0 - Clojure Language Detection
On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>
> Awesome! I'm seeing some inconsistency though. Does anyone know why a
> Bayesian classifier would produce such different results? Could it be
> because of the short input text?
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt"
> "0.14285645678764042"}]
>

Yes, language-detect uses a fuzzy matching-letter-frequency-count sort of algorithm (the non-scientific name for it) in an attempt to quickly determine the language, so for shorter text it has a higher chance of being incorrect (because there is less letter-frequency data to analyze). Give it a try with a longer input string.

Additionally, you could adjust the :smoothing option for the string, or pass in a map of probabilities as a :prior-map to coerce it one way or the other manually.

- Lee
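As a rough illustration of the ideas mentioned above (letter frequencies, smoothing, and a prior over languages), here is a minimal naive-Bayes-style sketch. It is an assumption-laden toy, not cld's or language-detection's code: the `profiles` counts are invented, and `letter-prob` and `posterior` are hypothetical helpers whose signatures do not mirror cld's `:smoothing` or `:prior-map` options.

```clojure
;; Toy sketch only: made-up profiles, not cld's or language-detection's code.
(def profiles
  "Hypothetical per-language letter counts (invented for illustration)."
  {"en" {\e 12 \t 9 \a 8 \o 7 \n 7 \i 7 \s 6 \m 2 \y 2 \j 1}
   "af" {\e 16 \a 8 \n 8 \i 7 \s 6 \o 5 \t 5 \m 2 \y 1 \j 1}})

(defn letter-prob
  "Additively smoothed probability of character `c` under `profile`,
  assuming a 26-letter alphabet."
  [profile c smoothing]
  (let [total (reduce + (vals profile))
        vocab 26]
    (/ (+ (get profile c 0) smoothing)
       (+ total (* smoothing vocab)))))

(defn posterior
  "Normalized posterior over the languages in `profiles` for `text`,
  given `prior` (a map of language -> prior probability) and a
  `smoothing` constant."
  [text prior smoothing]
  (let [letters (filter #(Character/isLetter (char %))
                        (.toLowerCase ^String text))
        ;; prior times the product of per-letter probabilities
        score   (fn [lang]
                  (reduce * (double (get prior lang 0.0))
                          (map #(letter-prob (profiles lang) % smoothing)
                               letters)))
        scores  (into {} (map (juxt identity score) (keys profiles)))
        z       (reduce + (vals scores))]
    (into {} (map (fn [[lang s]] [lang (/ s z)]) scores))))
```

For example, comparing `(posterior "joe" {"en" 0.5 "af" 0.5} 0.5)` with `(posterior "joe" {"en" 0.9 "af" 0.1} 0.5)` shows how a prior pulls the result toward one language. With only three letters of evidence, the prior has a lot of say, which is the same reason short inputs are easy to misclassify.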
Re: cld 0.1.0 - Clojure Language Detection
Awesome! I'm seeing some inconsistency though. Does anyone know why a Bayesian classifier would produce such different results? Could it be because of the short input text?

(lang/detect "My name is joe")
["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]

(lang/detect "My name is joe")
["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}]

(lang/detect "My name is joe")
["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt" "0.14285645678764042"}]

On Feb 28, 2:24 am, Lee Hinman wrote:
> Hi all,
> I'm pleased to announce the initial 0.1.0 release of cld (Clojure
> Language Detection). CLD is a tiny library wrapping language-detect[1]
> that can be used to determine the language of a particular piece of
> text very quickly. You should be able to use it from Clojars[2] with
> the following:
>
> [cld "0.1.0"]
>
> Please give it a try and open any issues on the github repo[3] that
> you find. Check out the readme for the full information and usage.
>
> Also soliciting better names for the project than 'cld' :)
>
> thanks,
> Lee Hinman
>
> [1]:https://code.google.com/p/language-detection/
> [2]:http://clojars.org/cld
> [3]:https://github.com/dakrone/cld