On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>
> Awesome! I'm seeing some inconsistency though. Does anyone know why a
> Bayesian classifier would produce such different results? Could it be
> because of the short input text?
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}]
>
> (lang/detect "My name is joe")
> ["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt" "0.14285645678764042"}]
>
Yes. language-detect uses a fuzzy letter-frequency matching algorithm (the non-scientific name for it) to determine the language quickly, so shorter text has a higher chance of being misclassified, because there are fewer letter frequencies to analyze. Give it a try with a longer input string.

Additionally, you could adjust the :smoothing option for that string, or pass in a map of probabilities as a :prior-map to coerce it one way or the other manually.

- Lee

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
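
One way to see how much the result varies for a given input is to call the detector repeatedly and count which language comes out on top. The following is only a minimal sketch: `tally-top-language` is a hypothetical helper, not part of the library, and it assumes the detection function returns a vector of `[top-language probability-map]`, as in the quoted output above.

```clojure
;; Hypothetical helper, not part of language-detect.
;; Assumes detect-fn returns [top-lang probability-map],
;; matching the shape of the quoted (lang/detect ...) results.
(defn tally-top-language
  "Calls detect-fn on text n times and returns a map from each
  top-ranked language code to the number of runs it came first in."
  [detect-fn text n]
  (frequencies (repeatedly n #(first (detect-fn text)))))

;; Usage, assuming the library is aliased as `lang` as in the quoted post:
;; (tally-top-language lang/detect "My name is joe" 20)
```

If a longer input gives a tally dominated by a single language while a short one stays split across several, that would be consistent with the short-text explanation above.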