Re: cld 0.1.0 - Clojure Language Detection

Lee Hinman Wed, 29 Feb 2012 07:49:06 -0800

On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>
> Awesome! I'm seeing some inconsistency though. Does anyone know why a 
> Bayesian classifier would produce such different results? Could it be 
> because of the short input text? 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}] 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}] 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt" 
> "0.14285645678764042"}] 
>


Yes, language-detect uses a fuzzy letter-frequency-matching sort of
algorithm (the non-scientific name for it) to determine the language
quickly, so shorter text has a higher chance of being misclassified
(there are fewer letters to count frequencies from). Give it a try with a
longer input string.

Additionally, you could adjust the :smoothing option for the string, or 
pass in a map of probabilities as a :prior-map to coerce it one way or 
the other manually.
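
To illustrate how a prior map can tip a close call, here is a minimal toy
sketch in plain Clojure. It is not cld's API or language-detect's actual
implementation: the function name `posterior` and the likelihood values
are made up for illustration, and it only shows the general Bayes-rule
idea of weighting per-language evidence by a prior and renormalizing.

```clojure
;; Toy illustration only: NOT cld's or language-detect's real code.
;; `posterior` and the numbers below are hypothetical.

(defn posterior
  "Weights each language's likelihood by its prior, then normalizes
  the results so they sum to 1."
  [prior-map likelihoods]
  (let [unnormalized (into {}
                           (for [[lang lik] likelihoods]
                             ;; languages missing from the prior get weight 0
                             [lang (* lik (get prior-map lang 0.0))]))
        total        (reduce + (vals unnormalized))]
    (into {}
          (for [[lang p] unnormalized]
            [lang (/ p total)]))))

;; Made-up evidence for a very short input: the two languages are close.
(def likelihoods {"en" 0.4 "af" 0.6})

;; With a flat prior, the slightly stronger evidence for "af" wins.
(posterior {"en" 0.5 "af" 0.5} likelihoods)

;; With a prior that strongly favors "en", "en" comes out ahead instead.
(posterior {"en" 0.9 "af" 0.1} likelihoods)
```

With short input the evidence is weak, so even a moderate prior can decide
the outcome; with longer text the likelihoods usually dominate.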

- Lee
 
