Re: cld 0.1.0 - Clojure Language Detection

2012-03-01 Thread Ulises
Apologies for hijacking this thread, but this is probably the most
relevant place to point you to a simple web-frontend to cld I cooked
up in a couple of hours:
http://detector-de-idioma.herokuapp.com/index.html

Additionally, you can call it as a service, like so:

  $ curl 'http://detector-de-idioma.herokuapp.com/detect/tally%20ho%20he%20said!'
  {"language":"en","probs":{"en":0.971950892855},"probs-seq":[["en",0.971950892855]]}

See how you like it. I'd love to hear about it if you have a use case
for such a monstrosity.

U

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Ulises
> I'm more curious about why the output isn't even deterministic. The
> same input string produced three different results.

You can see in the FAQ
(http://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion)
that:

"Langdetect uses random sampling for avoiding local noises(person
name, place name and so on), so the language detections of the same
document might differ for every time."
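
Since the sampling is random, one way to get a steadier answer is to call the
detector several times and average the probabilities. The following is only a
sketch under assumptions: `detect-fn` is a placeholder for any function that
returns [best-language {language probability}] in the shape quoted in this
thread, and `average-probs` and `stable-detect` are hypothetical helper names,
not part of cld or language-detect.

```clojure
;; Sketch only. `detect-fn` is assumed to return
;; [best-language {language probability}], where each probability may be a
;; string (as in the outputs quoted in this thread) or a number.

(defn average-probs
  "Calls detect-fn on text n times (n >= 1) and returns a map of
  language -> mean probability. A language missing from a run counts as 0."
  [detect-fn text n]
  (let [runs   (repeatedly n #(second (detect-fn text)))
        parsed (map (fn [m]
                      (into {} (for [[lang p] m]
                                 [lang (Double/parseDouble (str p))])))
                    runs)
        totals (apply merge-with + parsed)]
    (into {} (for [[lang total] totals]
               [lang (/ total n)]))))

(defn stable-detect
  "Returns the language with the highest mean probability over n runs."
  [detect-fn text n]
  (key (apply max-key val (average-probs detect-fn text n))))
```

With the detector aliased as `lang`, as in the examples in this thread, a call
might look like (stable-detect lang/detect "My name is joe" 25). Averaging only
smooths out the run-to-run noise; it does not make a short input any easier to
classify.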

U



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Cedric Greevey
I'm more curious about why the output isn't even deterministic. The
same input string produced three different results.



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Michael Wood
On 29 February 2012 17:48, Lee Hinman wrote:
> On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>>
>> Awesome! I'm seeing some inconsistency though. Does anyone know why a
>> Bayesian classifier would produce such different results? Could it be
>> because of the short input text?
>>
>> (lang/detect "My name is joe")
>> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]

As an aside, the above is indeed almost valid Afrikaans :) and would
translate to "My names are joe".

-- 
Michael Wood 



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Lee Hinman
On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>
> Awesome! I'm seeing some inconsistency though. Does anyone know why a 
> Bayesian classifier would produce such different results? Could it be 
> because of the short input text? 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}] 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}] 
>
> (lang/detect "My name is joe") 
> ["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt" 
> "0.14285645678764042"}] 
>

Yes, language-detect uses a fuzzy matching-letter-frequency-count (the
non-scientific name for it) sort of algorithm in an attempt to quickly
determine the language, so shorter text has a higher chance of being
misclassified (because there is less letter-frequency data to analyze).
Give it a try with a longer input string.

Additionally, you could adjust the :smoothing option for the string, or
pass in a map of probabilities as a :prior-map to coerce it one way or
the other manually.
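
To illustrate why short input, smoothing, and priors matter, here is a toy
character-frequency classifier. It is a sketch under assumptions, not
language-detect's actual implementation: the `profiles` numbers are made up,
and `log-score` and `posteriors` are hypothetical names. How cld itself accepts
:smoothing and :prior-map is not shown here; see its readme for that.

```clojure
;; Toy illustration only; NOT language-detect's implementation.
;; `profiles` maps language -> {character -> relative frequency}.
;; The frequencies below are invented for the example.
(def profiles
  {"en" {\e 0.13 \t 0.09 \a 0.08 \o 0.075}
   "af" {\e 0.16 \a 0.08 \i 0.07 \n 0.08}})

(defn log-score
  "Log of (prior * product of per-character likelihoods) for one language.
  `smoothing` is the probability given to characters absent from the profile."
  [profile prior smoothing text]
  (reduce + (Math/log prior)
          (map #(Math/log (get profile % smoothing)) text)))

(defn posteriors
  "Returns a map of language -> normalized posterior probability for text.
  `prior-map` must contain a positive prior for every language in profiles."
  [profiles prior-map smoothing text]
  (let [scores (into {} (for [[lang profile] profiles]
                          [lang (log-score profile (get prior-map lang)
                                           smoothing text)]))
        ;; Subtract the best score before exponentiating to avoid underflow.
        top    (apply max (vals scores))
        unnorm (into {} (for [[lang s] scores]
                          [lang (Math/exp (- s top))]))
        total  (reduce + (vals unnorm))]
    (into {} (for [[lang u] unnorm]
               [lang (/ u total)]))))

;; Same short text, two different priors:
(posteriors profiles {"en" 0.5 "af" 0.5} 0.01 "name")
(posteriors profiles {"en" 0.9 "af" 0.1} 0.01 "name")
```

With only four characters of evidence, a single letter missing from a profile
or a change in the prior moves the result a lot, which is the same effect as
passing a prior map to nudge the detector one way or the other.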

- Lee
 


Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Robin Kraft
Awesome! I'm seeing some inconsistency though. Does anyone know why a
Bayesian classifier would produce such different results? Could it be
because of the short input text?

(lang/detect "My name is joe")
["af" {"af" "0.8571390166207665", "lt" "0.14285675907555712"}]

(lang/detect "My name is joe")
["af" {"af" "0.857138268895934", "fi" "0.14285706394982348"}]

(lang/detect "My name is joe")
["af" {"af" "0.7142834459768219", "hr" "0.14285657945126254", "lt"
"0.14285645678764042"}]



On Feb 28, 2:24 am, Lee Hinman wrote:
> Hi all,
> I'm pleased to announce the initial 0.1.0 release of cld (Clojure
> Language Detection). CLD is a tiny library wrapping language-detect[1]
> that can be used to determine the language of a particular piece of
> text very quickly. You should be able to use it from Clojars[2] with
> the following:
>
> [cld "0.1.0"]
>
> Please give it a try and open any issues on the github repo[3] that
> you find. Check out the readme for the full information and usage.
>
> Also soliciting better names for the project than 'cld' :)
>
> thanks,
> Lee Hinman
>
> [1]:https://code.google.com/p/language-detection/
> [2]:http://clojars.org/cld
> [3]:https://github.com/dakrone/cld
