Re: cld 0.1.0 - Clojure Language Detection

2012-03-01 Thread Ulises
Apologies for hijacking this thread, but this is probably the most
relevant place to point you to a simple web-frontend to cld I cooked
up in a couple of hours:
http://detector-de-idioma.herokuapp.com/index.html

Additionally, you can call it as a service like this:

  $ curl 'http://detector-de-idioma.herokuapp.com/detect/tally%20ho%20he%20said!'
  {"language":"en","probs":{"en":0.971950892855},"probs-seq":[["en",0.971950892855]]}
  $

See how you like it. I'd love to hear about it if you have a use case for
such a monstrosity.

U

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en


Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Robin Kraft
Awesome! I'm seeing some inconsistency though. Does anyone know why a
Bayesian classifier would produce such different results? Could it be
because of the short input text?

(lang/detect "My name is joe")
["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]

(lang/detect "My name is joe")
["af" {"af" 0.857138268895934, "fi" 0.14285706394982348}]

(lang/detect "My name is joe")
["af" {"af" 0.7142834459768219, "hr" 0.14285657945126254,
       "lt" 0.14285645678764042}]



On Feb 28, 2:24 am, Lee Hinman <matthew.hin...@gmail.com> wrote:
> Hi all,
> I'm pleased to announce the initial 0.1.0 release of cld (Clojure
> Language Detection). CLD is a tiny library wrapping language-detect[1]
> that can be used to determine the language of a particular piece of
> text very quickly. You should be able to use it from Clojars[2] with
> the following:
>
> [cld "0.1.0"]
>
> Please give it a try and open any issues you find on the GitHub repo[3].
> Check out the readme for full information and usage.
>
> Also soliciting better names for the project than 'cld' :)
>
> thanks,
> Lee Hinman
>
> [1]: https://code.google.com/p/language-detection/
> [2]: http://clojars.org/cld
> [3]: https://github.com/dakrone/cld



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Lee Hinman
On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:

> Awesome! I'm seeing some inconsistency though. Does anyone know why a
> Bayesian classifier would produce such different results? Could it be
> because of the short input text?
>
> (lang/detect "My name is joe")
> ["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]
>
> (lang/detect "My name is joe")
> ["af" {"af" 0.857138268895934, "fi" 0.14285706394982348}]
>
> (lang/detect "My name is joe")
> ["af" {"af" 0.7142834459768219, "hr" 0.14285657945126254,
>        "lt" 0.14285645678764042}]


Yes, language-detect does a fuzzy matching-letter-frequency-count (the
non-scientific name for it) sort of algorithm in an attempt to quickly
determine the language, so for shorter text it has a higher chance of
being incorrect (because there is less letter-frequency data to analyze).
Give it a try with a longer input string.
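
To make the short-text point concrete, here is a toy character-trigram naive Bayes classifier in Python. It is only a sketch of the general idea, not language-detect's actual implementation: the two sample sentences, the names `SAMPLES`, `trigrams`, and `posterior`, and the smoothing constant `alpha` are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical one-sentence "corpora"; real language profiles use far more text.
SAMPLES = {
    "en": "my name is joe and this is a simple english sentence about the weather today",
    "af": "my naam is joe en dit is 'n eenvoudige afrikaanse sin oor die weer vandag",
}


def trigrams(text):
    """Return the character trigrams of text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


# Trigram counts per language, plus the combined set of trigrams seen anywhere.
PROFILES = {lang: Counter(trigrams(sample)) for lang, sample in SAMPLES.items()}
VOCAB = set().union(*PROFILES.values())


def posterior(text, alpha=0.5):
    """Naive Bayes posterior over languages: uniform prior, additive smoothing."""
    log_scores = {}
    for lang, counts in PROFILES.items():
        # The extra +1 reserves probability mass for trigrams never seen in training.
        total = sum(counts.values()) + alpha * (len(VOCAB) + 1)
        log_scores[lang] = sum(
            math.log((counts[gram] + alpha) / total) for gram in trigrams(text)
        )
    # Subtract the maximum before exponentiating to avoid numeric underflow.
    top = max(log_scores.values())
    weights = {lang: math.exp(score - top) for lang, score in log_scores.items()}
    norm = sum(weights.values())
    return {lang: weight / norm for lang, weight in weights.items()}


print(posterior("is joe"))                             # few trigrams, all shared
print(posterior("this is a simple english sentence"))  # many distinguishing trigrams
```

Every trigram of "is joe" occurs in both samples, so the two languages stay close to even, while the longer sentence contains many trigrams that only the English sample has and the posterior becomes lopsided.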

Additionally, you could adjust the :smoothing option for the string, or
pass in a map of probabilities as a :prior-map to coerce it one way or
the other manually.
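
For intuition on why a prior can "coerce" the result, the following Python sketch applies Bayes' rule to made-up numbers. It does not show how cld or language-detect handle :prior-map internally; `apply_prior` and both dictionaries are hypothetical names and values chosen only to illustrate the arithmetic.

```python
def apply_prior(likelihoods, prior):
    """Combine per-language likelihoods with a prior via Bayes' rule, then renormalize."""
    unnormalized = {
        lang: likelihood * prior.get(lang, 0.0)
        for lang, likelihood in likelihoods.items()
    }
    norm = sum(unnormalized.values())
    if norm == 0:
        raise ValueError("prior assigns zero probability to every candidate language")
    return {lang: value / norm for lang, value in unnormalized.items()}


# Made-up evidence that slightly favours Afrikaans for a short string.
likelihoods = {"af": 0.6, "en": 0.4}

flat = apply_prior(likelihoods, {"af": 0.5, "en": 0.5})    # no preference
skewed = apply_prior(likelihoods, {"af": 0.1, "en": 0.9})  # we expect mostly English

print(flat)
print(skewed)
```

With a flat prior the evidence alone decides, so "af" wins; with a prior that strongly expects English, the same weak evidence is no longer enough and "en" comes out ahead.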

- Lee
 


Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Michael Wood
On 29 February 2012 17:48, Lee Hinman <matthew.hin...@gmail.com> wrote:
> On Tuesday, February 28, 2012 6:03:26 PM UTC-7, Robin Kraft wrote:
>>
>> Awesome! I'm seeing some inconsistency though. Does anyone know why a
>> Bayesian classifier would produce such different results? Could it be
>> because of the short input text?
>>
>> (lang/detect "My name is joe")
>> ["af" {"af" 0.8571390166207665, "lt" 0.14285675907555712}]

As an aside, the above is indeed almost valid Afrikaans :) and would
translate to "My names are joe".

-- 
Michael Wood <esiot...@gmail.com>



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Cedric Greevey
I'm more curious about why the output isn't even deterministic. The
same input string produced three different results.



Re: cld 0.1.0 - Clojure Language Detection

2012-02-29 Thread Ulises
> I'm more curious about why the output isn't even deterministic. The
> same input string produced three different results.

You can see in the FAQ
(http://code.google.com/p/language-detection/wiki/FrequentlyAskedQuestion)
that:

"Langdetect uses random sampling for avoiding local noises(person
name, place name and so on), so the language detections of the same
document might differ for every time."
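
A toy Python sketch of how random sampling leads to run-to-run differences, and how a fixed seed removes them. This is not langdetect's algorithm: the token weights in `PROFILES`, the `detect` function, and the `trials` and `sample_size` parameters are all hypothetical. The probabilities quoted earlier in the thread (about 0.857, 0.714 and 0.143) are close to multiples of 1/7, which would be consistent with averaging over a handful of randomized trials, though that reading is an inference and not something the FAQ states.

```python
import random
from collections import Counter

# Hypothetical per-language token weights; unknown tokens get a small default weight.
PROFILES = {
    "en": {"my": 0.4, "name": 0.9, "is": 0.5, "joe": 0.3},
    "af": {"my": 0.5, "naam": 0.9, "is": 0.5, "joe": 0.3},
}


def detect(tokens, trials=7, sample_size=2, rng=None):
    """Vote over several trials, each scoring a random sample of the tokens."""
    if rng is None:
        rng = random.Random()  # unseeded: results may differ between calls
    votes = Counter()
    for _ in range(trials):
        sample = [rng.choice(tokens) for _ in range(sample_size)]
        scores = {
            lang: sum(weights.get(tok, 0.01) for tok in sample)
            for lang, weights in PROFILES.items()
        }
        votes[max(scores, key=scores.get)] += 1
    # Report each language's share of the trial votes.
    return {lang: count / trials for lang, count in votes.items()}


tokens = "my name is joe".split()

print(detect(tokens))  # may vary from run to run
print(detect(tokens))

# Passing identically seeded generators makes the outcome reproducible.
print(detect(tokens, rng=random.Random(42)))
print(detect(tokens, rng=random.Random(42)))
```

Each trial only sees part of the input, so which tokens happen to be drawn decides the winner of that trial; the reported shares are therefore always multiples of 1/trials.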

U
