Hi Kevin,

 http://linguistica.uchicago.edu/

I've tried it out and it does a reasonable job - you might start by having a look at it and seeing if you can massage its output into an affix file.

thanks :-)

 You probably recall that I have web-crawled corpora and hence
 frequency lists for 200+ languages as part of the gramadoir project -
 if you get something up and running I can do some testing with these.


Great!!! I'm actually thinking about the evaluation part of the result. Does any metric exist that would evaluate the quality of an affix file? Is compression ratio a good one? Are there any more linguistic ones?
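To make the compression-ratio idea concrete, here is a minimal sketch, assuming one plausible definition (my assumption, not an established standard): compare the number of entries in the compressed representation (stems plus affix rules) to the number of surface words it expands to.

```python
# Sketch of a compression-ratio metric for an affix file. This particular
# definition (stems + rules over expanded words) is an assumption for
# illustration; lower values mean the affix file compresses the list more.

def compression_ratio(stems, rules, wordlist):
    """Ratio of compressed entries to expanded words; lower is better."""
    compressed_size = len(stems) + len(rules)
    expanded_size = len(wordlist)
    return compressed_size / expanded_size

# Toy example: 3 stems + 2 suffix rules ("+s", "+ed") cover 9 surface forms.
stems = ["walk", "talk", "jump"]
rules = ["+s", "+ed"]
words = ["walk", "walks", "walked",
         "talk", "talks", "talked",
         "jump", "jumps", "jumped"]
print(compression_ratio(stems, rules, words))  # 5/9, about 0.56
```

Of course a pure size metric says nothing about whether the affixes are linguistically real, which is exactly the concern raised below.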

 Note that the important thing here (in my view) is to get
 something *linguistically* meaningful - if the goal is to merely
 compress the word list one can just run munchlist to find candidate
affixes.

Hmm. Does this "munchlist" exist?
Being linguistically meaningful will be difficult for a general tool that doesn't depend on the language, no?


 The real advantage of a good affix file is that once it exists one can use
 it to extract candidate word/affix pairs from a corpus automatically -
 I have code for this already (one level of affixes only for now).  So
 obviously I'll be thrilled if you get something good going.
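A minimal sketch of how such one-level extraction might work (this is my own illustration, not Kevin's actual code): given the suffixes from an affix file, keep the corpus words whose stripped stem also occurs in the corpus.

```python
# Hypothetical one-level candidate extraction: a (stem, suffix) pair is a
# candidate when both the suffixed form and the bare stem occur in the corpus.

def candidate_pairs(words, suffixes):
    """Yield (stem, suffix) pairs where stem + suffix and stem both occur."""
    vocab = set(words)
    for word in vocab:
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf):
                stem = word[: -len(suf)]
                if stem in vocab:
                    yield stem, suf

corpus = ["walk", "walks", "walking", "cat", "cats"]
pairs = sorted(candidate_pairs(corpus, ["s", "ing"]))
print(pairs)  # [('cat', 's'), ('walk', 'ing'), ('walk', 's')]
```

With the crawled frequency lists mentioned above, the same loop would run over each language's word list with that language's candidate suffixes.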


I'll let you know about the results.

One extension, if time permits, would be to use the affix file to guess some new words, check whether these words exist (by querying Google, for example), and then propose them as new words for the spellchecker. But this will probably be a second part of the job.
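The extension idea might look like this sketch: apply known suffix rules to known stems to generate unseen forms, then keep the ones that are attested somewhere. Here a set built from a frequency list stands in for the Google query (an assumption for the sake of a runnable example).

```python
# Sketch of the proposed extension: generate candidate words from stems and
# suffix rules, keep those attested in some external source but missing from
# the current dictionary. The 'attested' set stands in for a web query.

def propose_new_words(stems, suffixes, known, attested):
    """Return generated forms that are attested but absent from the dictionary."""
    proposals = set()
    for stem in stems:
        for suf in suffixes:
            candidate = stem + suf
            if candidate not in known and candidate in attested:
                proposals.add(candidate)
    return sorted(proposals)

known = {"jump", "jumps"}          # current spellchecker dictionary
attested = {"jumped", "jumping"}   # e.g. forms found in a web corpus
print(propose_new_words(["jump"], ["s", "ed", "ing"], known, attested))
# ['jumped', 'jumping']
```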

Thanks again, Kevin.

Laurent

--
Laurent Godard <[EMAIL PROTECTED]> - Ingénierie OpenOffice.org
Indesko >> http://www.indesko.com
Nuxeo CPS >> http://www.nuxeo.com - http://www.cps-project.org
Book: "Programmation OpenOffice.org", Eyrolles 2004

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
