Hi Kevin,
http://linguistica.uchicago.edu/
I've tried it out and it does a reasonable job - you might start
by having a look at it and seeing if you can massage its output
into an affix file.
thanks :-)
You probably recall that I have web-crawled corpora and hence
frequency lists for 200+ languages as part of the gramadoir project -
if you get something up and running I can do some testing with these.
Great!!! I'm actually thinking about the evaluation part of the result.
Does any metric exist that would evaluate the quality of an affix file?
Is compression ratio a good one? Are there more linguistic ones?
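One naive way to make "compression ratio" concrete is to compare how many entries you need with affixes against the raw word-list size. The sketch below is a hypothetical illustration (the suffix set and words are made up, and real affix files also carry conditions on which stems a rule applies to), not an established metric:

```python
def compression_ratio(wordlist, suffixes):
    """Group words by stem, assuming any word ending in one of the given
    suffixes shares a stem with the stripped form. The "compressed" size
    is the number of distinct stems plus the number of suffix rules;
    the ratio is that divided by the raw word-list size (lower = better)."""
    stems = set()
    for word in wordlist:
        stem = word
        # Try the longest suffix first; the empty suffix leaves the word intact.
        for suf in sorted(suffixes, key=len, reverse=True):
            if suf and word.endswith(suf):
                stem = word[: len(word) - len(suf)]
                break
        stems.add(stem)
    return (len(stems) + len(suffixes)) / len(wordlist)

words = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped"]
print(compression_ratio(words, {"", "s", "ed", "ing"}))  # 2 stems + 4 rules over 7 words
```

A purely size-based score like this rewards aggressive stripping even when the "stems" are not real words, which is exactly Kevin's point below about linguistic meaningfulness.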
Note that the important thing here (in my view) is to get
something *linguistically* meaningful - if the goal is to merely
compress the word list one can just run munchlist to find candidate
affixes.
Hmm. Does this "munchlist" exist?
Being linguistically meaningful in a general tool that does not depend
on the language will be difficult, no?
The real advantage of a good affix file is that once it exists one can use
it to extract candidate word/affix pairs from a corpus automatically -
I have code for this already (one level of affixes only for now). So
obviously I'll be thrilled if you get something good going.
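The idea Kevin describes can be sketched roughly as follows (this is not his actual code, just an illustration under the assumption of simple one-level suffix rules): propose a (stem, suffix) pair whenever both the inflected form and the bare stem occur in the corpus.

```python
def candidate_pairs(tokens, suffixes):
    """Given corpus tokens and a list of suffix rules, return sorted
    (stem, suffix) candidates where both the suffixed form and the
    bare stem are attested in the corpus (one level of affixes only)."""
    vocab = set(tokens)
    pairs = set()
    for word in vocab:
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf):
                stem = word[: -len(suf)]
                if stem in vocab:
                    pairs.add((stem, suf))
    return sorted(pairs)

corpus = "we walk and jump he walks and jumps".split()
print(candidate_pairs(corpus, ["s", "er"]))  # [('jump', 's'), ('walk', 's')]
```

With a frequency list (such as the gramadoir ones mentioned above) the check could additionally require both forms to pass a frequency threshold, to filter out accidental matches.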
I'll let you know about the results.
One extension, if time permits, would be to use the affix file to guess
some new words, check whether these words exist (by querying Google, for
example), and then propose them as new words for the spellchecker. But
this will probably be a second part of the job.
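That extension might look something like the sketch below: expand known stems with the suffix rules to guess unseen forms, then check each guess against an evidence source. A real version would query a search engine; here a hypothetical frequency set stands in for that check, and all names are illustrative:

```python
def propose_new_words(stems, suffixes, known_words, evidence):
    """Combine each stem with each suffix; keep guesses that are not
    already in the dictionary but are attested in the evidence source
    (a stand-in here for web-search hit counts)."""
    proposals = []
    for stem in stems:
        for suf in suffixes:
            guess = stem + suf
            if guess not in known_words and guess in evidence:
                proposals.append(guess)
    return proposals

stems = ["walk", "jump"]
suffixes = ["s", "ed", "ing"]
known = {"walk", "walks", "jump", "jumps"}
evidence = {"walked", "walking", "jumped", "zzz"}  # hypothetical "web hits"
print(propose_new_words(stems, suffixes, known, evidence))
```

A hit-count threshold rather than a simple membership test would be needed in practice, since the web contains plenty of misspellings too.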
Thanks again Kevin,
Laurent
--
Laurent Godard <[EMAIL PROTECTED]> - Ingénierie OpenOffice.org
Indesko >> http://www.indesko.com
Nuxeo CPS >> http://www.nuxeo.com - http://www.cps-project.org
Livre "Programmation OpenOffice.org", Eyrolles 2004