Hi Kevin,
http://linguistica.uchicago.edu/
I've tried it out and it does a reasonable job - you might start
by having a look at it and seeing if you can massage its output
into an affix file.
thanks :-)
You probably recall that I have web-crawled corpora and hence
frequency lists for 200+ languages as part of the gramadoir project -
if you get something up and running I can do some testing with these.
Great!!! I'm actually thinking about the evaluation part of the result.
Does any metric exist that would evaluate the quality of an affix file?
Is compression ratio a good one? Are there more linguistic ones?
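One naive way to make "compression ratio" concrete is to compare how many entries you need with affixes against the raw word-list size. The sketch below is a hypothetical illustration (the suffix set and words are made up, and real affix files also carry conditions on which stems a rule applies to), not an established metric:

```python
def compression_ratio(wordlist, suffixes):
    """Group words by stem, assuming any word ending in one of the given
    suffixes shares a stem with the stripped form. The "compressed" size
    is the number of distinct stems plus the number of suffix rules;
    the ratio is that divided by the raw word-list size (lower = better)."""
    stems = set()
    for word in wordlist:
        stem = word
        # Try the longest suffix first; the empty suffix leaves the word intact.
        for suf in sorted(suffixes, key=len, reverse=True):
            if suf and word.endswith(suf):
                stem = word[: len(word) - len(suf)]
                break
        stems.add(stem)
    return (len(stems) + len(suffixes)) / len(wordlist)

words = ["walk", "walks", "walked", "walking", "jump", "jumps", "jumped"]
print(compression_ratio(words, {"", "s", "ed", "ing"}))  # 2 stems + 4 rules over 7 words
```

A purely size-based score like this rewards aggressive stripping even when the "stems" are not real words, which is exactly Kevin's point below about linguistic meaningfulness.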
Note that the important thing here (in my view) is to get
something *linguistically* meaningful - if the goal is to merely
compress the word list one can just run munchlist to find candidate
affixes.
Hmm. Does this "munchlist" exist?
Being linguistically meaningful in a general tool that does not depend
on the language will be difficult, no?
The real advantage of a good affix file is that once it exists one can use
it to extract candidate word/affix pairs from a corpus automatically -
I have code for this already (one level of affixes only for now). So
obviously I'll be thrilled if you get something good going.
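The idea Kevin describes can be sketched roughly as follows (this is not his actual code, just an illustration under the assumption of simple one-level suffix rules): propose a (stem, suffix) pair whenever both the inflected form and the bare stem occur in the corpus.

```python
def candidate_pairs(tokens, suffixes):
    """Given corpus tokens and a list of suffix rules, return sorted
    (stem, suffix) candidates where both the suffixed form and the
    bare stem are attested in the corpus (one level of affixes only)."""
    vocab = set(tokens)
    pairs = set()
    for word in vocab:
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf):
                stem = word[: -len(suf)]
                if stem in vocab:
                    pairs.add((stem, suf))
    return sorted(pairs)

corpus = "we walk and jump he walks and jumps".split()
print(candidate_pairs(corpus, ["s", "er"]))  # [('jump', 's'), ('walk', 's')]
```

With a frequency list (such as the gramadoir ones mentioned above) the check could additionally require both forms to pass a frequency threshold, to filter out accidental matches.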
I'll let you know about the results.
One extension, if time permits, would be to use the affix file to guess
some new words, check whether these words exist (by querying Google, for
example), and then propose them as new words for the spellchecker. But
this will probably be a second part of the job.
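That extension might look something like the sketch below: expand known stems with the suffix rules to guess unseen forms, then check each guess against an evidence source. A real version would query a search engine; here a hypothetical frequency set stands in for that check, and all names are illustrative:

```python
def propose_new_words(stems, suffixes, known_words, evidence):
    """Combine each stem with each suffix; keep guesses that are not
    already in the dictionary but are attested in the evidence source
    (a stand-in here for web-search hit counts)."""
    proposals = []
    for stem in stems:
        for suf in suffixes:
            guess = stem + suf
            if guess not in known_words and guess in evidence:
                proposals.append(guess)
    return proposals

stems = ["walk", "jump"]
suffixes = ["s", "ed", "ing"]
known = {"walk", "walks", "jump", "jumps"}
evidence = {"walked", "walking", "jumped", "zzz"}  # hypothetical "web hits"
print(propose_new_words(stems, suffixes, known, evidence))
```

A hit-count threshold rather than a simple membership test would be needed in practice, since the web contains plenty of misspellings too.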
Thanks again Kevin,
Laurent
--
Laurent Godard <[EMAIL PROTECTED]> - Ingénierie OpenOffice.org
Indesko >> http://www.indesko.com
Nuxeo CPS >> http://www.nuxeo.com - http://www.cps-project.org
Livre "Programmation OpenOffice.org", Eyrolles 2004