Hi David,
I think you should first contact Laszlo Nemeth ([EMAIL PROTECTED]), who
has already begun working on a basic stemmer that uses myspell dictionaries;
it's quite easy to reuse the affix information without building any
huge database. He's doing it as a port of a Hunmorph project (search the
Web for it). However, the other task, i.e. using affix information to,
say, conjugate verbs, is more complicated and would probably need some
additional tagging, such as defining classes that abstract over the same
set of affixes. This would then speed up the process of finding the
corresponding prefixes and suffixes for any word.
The basic algorithm for stemming with myspell affixes is available in AWK
on Eleonora's website (I believe it's the Domolki algorithm); you could
easily port it to C++. Read the myspell sources as well.
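To give a rough idea of how the affix information can be reused for
stemming, here is a minimal sketch in Python. It is not the AWK script
mentioned above and not the Domolki algorithm, just a naive loop over
suffix rules. It assumes the usual myspell rule layout
(SFX flag strip add condition), single-character flags, and a dictionary
mapping each word to its flags. The names SuffixRule, parse_suffix_rules,
stems, SAMPLE_AFF and SAMPLE_DIC are made up for illustration, and the
sample rules are a toy set, not a real affix file.

```python
import re
from typing import NamedTuple


class SuffixRule(NamedTuple):
    flag: str       # affix flag that a dictionary entry must carry
    strip: str      # characters removed from the stem before adding the suffix
    add: str        # suffix appended to the stem
    condition: str  # pattern the end of the stem must match


def parse_suffix_rules(aff_text):
    """Collect SFX rule lines; 4-field header lines (SFX flag Y/N count) are skipped."""
    rules = []
    for line in aff_text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "SFX":
            _, flag, strip, add, condition = parts[:5]
            rules.append(SuffixRule(flag,
                                    "" if strip == "0" else strip,
                                    "" if add == "0" else add,
                                    condition))
    return rules


def stems(word, rules, dictionary):
    """Return every dictionary stem from which `word` can be derived."""
    found = set()
    if word in dictionary:
        found.add(word)
    for rule in rules:
        if not word.endswith(rule.add):
            continue
        # Undo the rule: drop the suffix, then restore the stripped characters.
        base = word[:len(word) - len(rule.add)] + rule.strip
        if not base:
            continue
        has_flag = rule.flag in dictionary.get(base, "")
        if has_flag and re.search("(?:" + rule.condition + ")$", base):
            found.add(base)
    return found


# Toy data (assumption): an English-like past-tense rule set and word list.
SAMPLE_AFF = """SFX D Y 4
SFX D 0 d e
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
"""
SAMPLE_DIC = {"create": "D", "try": "D", "walk": "D", "play": "D", "red": ""}

if __name__ == "__main__":
    sample_rules = parse_suffix_rules(SAMPLE_AFF)
    print(stems("tried", sample_rules, SAMPLE_DIC))  # prints {'try'}
```

Prefixes (PFX lines) can be handled the same way at the other end of the
word. Going in the opposite direction, from a stem to an inflected form,
is where the extra tagging would be needed.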
Best,
Marcin
David Avsajanishvili wrote:
From http://development.openoffice.org/todo.html
Design and build code that will combine information on prefixes and suffixes from any language and determine its meaning (part of speech, tense, gender, plural vs single, etc, etc) and thereby allow heavily affixed languages to convert words to a known root form to be looked up in a thesaurus. Then take that a step farther to add the proper suffixes and affixes to the synonyms to create new synonyms with the correct tense, part of speech, plural vs single, gender, etc. This in turn would be integrated into both the spellcheck and the thesaurus.
--------------------------------------
Hello!
It's a good idea to build a universal, flexible database containing complex information about prefixes and suffixes, word roots, semantic forms, etc., which an expert can fill with data for any existing language and which can then be used in a spellcheck/thesaurus module. The world's languages differ greatly in structure, so it's only possible to create something with basic functionality that can be extended in the future. (I think I have understood your needs well...)
I would like to create (or take part in creating) such an extensible system with basic functionality. It is very interesting to me, and I really can do it.
About myself:
My name is David Avsajanishvili, I am 22, and I study at GTU, Faculty of IT. I'll finish my studies in December 2005 and get a bachelor's degree.
I am familiar with C/C++, C#, Pascal, DHTML, ActiveX programming, SQL (MS Access & MS SQL Server), XML, etc. I have a little knowledge of Java and Perl/PHP.
3 years of experience in C++ (MS Visual Studio) and Pascal (Delphi) programming. 2 years of intensive experience in database programming (a year ago I wrote a query subsystem in Delphi for a study-process management database). 10 months of intensive experience in Windows graphics programming (I have just finished a data reporting component in MS C++ for a local software development company).
Always ready to improve my knowledge and experience in any sphere!
Contact information:
e-mail: [EMAIL PROTECTED], [EMAIL PROTECTED]
tel: +995(93)723268
address: Republic of Georgia, Tbilisi, TEMKA, 11 m/r b.16 #24
David Avsajanishvili.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]