Quoting "Kevin B. Hendricks" <[EMAIL PROTECTED]>:

> > At the moment I am doing so: munch with full key asset, if the result
> > looks awful, remove pfxes and re-munch again. This helps. Of course,
> > that does not work if you munch the full list of words - you have to
> > rely on the good will of munch. ....
>
> Many people with affix heavy languages try to skip the generation of
> the "working set" and use "unmunch" and their .aff info to try to build
> a working set up. You can do this BUT then you must remove all bad
> words from the working set (pass it through another known good spell
> checker or manually check it) and then run "munch" on the newly
> shortened "working set" to create a final, good .dic file.
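The filtering step in the workflow quoted above (unmunch, remove bad words, munch again) could be sketched roughly as follows. This is only an illustration under assumptions: `filter_working_set`, the `is_good` check, and the sample words are hypothetical and not part of munch or unmunch. The file names in the comments are placeholders too.

```python
from typing import Callable, Iterable, List


def filter_working_set(words: Iterable[str],
                       is_good: Callable[[str], bool]) -> List[str]:
    """Return the sorted, de-duplicated words that pass the is_good check."""
    kept = set()
    for raw in words:
        word = raw.strip()          # drop surrounding whitespace and newlines
        if word and is_good(word):  # skip empty lines and rejected forms
            kept.add(word)
    return sorted(kept)


# Assumed surrounding steps (placeholder file names):
#   unmunch lv.dic lv.aff > working_set.txt   # expand .dic + .aff into all forms
#   ... filter working_set.txt with the function above ...
#   munch filtered_set.txt lv.aff > new.dic   # compress the cleaned list again

# Hypothetical reject list, standing in for a manual check or a trusted checker.
rejected = {"unrunned", "rerunning"}
candidates = ["run", "runs", "unrunned", "running", "runs", ""]

good = filter_working_set(candidates, lambda w: w not in rejected)
print(good)
```

In practice `is_good` would wrap whatever check is actually trusted, a manually maintained reject list or another spell checker, which is the part that needs the real effort.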
Yes, that was the way I went. My possible mistake was that I chose the grammatical approach, but at the moment it seems to be the best one because of the large number of forms (for example, the current 800k dic file generates ~45 megs of "working set"). I even did the "manual filtering" of prefixed base forms. The help of a commercially available spellchecker proved to be a waste of time: it accepted so many completely crazy forms as good ones that I abandoned it.

As for the root words and roots of words, my language does not use a word without a suffix (except in some special cases). This seems to be the stumbling block for munch in the case of prefixes: it tries to get the shortest possible form, which results in many entries with equal-length roots having different pfx keys (grouping pfxes by their length).

Janis

--
Jancs Laps Cileecish
Veel 292 meeneshi liidz pensijai...
http://openoffice-lv.sourceforge.net
http://tehvi.dv.lv
