Quoting "Kevin B. Hendricks" <[EMAIL PROTECTED]>:

> > At the moment I am doing this: munch with full key asset; if the result
> > looks awful, remove the pfxes and re-munch. This helps. Of course, that
> > does not work if you munch the full list of words - you have to rely on
> > the good will of munch.
....
>
> Many people with affix heavy languages try to skip the generation of
> the "working set" and use "unmunch" and their .aff info to try to build
> a working set up.  You can do this BUT then you must remove all bad
> words from the working set (pass it through another known good spell
> checker or manually check it) and then run "munch" on the newly
> shortened "working set" to create a final, good .dic file.

Yes, that was the way I went. My possible mistake was that I chose the
grammatical approach, but at the moment it seems to be the best one because of
the large number of forms (for example, the current 800k dic file generates
~45 megs of "working set"). I even did the "manual filtering" of prefixed base
forms. The help of a commercially available spellchecker proved to be a waste
of time - it accepted so many completely crazy forms as good ones that I
abandoned it.
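
To make the filtering step concrete, here is a minimal sketch in Python of what I mean by removing bad words from the working set before re-munching. The function name, the sample words and the reject list are all made up for illustration; the real working set would come from unmunch, and the cleaned list would then go back through munch.

```python
def filter_working_set(words, rejected):
    """Return the sorted, de-duplicated words that are not in the reject list.

    Both arguments are iterables of strings (for example, lines of a file).
    Blank lines are dropped. This is only an illustration of the manual
    filtering step, not part of munch or unmunch.
    """
    rejected = {w.strip() for w in rejected}
    kept = {w.strip() for w in words if w.strip() and w.strip() not in rejected}
    return sorted(kept)


# Hypothetical sample data: a tiny "working set" and one form judged to be bad.
working = ["iet", "aiziet", "aizaiziet", "apiet", ""]
bad = ["aizaiziet"]
cleaned = filter_working_set(working, bad)
print("\n".join(cleaned))
```

With a 45 meg working set the reject list is the expensive part, of course; the script only applies it.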

As for the root words and roots of words - my language does not use a word
without a suffix (except in some special cases). This seems to be the stumbling
block for munch in the case of prefixes - it tries to get the shortest possible
form, resulting in many entries with equal length roots having different pfx
keys (grouping pfxes by their length).
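
A rough way to inspect this is to strip every known prefix from the word list and see which pfx keys end up on the same root. The sketch below is only a diagnostic toy, not how munch works internally; the prefix table, its flag letters and the sample words are assumptions made up for the example and are not taken from a real .aff file.

```python
from collections import defaultdict

# Hypothetical prefix table: flag -> prefix string (illustrative only).
PREFIXES = {"A": "aiz", "B": "ap", "C": "at"}


def group_by_stripped_root(words, prefixes=PREFIXES):
    """Map each candidate root to the set of prefix flags that yield a listed word.

    A word contributes a candidate root for every prefix it starts with,
    provided something is left after the prefix is removed.
    """
    groups = defaultdict(set)
    for word in words:
        for flag, prefix in prefixes.items():
            if word.startswith(prefix) and len(word) > len(prefix):
                groups[word[len(prefix):]].add(flag)
    return dict(groups)


# Hypothetical sample words.
words = ["aiziet", "apiet", "atiet", "apaug"]
roots = group_by_stripped_root(words)
for root, flags in sorted(roots.items()):
    print(root, "".join(sorted(flags)))
```

Roots that collect only one flag each, where one shared entry was expected, are the ones worth checking by hand.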

Janis

--
Jancs
Laps Cileecish

Still 292 months until retirement...

http://openoffice-lv.sourceforge.net
http://tehvi.dv.lv