According to Alexander I. Lebedev:
> I checked the original and extended English word lists in HTDig and found
> 2756 and 3787 words, respectively, that have more than one root (the
> numbers may be even higher, as I didn't convert uppercase letters to
> lowercase).
>
> Moreover, I discovered what is, IMHO, odd behaviour of the I and U flags,
> which produce forms like:
> wanted -- unwanted,
> expensive -- inexpensive.
> I guess many users would like to exclude such forms, which produce
> doubtful results. I think it can easily be done with a simple shell
> script that excludes all the lines in which /I and /U are the only flags,
> and removes these flags from lines that have other flags (easily
> done with sed). If you find my idea good, I'll send you the
> script.
The endings algorithm ignores the prefixes table in the affixes file,
and only looks at rules in the suffixes table. As such, it ignores the
A, I and U flags. (Isn't that why it's called endings?) I tried wanted
and expensive in htsearch, and it didn't search for their antonyms.
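For anyone who still wants to strip those flags from the dictionary itself,
the filtering described in the quoted message could be sketched roughly as
follows, in Python rather than sed. This is only an illustration under
assumptions: it assumes each dictionary line looks like word/FLAGS with the
flag letters run together after a single slash, and strip_prefix_flags is a
made-up name, not anything in ht://Dig. Note that, as proposed, a word whose
only flags are I and/or U is dropped from the list entirely.

```python
def strip_prefix_flags(lines, unwanted=frozenset("IU")):
    """Yield dictionary lines with the unwanted flags filtered out.

    Assumes ispell-style entries of the form "word/FLAGS" (an assumption
    about the word list layout, not something checked against ht://Dig).
    """
    for line in lines:
        entry = line.rstrip("\n")
        word, sep, flags = entry.partition("/")
        if not sep:
            # No flags at all: keep the entry unchanged.
            yield entry
            continue
        if flags and set(flags) <= unwanted:
            # I and/or U are the only flags: exclude the whole line,
            # as suggested in the quoted message.
            continue
        kept = "".join(f for f in flags if f not in unwanted)
        yield f"{word}/{kept}" if kept else word


if __name__ == "__main__":
    sample = ["wanted/U\n", "expensive/IY\n", "cat/S\n", "dog\n"]
    for entry in strip_prefix_flags(sample):
        print(entry)
```

With the endings algorithm as it stands, this should make no difference to
htsearch results, since the prefix flags are never consulted.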
> If anyone wants, I could send the lists of duplicate word forms
> for analyzing.
I'd be curious to see them. I did notice that the word2root.db file was
slightly bigger after I patched htfuzzy/EndingsDB.cc, implying that some
words in english.0 would have more than one root. I'm surprised it's
over 2000! What is the extended English word list?
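As a rough illustration of the kind of check that would produce such a
list, here is a small sketch that groups (word, root) pairs and reports
the words with more than one root. The input is hypothetical: it assumes
the pairs have already been extracted somehow, and it does not read
word2root.db directly. The fold_case option is there because the quoted
counts were made without converting uppercase to lowercase.

```python
from collections import defaultdict


def words_with_multiple_roots(pairs, fold_case=True):
    """Return {word: sorted list of roots} for words with more than one root.

    `pairs` is any iterable of (word, root) tuples; how they are obtained
    from the endings database is left open here.
    """
    roots = defaultdict(set)
    for word, root in pairs:
        if fold_case:
            word, root = word.lower(), root.lower()
        roots[word].add(root)
    return {w: sorted(r) for w, r in roots.items() if len(r) > 1}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("leaves", "leaf"),
        ("leaves", "leave"),
        ("Cats", "cat"),
        ("cats", "cat"),
    ]
    print(words_with_multiple_roots(sample))
```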
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930