According to Alexander I. Lebedev:
> I checked the original and extended English word lists in HTDig and
> found 2756 and 3787 words, respectively, that have more than one root
> (the numbers may be even higher, as I didn't convert uppercase letters
> to lowercase).
> 
> Moreover, I discovered what is IMHO odd behaviour of the I and U
> flags, which produce forms like:
>       wanted -- unwanted,
>       expensive -- inexpensive.
> I guess many users would like to exclude such forms, which produce
> doubtful results.  I think it can easily be done with a simple shell
> script that drops all lines in which /I and /U are the only flags, and
> removes these flags from lines that have other flags (easily done
> with sed).  If you find my idea good, I'll send you the script.

The endings algorithm ignores the prefixes table in the affixes file,
and only looks at rules in the suffixes table.  As such, it ignores the
A, I and U flags.  (Isn't that why it's called endings?)  I tried
"wanted" and "expensive" in htsearch, and it didn't search for their
antonyms.
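
For anyone who still wants to strip those flags from the word list, the
filter described in the quoted text might look roughly like the sketch
below.  It is written in Python rather than shell and sed, and it
assumes each dictionary line has the form word/FLAGS, with single-letter
flags run together after one slash.  The function name and the sample
entries in the usage note are made up for illustration.

```python
# Hypothetical helper: not part of ht://Dig or ispell.
PREFIX_FLAGS = frozenset("IU")  # the flags to strip (assumed single letters)


def strip_prefix_flags(lines, flags=PREFIX_FLAGS):
    """Yield dictionary lines with the given flags removed.

    Lines whose only flags are in `flags` are dropped entirely, as the
    quoted proposal describes.  Lines without a slash pass through.
    """
    for line in lines:
        entry = line.rstrip("\n")
        word, sep, word_flags = entry.partition("/")
        if not sep:
            # No flags at all: keep the word unchanged.
            yield entry
            continue
        kept = "".join(f for f in word_flags if f not in flags)
        if kept == word_flags:
            # Nothing to strip on this line.
            yield entry
        elif kept:
            # Other flags remain after removing I and U.
            yield f"{word}/{kept}"
        # Otherwise I and/or U were the only flags: drop the line.
```

Feeding it the made-up entries "wanted/U", "expensive/IY", "cat/S" and
"dog" keeps the last three, with "expensive" reduced to "expensive/Y".
Note that dropping a line also drops the bare word, which is what the
proposal asks for but may not be what everyone wants.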

> If anyone wants, I could send the lists of duplicate word forms
> for analyzing.

I'd be curious to see them.  I did notice that the word2root.db file was
slightly bigger after I patched htfuzzy/EndingsDB.cc, which implies that
some words in english.0 have more than one root.  I'm surprised it's
over 2000!  What is the extended English word list?
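
If it helps in comparing counts, here is a minimal Python sketch of how
such duplicates could be tallied.  It assumes the (word form, root)
pairs have already been produced by expanding the suffix rules some
other way; it does not read word2root.db itself, and the function name
is hypothetical.  It folds case before comparing, which the quoted
count did not do.

```python
from collections import defaultdict


def forms_with_multiple_roots(pairs):
    """Return {word form: sorted roots} for forms that have 2+ roots.

    `pairs` is any iterable of (form, root) strings.  Both are
    lowercased so that case variants are counted as the same word.
    """
    roots_by_form = defaultdict(set)
    for form, root in pairs:
        roots_by_form[form.lower()].add(root.lower())
    return {
        form: sorted(roots)
        for form, roots in roots_by_form.items()
        if len(roots) > 1
    }
```

The length of the returned dictionary would be the number to compare
against the 2756 and 3787 figures above.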

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
