Morus Walter wrote:
Owen Densmore writes:


1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) apparently produces non-word stems .. i.e. not really human readable. (Example: generate, generates, generated, generating -> generat) Although in typical queries this is not important because the result of the search is a document list, it *would* be important if we use the stems within a graphical navigation interface.
So the question is: Is there a way to have the stemmer produce english
base forms of the words being stemmed?



rule based stemmers such as porter/snowball cannot do that. But there are (commercial) dictionary based tools that can. E.g. the canoo lemmatizer. You might also have a look at egothors stemmer, that are word list based.

Egothor stemmers are algorithmic, they only use word lists for training. Stems produced by them are usually closer to lemmas than in e.g. Porter's stemmer, but there is a significant amount of stems like in the example above.





-- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to