On Fri, May 03, 2002 at 12:17:10PM +0000, [EMAIL PROTECTED] said:
> Recently I've been looking into a problem we've
> been having with a system which indexes business
> names using stopword lists (the list of common
> words not to include) and Porter stemming (which
> reduces a word to its common base, something you
> seem to have done incidentally by changing
> 'Drinking' to 'Drink').

Sounds like what I want to do. Incidentally, what are you using for the
stopword list and the Porter stemming? Existing code and lists, or stuff
you've done yourself?

> A real-world example of where this broke down for
> us a little was when dealing with the business
> name 'One To One'.  Removing stopwords erased the
> name entirely.  Oops.

Yes. I can see where that would be a problem :) Editor approval is
definitely going to be required.
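
A rough Python sketch of one way to guard against the 'One To One' case,
offered as an illustration rather than anyone's actual system: if stopword
removal would leave nothing, keep the original words and flag the name for
an editor. The stopword set and the function name index_terms are made up
for this example.

```python
# Hypothetical stopword list, for illustration only; a real list would be
# much longer and is not the one discussed in this thread.
STOPWORDS = {"a", "an", "and", "of", "one", "the", "to"}


def index_terms(name, stopwords=STOPWORDS):
    """Return (terms, needs_review) for a business name.

    terms is the list of lowercased words to index; needs_review is True
    when stopword removal would have erased the name entirely.
    """
    words = name.lower().split()
    kept = [w for w in words if w not in stopwords]
    if not kept:
        # Every word was a stopword: fall back to the unfiltered words
        # and flag the entry so an editor can approve it.
        return words, True
    return kept, False


if __name__ == "__main__":
    for name in ("One To One", "The Drinking Fountain"):
        print(name, "->", index_terms(name))
```

Stemming is left out on purpose; it would be applied to the kept words
afterwards, and the same "did we lose everything?" check still applies.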

Cheers

Simon

--
: omnipotence for dummies
