I've figured out a temporary workaround for the problem/feature where
words that appear in more than 50% of the records in a fulltext index
are treated as stopwords: I simply add as many dummy records as there
are real records in the table. A fulltext search will then never
disregard a word based on its frequency.
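As a sketch, assuming a table named articles with a fulltext index on a
body column (both names are made up for illustration), the padding step
might look like:

```sql
-- Assumed schema, for illustration only.
CREATE TABLE articles (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    body TEXT,
    FULLTEXT (body)
);

-- Double the row count with empty dummies: an empty body contributes no
-- indexed words, so even a word present in every real record now appears
-- in only 50% of all rows, just under the "more than 50%" cutoff.
INSERT INTO articles (body)
SELECT '' FROM articles;
```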

For performance I added a column called dummy, a flag indicating whether
the record is real or a dummy. I added an index on the dummy column and
include a 'where dummy=1' clause in my SQL when doing fulltext searches.
I also have a cron job that runs a report every 20 minutes and makes
sure that 51% of the table is populated with dummy records. (*yuck!*)
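Roughly, with the same assumed articles table as above (and dummy=0
marking real rows; flip the flag if your convention is the reverse):

```sql
-- Flag column plus index, so real rows can be filtered cheaply.
ALTER TABLE articles ADD COLUMN dummy TINYINT NOT NULL DEFAULT 0;
ALTER TABLE articles ADD INDEX (dummy);

-- Fulltext search restricted to real records.
SELECT id, body
FROM articles
WHERE dummy = 0
  AND MATCH (body) AGAINST ('search terms');

-- Cron-style report: how many dummies are needed so that dummies make
-- up 51% of the table, i.e. dummies >= real * 51/49.
SELECT
    SUM(dummy = 1) AS dummy_rows,
    SUM(dummy = 0) AS real_rows,
    GREATEST(0, CEILING(SUM(dummy = 0) * 51 / 49) - SUM(dummy = 1))
        AS dummies_to_add
FROM articles;
```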

Clumsy, yet effective. If anyone out there has a better solution, I
would very much like to hear from you.

I agree with your logic that words occurring more frequently should
carry less weight - it makes a lot of natural-language sense. But there
should be a way to either disable the '50% occurrence = zero weight'
behaviour or disable word weighting altogether for small datasets.
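For what it's worth, recent MySQL versions (4.0.1 and up, if I recall
correctly) offer boolean-mode searches, which skip the 50% threshold
entirely - possibly a cleaner escape hatch than dummy rows, using the
same assumed articles table:

```sql
-- IN BOOLEAN MODE ignores the 50% frequency cutoff, so no dummy rows
-- are needed; the trade-off is that results are not relevance-ranked.
SELECT id, body
FROM articles
WHERE MATCH (body) AGAINST ('+searchword' IN BOOLEAN MODE);
```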

kind regards,

Mark.


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php
