Brian Mansell <[EMAIL PROTECTED]> wrote on 05/25/2005 03:09:03 PM:
> Scott -
>
> Check out this excerpt from the MySQL documentation
> (http://dev.mysql.com/doc/mysql/en/fulltext-search.html). I hope it helps!
>
> --bemansell
>
> ...
>
> "Every correct word in the collection and in the query is weighted according
> to its significance in the collection or query. This way, a word that is
> present in many documents has a lower weight (and may even have a zero
> weight), because it has lower semantic value in this particular collection.
> Conversely, if the word is rare, it receives a higher weight. The weights of
> the words are then combined to compute the relevance of the row.
>
> Such a technique works best with large collections (in fact, it was
> carefully tuned this way). For very small tables, word distribution does not
> adequately reflect their semantic value, and this model may sometimes
> produce bizarre results. For example, although the word "MySQL" is present
> in every row of the articles table, a search for the word produces no
> results:
>
> mysql> SELECT * FROM articles
> -> WHERE MATCH (title,body) AGAINST ('MySQL');
> Empty set (0.00 sec)
>
> The search result is empty because the word "MySQL" is present in at
> least 50% of the rows. As such, it is effectively treated as a stopword. For
> large datasets, this is the most desirable behavior: a natural language
> query should not return every second row from a 1GB table. For small
> datasets, it may be less desirable.
>
> A word that matches half of rows in a table is less likely to locate
> relevant documents. In fact, it most likely finds plenty of irrelevant
> documents. We all know this happens far too often when we are trying to find
> something on the Internet with a search engine. It is with this reasoning
> that rows containing the word are assigned a low semantic value for *the
> particular dataset in which they occur*. A given word may exceed the 50%
> threshold in one dataset but not another.
>
> The 50% threshold has a significant implication when you first try full-text
> searching to see how it works: If you create a table and insert only one or
> two rows of text into it, every word in the text occurs in at least 50% of
> the rows. As a result, no search returns any results. Be sure to insert at
> least three rows, and preferably many more."
>
>
>
> On 5/25/05, Scott Purcell <[EMAIL PROTECTED]> wrote:
> >
> > Hello,
> > I am running 4.0.15 for Win95/98 and am working through the docs.
> >
> > I created a "text" type field with a 'fulltext' index. As I am
> > experimenting, I have run into a couple of questions:
> >
> > First off, I was having trouble getting results, so I added the word
> > "foobar" to one of the descriptions, and that worked with this query:
> > select * from item where match(name, description) against('foobar')
> >
> > I have a word 'red' that appears 5-10 times in a tmp table of 60 records.
> > If I run that query with 'red':
> > select * from item where match(name, description) against('red');
> > it returns an empty set.
> >
> > Upon reading, it looks like it is really trying to only get "unique" names
> > from the index. But in my case the 'red' is a description that I would like
> > to get back. Any way to force this to return results?
> >
> > Any info would be helpful. I have read the docs, but they get a little
> > confusing the first time through.
> >
> > Thanks,
> > Scott
> >
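The other thing to remember is the minimum word length (the ft_min_word_len
server variable). By default it is set to 4, and RED has only 3 characters, so
it would not have been indexed. That would explain why FULLTEXT searches for
RED return no records. (The 50% threshold is not the culprit here: 5-10 rows
out of 60 is well under half.)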
See here for FT tuning (settings):
http://dev.mysql.com/doc/mysql/en/fulltext-fine-tuning.html
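
A minimal sketch of that tuning for your setup, assuming the server reads its
options from my.ini (the usual option file on Windows) and using the item table
from your message. ft_min_word_len is read when the server starts, and existing
FULLTEXT indexes have to be rebuilt after changing it; a QUICK repair is enough
for that. First add this under the [mysqld] section of my.ini (the file
location is an assumption, adjust to your install) and restart the server:

    [mysqld]
    # index words of 3 or more characters (default is 4)
    ft_min_word_len=3

Then confirm the new value, rebuild the index, and re-run the search:

    mysql> SHOW VARIABLES LIKE 'ft_min_word_len';
    mysql> REPAIR TABLE item QUICK;
    mysql> SELECT * FROM item WHERE MATCH (name, description) AGAINST ('red');

The setting is server-wide, so every FULLTEXT index on the server follows the
new length once it has been rebuilt.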