Brian Mansell <[EMAIL PROTECTED]> wrote on 05/25/2005 03:09:03 PM:

> Scott -
>
> Check this excerpt out (
> http://dev.mysql.com/doc/mysql/en/fulltext-search.html ) from the MySQL
> Documentation. I hope it helps!
>
> --bemansell
>
> ...
>
> "Every correct word in the collection and in the query is weighted according
> to its significance in the collection or query. This way, a word that is
> present in many documents has a lower weight (and may even have a zero
> weight), because it has lower semantic value in this particular collection.
> Conversely, if the word is rare, it receives a higher weight. The weights of
> the words are then combined to compute the relevance of the row.
>
> Such a technique works best with large collections (in fact, it was
> carefully tuned this way). For very small tables, word distribution does not
> adequately reflect their semantic value, and this model may sometimes
> produce bizarre results. For example, although the word ``MySQL'' is present
> in every row of the articles table, a search for the word produces no
> results:
>
> mysql> SELECT * FROM articles
>     -> WHERE MATCH (title,body) AGAINST ('MySQL');
> Empty set (0.00 sec)
>
> The search result is empty because the word ``MySQL'' is present in at
> least 50% of the rows. As such, it is effectively treated as a stopword. For
> large datasets, this is the most desirable behavior---a natural language
> query should not return every second row from a 1GB table. For small
> datasets, it may be less desirable.
>
> A word that matches half of rows in a table is less likely to locate
> relevant documents. In fact, it most likely finds plenty of irrelevant
> documents. We all know this happens far too often when we are trying to find
> something on the Internet with a search engine. It is with this reasoning
> that rows containing the word are assigned a low semantic value for *the
> particular dataset in which they occur*. A given word may exceed the 50%
> threshold in one dataset but not another.
>
> The 50% threshold has a significant implication when you first try full-text
> searching to see how it works: If you create a table and insert only one or
> two rows of text into it, every word in the text occurs in at least 50% of
> the rows. As a result, no search returns any results. Be sure to insert at
> least three rows, and preferably many more."
>
> On 5/25/05, Scott Purcell <[EMAIL PROTECTED]> wrote:
> >
> > Hello,
> > I am running 4.0.15 for Win95/98 and am working through the docs.
> >
> > I created a "text" type field with a 'fulltext' index. As I am
> > experimenting, I have run into a couple of questions:
> >
> > First off, I was having trouble getting results. So I added the word
> > "foobar" to one of the descriptions:
> > and that worked with this query:
> > select * from item where match(name, description) against('foobar')
> >
> > I have a word 'red' that appears 5-10 times, in a tmp table of 60 records.
> > If I run that query with 'red'
> > select * from item where match(name, description) against('red');
> > it returns empty set
> >
> > Upon reading, it looks like it is really trying to only get "unique" names
> > from the index. But in my case the 'red' is a description that I would like
> > to get back. Anyway to force this to return results?
> >
> > Any info would be helpful. I have read, but it gets a little confusing
> > first time through.
> >
> > Thanks,
> > Scott
> >

The other thing to remember is the "minimum word length". By default it is
set to 4. 'red' has only 3 characters, so it would not have been indexed.
That would explain why full-text searches for 'red' are not returning any
records. The 50% threshold is not the cause here: 5-10 rows out of 60 is
well under half the table.

See here for full-text tuning (settings):
http://dev.mysql.com/doc/mysql/en/fulltext-fine-tuning.html
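
As a rough sketch of how you might check and change this, assuming the table
is the `item` table from Scott's query, that it is a MyISAM table, and that
you are able to edit the server's option file and restart the server (none of
which I can confirm from here). The variable name is ft_min_word_len; the
value 3 is only an example.

```sql
-- Check the current minimum indexed word length (the default is 4).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Assumption: the following line has been added to the [mysqld] section
-- of the server option file (my.cnf / my.ini) and the server restarted:
--   ft_min_word_len=3

-- Existing FULLTEXT indexes must be rebuilt before the new length applies.
REPAIR TABLE item QUICK;

-- With 3-character words indexed, this search can now match rows.
SELECT * FROM item WHERE MATCH (name, description) AGAINST ('red');

-- Boolean mode does not apply the 50% threshold, which helps on small
-- test tables. The minimum word length still applies here.
SELECT * FROM item
WHERE MATCH (name, description) AGAINST ('red' IN BOOLEAN MODE);
```

The first SELECT is Scott's original query unchanged; only the server setting
and the rebuilt index differ. The boolean-mode variant is worth trying if a
word you care about turns up in half or more of the rows in a test table.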