Re: Fulltext Simple Question

SGreen Wed, 25 May 2005 12:27:57 -0700

Brian Mansell <[EMAIL PROTECTED]> wrote on 05/25/2005 03:09:03 PM:

> Scott -
> 
> Check this excerpt out ( 
> http://dev.mysql.com/doc/mysql/en/fulltext-search.html ) from the MySQL 
> Documentation. I hope it helps!
> 
> --bemansell
> 
> ...
> 
> "Every correct word in the collection and in the query is weighted
> according to its significance in the collection or query. This way, a
> word that is present in many documents has a lower weight (and may even
> have a zero weight), because it has lower semantic value in this
> particular collection. Conversely, if the word is rare, it receives a
> higher weight. The weights of the words are then combined to compute the
> relevance of the row.
> 
> Such a technique works best with large collections (in fact, it was
> carefully tuned this way). For very small tables, word distribution does
> not adequately reflect their semantic value, and this model may sometimes
> produce bizarre results. For example, although the word ``MySQL'' is
> present in every row of the articles table, a search for the word
> produces no results:
> 
> mysql> SELECT * FROM articles
>     -> WHERE MATCH (title,body) AGAINST ('MySQL');
> Empty set (0.00 sec)
> 
> The search result is empty because the word ``MySQL'' is present in at
> least 50% of the rows. As such, it is effectively treated as a stopword.
> For large datasets, this is the most desirable behavior---a natural
> language query should not return every second row from a 1GB table. For
> small datasets, it may be less desirable.
> 
> A word that matches half of rows in a table is less likely to locate
> relevant documents. In fact, it most likely finds plenty of irrelevant
> documents. We all know this happens far too often when we are trying to
> find something on the Internet with a search engine. It is with this
> reasoning that rows containing the word are assigned a low semantic value
> for *the particular dataset in which they occur*. A given word may exceed
> the 50% threshold in one dataset but not another.
> 
> The 50% threshold has a significant implication when you first try
> full-text searching to see how it works: If you create a table and insert
> only one or two rows of text into it, every word in the text occurs in at
> least 50% of the rows. As a result, no search returns any results. Be
> sure to insert at least three rows, and preferably many more."
> 
> 
> 
> On 5/25/05, Scott Purcell <[EMAIL PROTECTED]> wrote:
> > 
> > Hello,
> > I am running 4.0.15 for Win95/98 and am working through the docs.
> > 
> > I created a "text" type field with a 'fulltext' index. As I am 
> > experimenting, I have run into a couple of questions:
> > 
> > First off, I was having trouble getting results. So I added the word 
> > "foobar" to one of the descriptions:
> > and that worked with this query:
> > select * from item where match(name, description) against('foobar')
> > 
> > 
> > 
> > I have a word 'red' that appears 5-10 times, in a tmp table of 60
> > records.
> > If I run that query with 'red'
> > select * from item where match(name, description) against('red');
> > it returns empty set
> > 
> > Upon reading, it looks like it is really trying to only get "unique"
> > names from the index. But in my case the 'red' is a description that I
> > would like to get back. Anyway to force this to return results?
> > 
> > Any info would be helpful. I have read, but it gets a little confusing
> > first time through.
> > 
> > Thanks,
> > Scott
> > 

The other thing to remember is the minimum word length (the
ft_min_word_len setting). By default it is set to 4. "red" has only 3
characters, so it would not have been indexed. That would explain why
full-text searches for "red" are not returning any records. See here for
full-text tuning (settings):
http://dev.mysql.com/doc/mysql/en/fulltext-fine-tuning.html
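
As a rough sketch of what that tuning could look like for the table in
Scott's message (the table name item and the columns name and description
are taken from his queries; the value 3 is only an assumption chosen so
that "red" qualifies), something along these lines should work. I have not
run it against his data, so treat it as a starting point:

```sql
-- Show the current minimum indexed word length (the default is 4).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- ft_min_word_len cannot be changed at runtime. Assumed change: add
--   ft_min_word_len=3
-- under the [mysqld] section of the server option file, then restart
-- the server.

-- Existing FULLTEXT indexes must be rebuilt after the restart so that
-- 3-character words are added to them.
REPAIR TABLE item QUICK;

-- The original natural language query should now be able to find 'red'.
SELECT * FROM item WHERE MATCH (name, description) AGAINST ('red');

-- Separate issue: if a word is ever filtered by the 50% threshold
-- described in the quoted documentation, a boolean mode search does not
-- apply that threshold (it still respects ft_min_word_len).
SELECT * FROM item
WHERE MATCH (name, description) AGAINST ('red' IN BOOLEAN MODE);
```

With "red" in only 5-10 of 60 rows, the 50% threshold should not be what
is hiding it, so the word length setting is the first thing to check.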
