I'd like to understand this better.

Lets say I have a table with 4 fields

itemKey INT(10) unsigned auto_increment primary key
status enum("forsale", "sold")
description TEXT
price DECIMAL(10,2)

And I do a

SELECT * FROM table
WHERE MATCH (description) AGAINST ("A really nice toy")
AND status="forsale"
ORDER BY MATCH (description) AGAINST ("A really nice toy")
LIMIT 10

And lets further state that there at 10,000,000 records in the table and 
7,500,000 of them are for sale. Lets also say that the minimum word 
length is set to 3 (so toy is considered significant for the fulltext 
search) and there is a fulltext index on description.

Lets also assume that there are 1,000,000 records with the word "really" 
, 1,000,000 records with the word "nice" and 500,000 records with the 
word "toy". Lets say there are only 500 records with all three words.

Approximately how many records will this query need to look at to return 
a result? Approximately how long will it take?

How would the following query compare:

SELECT * FROM table
WHERE MATCH (description) AGAINST ("+A +really +nice +toy")
AND status="forsale"
ORDER BY MATCH (description) AGAINST ("+A +really +nice +toy")
LIMIT 10

assuming that an end-user (not a programmer) is typing in the full text 
portion of the query. What can the application do, to protect against a 
long running query that will also lock the table for an extended period 
of time?


Sergei Golubchik wrote:

> Hi!
> 
> On Nov 26, Mike Wexler wrote:
> 
>>Sergei Golubchik wrote:
>>
>>>Hi!
>>>
>>>I didn't test it yet.
>>>
>>>The one thing I can say just now that unlike natural language search
>>>code (as exists in MySQL in 3.23) it need not to build the complete list
>>>of matched documents in memory in advance. It means that with LIMIT it
>>>should be MUCH faster than the old code.  Unfortunately, it also means
>>>that it does not "auto-magically" return documents with relevance
>>>decreased (without ORDER BY, that is).
>>>
>>If I use ORDER BY relevance and LIMIT. Does it need to build the 
>>complete list of documents in memory in advance? Or will it just build 
>>enough to satisfy the limit constraints.
>>
> 
> It will "build enough to satisfy the limit constraints", but
> with ORDER BY this "enough" is the complete list. If you want
> to get five rows with highest relevances MySQL will retrieve all the
> matching rows, sort them with relevance decreasing, and then return
> top five rows. But it should not be a big deal as for boolean searches
> there's no meaningful "relevance" anyway. It's boolean - true or false.
> Some floating point "relevance" is introduced to give a rough estimation
> of the number of words matched, but, in fact, all the rows in the result
> set are equally relevant to the query with relevance value TRUE.
> 
> Regards,
> Sergei
> 
> 



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to