Urms wrote:
I'm using pretty standard approach to sorting search results by relevancy:
SELECT DISTINCT product_name,
MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT
WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0
ORDER BY rate DESC
It works fine as long as the quantity of results is not big. Once the
quantity is about 50,000 and more (I have a very big database) the query
starts working way too slow. Total number of records is about 4 million. It
takes about 2 sec when there are 50,000 records in the result but at the
same time it takes only about 0.006 sec without ORDER BY clause.
I understand that ORDER BY is time consuming but maybe someone knows a
different way to have sorting by relevancy.
Thanks in advance!
I think it's your SELECT DISTINCT that is slowing you down. For each new
row being considered for inclusion to your result set, you are asking
the engine to compare that row against all other rows you already have
in the set. So what's happening is that you are doing a longer and
longer linear search the larger your datasets become.
One option is to cache your results in a temporary table then
de-duplicate your results from there.
Another option is to create a temporary table with a UNIQUE key on the
columns you want to remain unique and use an INSERT IGNORE. Because of
the UNIQUE key (or PRIMARY KEY if that's your choice) you will be doing
an indexed search of all values rather than a linear search through the
entire list. This would look something like:
CREATE TEMPORARY TABLE tmpFT_results (
product_name <insert datatype here>
, rate <insert datatype here>
, primary key (product_name)
) ENGINE = MEMORY;
INSERT IGNORE tmpFT_results
SELECT product_name,
MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT
WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0;
SELECT product_name, rate
FROM tmpFT_results
ORDER BY rate DESC;
Sure it's three separate steps but it's tuned to the process you are
trying to perform. The SELECT DISTINCT processing has no idea that you
only need to keep the values of product_name distinct as we would hope
the `rate` component may be duplicated.
If there is the possibility of different `rate` results for the same
product_name value then you may also want to use the temporary table
method to somehow weight (sum or average comes to mind) the match values
across all responses before returning the results.
I hope these ideas help your performance and search accuracy.
Best wishes,
--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ /
/ /|_/ / // /\ \/ /_/ / /__
/_/ /_/\_, /___/\___\_\___/
<___/
Join the Quality Contribution Program Today!
http://dev.mysql.com/qualitycontribution.html
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]