Urms wrote:
I'm using pretty standard approach to sorting search results by relevancy:

SELECT DISTINCT product_name, MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0
ORDER BY rate DESC

It works fine as long as the quantity of results is not big. Once the
quantity is about 50,000 and more (I have a very big database) the query
starts working way too slow. Total number of records is about 4 million. It
takes about 2 sec when there are 50,000 records in the result but at the
same time it takes only about 0.006 sec without ORDER BY clause.
I understand that ORDER BY is time consuming but maybe someone knows a
different way to have sorting by relevancy.

Thanks in advance!

I think it's your SELECT DISTINCT that is slowing you down. For each new row being considered for inclusion to your result set, you are asking the engine to compare that row against all other rows you already have in the set. So what's happening is that you are doing a longer and longer linear search the larger your datasets become.

One option is to cache your results in a temporary table then de-duplicate your results from there.

Another option is to create a temporary table with a UNIQUE key on the columns you want to remain unique and use an INSERT IGNORE. Because of the UNIQUE key (or PRIMARY KEY if that's your choice) you will be doing an indexed search of all values rather than a linear search through the entire list. This would look something like:

CREATE TEMPORARY TABLE tmpFT_results (
  product_name <insert datatype here>
, rate <insert datatype here>
, primary key (product_name)
) ENGINE = MEMORY;

INSERT IGNORE tmpFT_results
SELECT product_name,
MATCH (keywords) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN MODE) AS
rate
FROM _TT
WHERE MATCH ( keywords ) AGAINST ('CONSOLIDATED* 16* bearing*' IN BOOLEAN
MODE ) >0;

SELECT product_name, rate
FROM tmpFT_results
ORDER BY rate DESC;

Sure it's three separate steps but it's tuned to the process you are trying to perform. The SELECT DISTINCT processing has no idea that you only need to keep the values of product_name distinct as we would hope the `rate` component may be duplicated.

If there is the possibility of different `rate` results for the same product_name value then you may also want to use the temporary table method to somehow weight (sum or average comes to mind) the match values across all responses before returning the results.

I hope these ideas help your performance and search accuracy.

Best wishes,
--
Shawn Green, Support Engineer
MySQL Inc., USA, www.mysql.com
Office: Blountville, TN
    __  ___     ___ ____  __
   /  |/  /_ __/ __/ __ \/ /
  / /|_/ / // /\ \/ /_/ / /__
 /_/  /_/\_, /___/\___\_\___/
        <___/
 Join the Quality Contribution Program Today!
 http://dev.mysql.com/qualitycontribution.html


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to