You are always getting "filesort, where and temporary" because the 
optimizer cannot use an index. Both your WHERE clause and your ORDER BY 
clause use computed values. None of those values exist in an index because 

you calculate them for every query.

What I did below is not a refactoring, just a different layout of your 
query. I did it 
this way to highlight the repetitiveness of parts of this query. (I hope 
that auto-wrapping doesn't destroy this too much):
SELECT distinct articles.* , 
(( 
        (title REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(body REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(synopsis REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(closing_date REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(location REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(cost REGEXP "[[:<:]]MArket[[:>:]]")*10)
        +(times REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(overnight_details REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(remuneration REGEXP "[[:<:]]MArket[[:>:]]")*10
        +(keywords REGEXP "[[:<:]]MArket[[:>:]]")*10
 ) 
+(
        (title REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(body REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(synopsis REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(closing_date REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(location REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(cost REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(times REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(overnight_details REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(remuneration REGEXP "[[:<:]]Segmentation[[:>:]]")*10
        +(keywords REGEXP "[[:<:]]Segmentation[[:>:]]")*10
)) AS score 
FROM articles 
LEFT JOIN article_related_communities 
        on articles.id = article_related_communities.article_id 
WHERE deleted=0 
        AND ((
                (title REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(body REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(synopsis REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(closing_date REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(location REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(cost REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(times REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(overnight_details REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(remuneration REGEXP "[[:<:]]MArket[[:>:]]")*10
                +(keywords REGEXP "[[:<:]]MArket[[:>:]]")*10 
        ) 
        +(
                (title REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(body REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(synopsis REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(closing_date REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(location REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(cost REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(times REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(overnight_details REGEXP 
"[[:<:]]Segmentation[[:>:]]")*10
                +(remuneration REGEXP "[[:<:]]Segmentation[[:>:]]")*10
                +(keywords REGEXP "[[:<:]]Segmentation[[:>:]]")*10 
        ))>0 
        AND expiry_date > '20041018'
        AND (status='2' 
                OR (status='1' 
                        AND (author = 161 
                                OR (articles.comm_id IN (''))
                        )
                )
        )
        AND release_date<='20041018'
        AND (articles.comm_id IN 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5') 


                OR article_related_communities.comm_id IN 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5')
        ) 
ORDER BY score DESC 
LIMIT 0,10;

It would be very nice if the optimizer detected the fact that part of your 

WHERE clause and your "score" column are the same formula, that way it 
would only need to compute that value once. I suspect that it does but I 
still suggest that you drop the calculation in your where clause and move 
your check to a HAVING clause. If you leave that calculation in the WHERE 
clause you will force a full table scan. You lose nothing with this 
change and you may gain in performance both because we might avoid a full 
table scan and just in case the optimizer didn't detect the duplication 
and factor out that calculation, you won't compute that value twice. It 
should also makes your code easier to maintain and your query simpler to 
parse. Here is what your new bottom half of your query would look like.

WHERE deleted=0 
        AND expiry_date > '20041018'
        AND (
                status='2' 
                OR (
                        status='1' 
                        AND (author = 161 
                                OR (articles.comm_id IN (''))
                        )
                )
        )
        AND release_date<='20041018'
        AND (
                articles.comm_id IN 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5') 


                OR article_related_communities.comm_id IN 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5')
        ) 
HAVING score >0
ORDER BY score DESC 
LIMIT 0,10;

For the purposes of this query, a new, compound index may help. I can't 
vouch that this index would speed  up other queries but it should help 
this one.

ALTER TABLE article ADD KEY(ID, status, deleted, expiry_date, 
release_date, comm_id)

I would experiment with both the number of columns in that index and the 
order you list those columns. All of those values play an important role 
in this query (probably other queries, too) and a well constructed index 
could seriously help your 
performance, especially since MySQL will only use one index per table per 
query.

Well, those are my suggestions. Let us (the list) know how things work 
out, please?

Yours,

Shawn Green
Database Administrator
Unimin Corporation - Spruce Pine

"Alexis Cheshire" <[EMAIL PROTECTED]> wrote on 11/08/2004 01:38:01 PM:

> Help: (and apologies if this is posted to the wrong list..)(pls let 
> me know where to post if so.. Thx ;-)
> 
> I have *inherited* an App that uses PHP / MySQL.  THe internal 
> search function within the application
> that I am supporting uses the following DB Table structure and runs 
> the Query below to return a set of results ordered by the REGEXP 
> match results  (The "score").
> The Query looks in each field, for a REGEXP match, if the REGEXP 
> returns true, the score is incremented by 10 * number of matches (I 
think)
> 
> I cannot use FULLTEXT indexes.  What I would like to do is refactor 
> the query to improve the speed.
> 
> Can someone please advise as to why my queries ALWAYS use filesort, 
> where and temporary (as found out by EXPLAIN...)
> 
> Can someone recommend a new query to deliver the same results??
> 
> Thanks in advance
> 
> ## START TABLE STRUCTURE ##
> CREATE TABLE article_related_communities (
>   comm_id int(6) default NULL,
>   article_id varchar(12) default NULL,
>   KEY article_id (article_id),
>   KEY comm_id (comm_id)
> ) TYPE=MyISAM;
> 
> CREATE TABLE articles (
>   id int(4) NOT NULL auto_increment,
>   c_type int(4) NOT NULL default '1',
>   author int(6) NOT NULL default '0',
>   comm_id int(6) default '0',
>   comm_type int(6) default '0',
>   keywords longtext NOT NULL,
>   title varchar(128) NOT NULL default '',
>   synopsis text NOT NULL,
>   release_date varchar(14) default NULL,
>   expiry_date varchar(14) default NULL,
>   start_date varchar(14) NOT NULL default '',
>   closing_date varchar(14) NOT NULL default '',
>   location varchar(255) NOT NULL default '',
>   cost varchar(255) NOT NULL default '',
>   times text NOT NULL,
>   overnight_details text NOT NULL,
>   remuneration varchar(255) NOT NULL default '',
>   body text NOT NULL,
>   status int(6) default NULL,
>   expired int(2) NOT NULL default '0',
>   deleted int(2) NOT NULL default '0',
>   ctime varchar(14) NOT NULL default '',
>   mtime varchar(14) NOT NULL default '',
>   mod_author int(6) default '0',
>   event_type int(2) default '0',
>   cs_org int(8) NOT NULL default '0',
>   expiry_reason varchar(128) NOT NULL default '',
>   accumulated_rating int(6) NOT NULL default '0',
>   num_ratings int(6) NOT NULL default '0',
>   average_rating float NOT NULL default '0',
>   event_duration varchar(128) NOT NULL default '0',
>   event_organiser varchar(128) NOT NULL default '0',
>   event_organiser_email varchar(128) NOT NULL default '0',
>   PRIMARY KEY  (id),
>   KEY c_type (c_type),
>   KEY status (status),
>   KEY comm_id (comm_id),
>   KEY author (author),
>   KEY expired (expired),
>   KEY deleted (deleted),
>   KEY expiry_date (expiry_date),
>   KEY release_date (release_date),
>   FULLTEXT KEY idx_tit_syn_body (title,synopsis,body)
> ) TYPE=MyISAM;
> 
> ## END TABLE STRUCTURE ##
> ## START QUERY ##
> 
> SELECT distinct articles.* , ( ( (title REGEXP "[[:<:]]
> MArket[[:>:]]")*10+(body REGEXP "[[:<:]]MArket[[:>:]]")*10+(synopsis
> REGEXP "[[:<:]]MArket[[:>:]]")*10+(closing_date REGEXP "[[:<:]]
> MArket[[:>:]]")*10+(location REGEXP "[[:<:]]MArket[[:>:]]")*10+(cost
> REGEXP "[[:<:]]MArket[[:>:]]")*10+(times REGEXP "[[:<:]]
> MArket[[:>:]]")*10+(overnight_details REGEXP "[[:<:]]MArket[[:>:]]")
> *10+(remuneration REGEXP "[[:<:]]MArket[[:>:]]")*10+(keywords REGEXP
> "[[:<:]]MArket[[:>:]]")*10 ) + ( (title REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(body REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(synopsis REGEXP "[[:<:]]Segmentation[[:>:]]")*10+(closing_date 
> REGEXP "[[:<:]]Segmentation[[:>:]]")*10+(location REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(cost REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(times REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(overnight_details REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(remuneration REGEXP "[[:<:]]Segmentation[[:>:]]")*10+(keywords 
> REGEXP "[[:<:]]Segmentation[[:>:]]")*10 ) ) AS score FROM articles 
> LEFT JOIN article_related_communities on articles.id = 
> article_related_communities.article_id WHERE deleted=0 AND  ( ( 
> (title REGEXP "[[:<:]]MArket[[:>:]]")*10+(body REGEXP "[[:<:]]
> MArket[[:>:]]")*10+(synopsis REGEXP "[[:<:]]MArket[[:>:]]")
> *10+(closing_date REGEXP "[[:<:]]MArket[[:>:]]")*10+(location REGEXP
> "[[:<:]]MArket[[:>:]]")*10+(cost REGEXP "[[:<:]]MArket[[:>:]]")
> *10+(times REGEXP "[[:<:]]MArket[[:>:]]")*10+(overnight_details 
> REGEXP "[[:<:]]MArket[[:>:]]")*10+(remuneration REGEXP "[[:<:]]
> MArket[[:>:]]")*10+(keywords REGEXP "[[:<:]]MArket[[:>:]]")*10 ) + (
> (title REGEXP "[[:<:]]Segmentation[[:>:]]")*10+(body REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(synopsis REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(closing_date REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(location REGEXP "[[:<:]]
> Segmentation[[:>:]]")*10+(cost REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(times REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(overnight_details REGEXP "[[:<:]]Segmentation[[:>:]]")
> *10+(remuneration REGEXP "[[:<:]]Segmentation[[:>:]]")*10+(keywords 
> REGEXP "[[:<:]]Segmentation[[:>:]]")*10 ) )>0 
>  AND expiry_date > '20041018'
>  AND (status='2' 
>  OR (status='1' AND (author = 161 OR (articles.comm_id IN ('')))))
>  AND release_date<='20041018'
>  AND (articles.comm_id IN 
> 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5')
> OR article_related_communities.comm_id IN 
> 
('559','560','9','286','425','484','103','324','323','18','1','1','2','4','5'))
> ORDER BY score DESC LIMIT 0,10;
> 
> ## END QUERY ##
> 
> 
> 
> -- 
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
> 

Reply via email to