Help optimizing query

Jesse Sheidlower Tue, 23 Nov 2004 12:25:46 -0800

I have what I thought was a simple, well-indexed query, but it
turns out that it's acting as a pretty big drag. The one thing
that's clearly a problem (though I'm not sure of the extent of
the problem), I'm not sure how to fix.


There are three tables: citations, subjects, and a many-to-many
table linking these. They look like this (edited to remove
extraneous fields):

CREATE TABLE `citation` (
  `id` int(11) NOT NULL auto_increment,
  `word` varchar(50) NOT NULL default '',
  `last_modified` timestamp(14) NOT NULL,
  `deleted` datetime default NULL,
  PRIMARY KEY  (`id`),
  KEY `deleted` (`deleted`),
  KEY `word` (`word`)
)

CREATE TABLE `subject` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(50) NOT NULL default '',
  `last_modified` timestamp(14) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `name` (`name`)
) 
CREATE TABLE `citation_subject` (
  `id` int(11) NOT NULL auto_increment,
  `citation_id` int(11) NOT NULL default '0',
  `subject_id` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  KEY `citation_id` (`citation_id`,`subject_id`)
) 

A usual query is to get (some number of) the 
citations for a given subject, ordering by the
"word" which is stripped of spaces and hyphens.
I don't know of any other way to accomplish
this ordering. The EXPLAIN looks like this:

mysql> EXPLAIN SELECT citation.*, REPLACE(REPLACE(citation.word,' ',''), 
'-','') AS stripped_word
    -> FROM citation, subject, citation_subject
    -> WHERE subject.name = 'History'
    -> AND citation_subject.subject_id = subject.id
    -> AND citation_subject.citation_id = citation.id
    -> AND (citation.deleted IS NULL OR citation.deleted = 0)
    -> ORDER BY stripped_word\G
*************************** 1. row ***************************
        table: subject
         type: ref
possible_keys: PRIMARY,name
          key: name
      key_len: 50
          ref: const
         rows: 1
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
        table: citation_subject
         type: index
possible_keys: citation_id
          key: citation_id
      key_len: 8
          ref: NULL
         rows: 1247
        Extra: Using where; Using index
*************************** 3. row ***************************
        table: citation
         type: eq_ref
possible_keys: PRIMARY,deleted
          key: PRIMARY
      key_len: 4
          ref: citation_subject.citation_id
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

----

The number of records involved is relatively small (a few thousands; in
practice this query would also have a LIMIT statement and would be
preceded by a COUNT(*)), but it's dragging down the application it's
running in. (I have a considerably more complex query that involves
millions of records and twice as many joins, that is faster.) I'm
running this in Perl.

Any suggestions? I'd like to get rid of the whole "temporary" and
"filesort" things, but I'm not sure if that's what matters given
that there's only 1 row being returned there.

Thanks.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Help optimizing query

Reply via email to