Optimizing range search with two-table ORDER BY

Jesse Sheidlower Mon, 08 May 2006 06:15:44 -0700

Is there any way to optimize a range query that includes
an ORDER BY with keys from two different tables? I'm running
MySQL 4.1.18 on FreeBSD.


I've been struggling with some queries that are _incredibly_
slow--from 1-5 minutes on slowish but decent hardware. When I
try versions without the ORDER BY they're fast, and whatever
tweaks I do to the indexing do speed things up even faster,
but have no effect on the situation with the ORDER BY. The
docs suggest that indexes can't help here, but I find it hard
to believe that sorting on keys in different tables is that
rare a requirement; is there any way to restructure the
query to speed things up?

To take a few simple examples (most actual queries are more
complicated, but the slowdown isn't a result of the
complication), I have three tables (edited to remove fields
not used in these examples), "part" has_many "quotation"
has_many "cwGroup":

mysql> desc part;
+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| id        | int(10) unsigned |      | PRI | NULL    | auto_increment |
| cit       | text             | YES  |     | NULL    |                |
| d         | int(11)          | YES  | MUL | NULL    |                |
+-----------+------------------+------+-----+---------+----------------+

mysql> desc quotation;
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| id      | int(10) unsigned |      | PRI | NULL    | auto_increment |
| part_id | int(10) unsigned |      | MUL | 0       |                |
| qt      | text             | YES  |     | NULL    |                |
+---------+------------------+------+-----+---------+----------------+

mysql> desc cwGroup;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| id           | int(10) unsigned |      | PRI | NULL    | auto_increment |
| quotation_id | int(10) unsigned |      | MUL | 0       |                |
| cw           | varchar(100)     | YES  |     | NULL    |                |
| stripped_cw  | varchar(100)     | YES  | MUL | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+

The rough numbers are 100K rows in "part", 2.7M in "quotation", and
3.3M in "cwGroup".

For example, the following query, which would return 460 rows
without the LIMIT, takes about 51s:

SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup 
JOIN quotation ON (quotation.id = cwGroup.quotation_id ) 
JOIN part ON ( part.id = quotation.part_id ) 
WHERE ( part.d BETWEEN 1950 AND 1970 AND cwGroup.stripped_cw LIKE 'man%' ) 
ORDER BY part.d, cwGroup.stripped_cw 
LIMIT 25

and the EXPLAIN for it looks like:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cwGroup
         type: range
possible_keys: quotation_id,stripped_cw
          key: stripped_cw
      key_len: 101
          ref: NULL
         rows: 8489
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotation
         type: eq_ref
possible_keys: PRIMARY,part_id
          key: PRIMARY
      key_len: 4
          ref: rqs_incs.cwGroup.quotation_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: part
         type: eq_ref
possible_keys: PRIMARY,d
          key: PRIMARY
      key_len: 4
          ref: rqs_incs.quotation.part_id
         rows: 1
        Extra: Using where

Without the ORDER BY it drops to about 1.5s and EXPLAIN no longer
shows the use of "temporary" and "filesort".

An even worse example, but unfortunately a common need in this
app, is a query that returns a lot of rows (but which I'm paging
through, of course), such as:

SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup 
JOIN quotation ON (quotation.id = cwGroup.quotation_id ) 
JOIN part ON ( part.id = quotation.part_id ) 
WHERE ( cwGroup.stripped_cw BETWEEN 'ant' AND 'asx' ) 
ORDER BY cwGroup.stripped_cw, part.d
LIMIT 25

This takes 2m31s to execute, obviously due to the large number
of rows (the total result is about 47K rows), but a similar
query without the ORDER BY took only .08s (though a COUNT(*)
took a similar 2-3m):

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: cwGroup
         type: range
possible_keys: quotation_id,stripped_cw
          key: stripped_cw
      key_len: 101
          ref: NULL
         rows: 54745
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: quotation
         type: eq_ref
possible_keys: PRIMARY,part_id
          key: PRIMARY
      key_len: 4
          ref: rqs_incs.cwGroup.quotation_id
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: part
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: rqs_incs.quotation.part_id
         rows: 1
        Extra:

Other queries, as said, are more complicated, adding
additional columns in the searches or joining in other tables
(sometimes with range searches here as well), but these don't
seem to affect the underlying problem. Adding multiple-column
indexes also doesn't affect things in any significant way.

Any thoughts? I clearly need a significant speed improvement,
not just a tweak like making a bigger sort_buffer_size or
getting faster disks.

Thanks for reading this far.

Jesse Sheidlower

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Optimizing range search with two-table ORDER BY

Reply via email to