Re: Optimizing range search with two-table ORDER BY

2006-05-12 Thread sheeri kritzer

Hi Jesse,

Have you tried the following:

1)  ordering by only part.d and seeing how long the query takes
2)  putting an index on (part.d, cwGroup.stripped_cw) and seeing how
long the query takes.

1 will help pinpoint the problem, and 2 might actually help.

-Sheeri


SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup
JOIN quotation ON (quotation.id = cwGroup.quotation_id )
JOIN part ON ( part.id = quotation.part_id )
WHERE ( part.d BETWEEN 1950 AND 1970 AND cwGroup.stripped_cw LIKE 'man%' )
ORDER BY part.d, cwGroup.stripped_cw
LIMIT 25

and the EXPLAIN for it looks like:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: cwGroup
 type: range
possible_keys: quotation_id,stripped_cw
  key: stripped_cw
  key_len: 101
  ref: NULL
 rows: 8489
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: quotation
 type: eq_ref
possible_keys: PRIMARY,part_id
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.cwGroup.quotation_id
 rows: 1
Extra:
*** 3. row ***
   id: 1
  select_type: SIMPLE
table: part
 type: eq_ref
possible_keys: PRIMARY,d
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.quotation.part_id
 rows: 1
Extra: Using where

Without the ORDER BY it drops to about 1.5s and EXPLAIN no longer
shows the use of temporary and filesort.

An even worse example, but unfortunately a common need in this
app, is a query that returns a lot of rows (but which I'm paging
through, of course), such as:

SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup
JOIN quotation ON (quotation.id = cwGroup.quotation_id )
JOIN part ON ( part.id = quotation.part_id )
WHERE ( cwGroup.stripped_cw BETWEEN 'ant' AND 'asx' )
ORDER BY cwGroup.stripped_cw, part.d
LIMIT 25

This takes 2m31s to execute, obviously due to the large number
of rows (the total result is about 47K rows), but a similar
query without the ORDER BY took only .08s (though a COUNT(*)
took a similar 2-3m):

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: cwGroup
 type: range
possible_keys: quotation_id,stripped_cw
  key: stripped_cw
  key_len: 101
  ref: NULL
 rows: 54745
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: quotation
 type: eq_ref
possible_keys: PRIMARY,part_id
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.cwGroup.quotation_id
 rows: 1
Extra:
*** 3. row ***
   id: 1
  select_type: SIMPLE
table: part
 type: eq_ref
possible_keys: PRIMARY
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.quotation.part_id
 rows: 1
Extra:

Other queries, as said, are more complicated, adding
additional columns in the searches or joining in other tables
(sometimes with range searches here as well), but these don't
seem to affect the underlying problem. Adding multiple-column
indexes also doesn't affect things in any significant way.

Any thoughts? I clearly need a significant speed improvement,
not just a tweak like making a bigger sort_buffer_size or
getting faster disks.

Thanks for reading this far.

Jesse Sheidlower

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]



Optimizing range search with two-table ORDER BY

2006-05-08 Thread Jesse Sheidlower

Is there any way to optimize a range query that includes
an ORDER BY with keys from two different tables? I'm running
MySQL 4.1.18 on FreeBSD.

I've been struggling with some queries that are _incredibly_
slow--from 1-5 minutes on slowish but decent hardware. When I
try versions without the ORDER BY they're fast, and whatever
tweaks I do to the indexing do speed things up even faster,
but have no effect on the situation with the ORDER BY. The
docs suggest that indexes can't help here, but I find it hard
to believe that sorting on keys in different tables is that
rare a requirement; is there any way to restructure the
query to speed things up?

To take a few simple examples (most actual queries are more
complicated, but the slowdown isn't a result of the
complication), I have three tables (edited to remove fields
not used in these examples), part has_many quotation
has_many cwGroup:

mysql desc part;
+---+--+--+-+-++
| Field | Type | Null | Key | Default | Extra  |
+---+--+--+-+-++
| id| int(10) unsigned |  | PRI | NULL| auto_increment |
| cit   | text | YES  | | NULL||
| d | int(11)  | YES  | MUL | NULL||
+---+--+--+-+-++

mysql desc quotation;
+-+--+--+-+-++
| Field   | Type | Null | Key | Default | Extra  |
+-+--+--+-+-++
| id  | int(10) unsigned |  | PRI | NULL| auto_increment |
| part_id | int(10) unsigned |  | MUL | 0   ||
| qt  | text | YES  | | NULL||
+-+--+--+-+-++

mysql desc cwGroup;
+--+--+--+-+-++
| Field| Type | Null | Key | Default | Extra  |
+--+--+--+-+-++
| id   | int(10) unsigned |  | PRI | NULL| auto_increment |
| quotation_id | int(10) unsigned |  | MUL | 0   ||
| cw   | varchar(100) | YES  | | NULL||
| stripped_cw  | varchar(100) | YES  | MUL | NULL||
+--+--+--+-+-++

The rough numbers are 100K rows in part, 2.7M in quotation, and
3.3M in cwGroup.

For example, the following query, which would return 460 rows
without the LIMIT, takes about 51s:

SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup 
JOIN quotation ON (quotation.id = cwGroup.quotation_id ) 
JOIN part ON ( part.id = quotation.part_id ) 
WHERE ( part.d BETWEEN 1950 AND 1970 AND cwGroup.stripped_cw LIKE 'man%' ) 
ORDER BY part.d, cwGroup.stripped_cw 
LIMIT 25

and the EXPLAIN for it looks like:

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: cwGroup
 type: range
possible_keys: quotation_id,stripped_cw
  key: stripped_cw
  key_len: 101
  ref: NULL
 rows: 8489
Extra: Using where; Using temporary; Using filesort
*** 2. row ***
   id: 1
  select_type: SIMPLE
table: quotation
 type: eq_ref
possible_keys: PRIMARY,part_id
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.cwGroup.quotation_id
 rows: 1
Extra:
*** 3. row ***
   id: 1
  select_type: SIMPLE
table: part
 type: eq_ref
possible_keys: PRIMARY,d
  key: PRIMARY
  key_len: 4
  ref: rqs_incs.quotation.part_id
 rows: 1
Extra: Using where

Without the ORDER BY it drops to about 1.5s and EXPLAIN no longer
shows the use of temporary and filesort.

An even worse example, but unfortunately a common need in this
app, is a query that returns a lot of rows (but which I'm paging
through, of course), such as:

SELECT part.d, quotation.qt, cwGroup.cw
FROM cwGroup 
JOIN quotation ON (quotation.id = cwGroup.quotation_id ) 
JOIN part ON ( part.id = quotation.part_id ) 
WHERE ( cwGroup.stripped_cw BETWEEN 'ant' AND 'asx' ) 
ORDER BY cwGroup.stripped_cw, part.d
LIMIT 25

This takes 2m31s to execute, obviously due to the large number
of rows (the total result is about 47K rows), but a similar
query without the ORDER BY took only .08s (though a COUNT(*)
took a similar 2-3m):

*** 1. row ***
   id: 1
  select_type: SIMPLE
table: cwGroup
 type: range
possible_keys: quotation_id,stripped_cw
  key: stripped_cw
  key_len: 101
  ref: NULL