Jeremy Zawodny writes:

> Perhaps you could tell us more about the queries that take 
> too long. Maybe there are some things you can do to reduce that time?

You don't want to go there.

But because you do... :-)

As far as I can tell, the queries are as optimized as they're going to
get.  I'm doing queries involving large numbers of joins, and all the
joins are const, eq_ref or ref.  I I also believe I have built the
proper indices.

Here are my tables:

CREATE TABLE `species` (
  `species_id` tinyint(3) unsigned NOT NULL default '0',
  `description` varchar(255) NOT NULL default '',
  `abbreviation` varchar(8) NOT NULL default '',
  PRIMARY KEY  (`species_id`),
  UNIQUE KEY `description` (`description`))

CREATE TABLE `sequence` (
  `sequence_id` int(10) unsigned NOT NULL auto_increment,
  `species_id` tinyint(3) unsigned NOT NULL default '0',
-- FK species.species_id
  `description` text,
  `sequence` text,
  PRIMARY KEY  (`sequence_id`),
  KEY `species_id` (`species_id`),
  FULLTEXT KEY `description` (`description`))

CREATE TABLE `result` (
  `result_id` int(10) unsigned NOT NULL auto_increment,
  `query_sequence_id` int(10) unsigned NOT NULL default '0',
-- FK sequence.sequence_id
  `db_sequence_id` int(10) unsigned NOT NULL default '0',
-- FK sequence.sequence_id
  `evalue` double unsigned default NULL,
  `query_len` smallint(5) unsigned default NULL,
  `align_len` smallint(5) unsigned default NULL,
  `query_sequence` text,
  `hit_sequence` text,
  `homology_sequence` text,
  `evalue_gt_1_e_neg_1` tinyint(1) NOT NULL default '0',
  `evalue_gt_1_e_neg_2` tinyint(1) NOT NULL default '0',
  `evalue_gt_1_e_neg_3` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`result_id`),
  UNIQUE KEY `query_db_seq` (`query_sequence_id`,`db_sequence_id`),
  KEY `query_sequence_id` (`query_sequence_id`),
  KEY `db_sequence_id` (`db_sequence_id`),
  KEY `evalue_idx` (`evalue`),
  KEY `evalue_gt_1_e_neg_1_id` (`evalue_gt_1_e_neg_1`),
  KEY `evalue_gt_1_e_neg_2_id` (`evalue_gt_1_e_neg_2`),
  KEY `evalue_gt_1_e_neg_3_id` (`evalue_gt_1_e_neg_3`))

Here is a sample query.  Note, the STRAIGHT_JOINS are there because
MySQL's query optimizer takes many orders of magnitude more time to
finish if I don't feed it such hints.

SELECT query_seq.sequence_id, query_seq.description, 
length(query_seq.sequence) 
FROM sequence query_seq, species query_species,  
species db_species_0 STRAIGHT_JOIN result result_0 STRAIGHT_JOIN
sequence db_seq_0 ,  species db_species_1 STRAIGHT_JOIN result result_1
STRAIGHT_JOIN sequence db_seq_1 ,  species db_species_2 STRAIGHT_JOIN
result result_2 STRAIGHT_JOIN sequence db_seq_2 ,  species db_species_3
STRAIGHT_JOIN result result_3 STRAIGHT_JOIN sequence db_seq_3 ,  species
db_species_4 STRAIGHT_JOIN result result_4 STRAIGHT_JOIN sequence
db_seq_4 ,  species db_species_5 STRAIGHT_JOIN result result_5
STRAIGHT_JOIN sequence db_seq_5 ,  species db_species_6 STRAIGHT_JOIN
result result_6 STRAIGHT_JOIN sequence db_seq_6 ,  species db_species_7
STRAIGHT_JOIN result result_7 STRAIGHT_JOIN sequence db_seq_7 
WHERE query_seq.species_id = query_species.species_id
AND query_species.description = "Arabidopsis thaliana" 
AND result_0.evalue IS NULL
AND result_0.query_sequence_id = query_seq.sequence_id
AND result_0.db_sequence_id = db_seq_0.sequence_id
AND db_seq_0.species_id = db_species_0.species_id
AND db_species_0.description = "Caenorhabditis elegans" 
AND result_1.evalue IS NULL
AND result_1.query_sequence_id = query_seq.sequence_id
AND result_1.db_sequence_id = db_seq_1.sequence_id
AND db_seq_1.species_id = db_species_1.species_id
AND db_species_1.description = "Dictyostelium discoideum" 
AND result_2.evalue IS NULL
AND result_2.query_sequence_id = query_seq.sequence_id
AND result_2.db_sequence_id = db_seq_2.sequence_id
AND db_seq_2.species_id = db_species_2.species_id
AND db_species_2.description = "Drosophila melanogaster" 
AND result_3.evalue IS NULL
AND result_3.query_sequence_id = query_seq.sequence_id
AND result_3.db_sequence_id = db_seq_3.sequence_id
AND db_seq_3.species_id = db_species_3.species_id
AND db_species_3.description = "Escherichia coli K12" 
AND result_4.evalue IS NULL
AND result_4.query_sequence_id = query_seq.sequence_id
AND result_4.db_sequence_id = db_seq_4.sequence_id
AND db_seq_4.species_id = db_species_4.species_id
AND db_species_4.description = "Homo sapiens" 
AND result_5.evalue IS NULL
AND result_5.query_sequence_id = query_seq.sequence_id
AND result_5.db_sequence_id = db_seq_5.sequence_id
AND db_seq_5.species_id = db_species_5.species_id
AND db_species_5.description = "Methanosarcina acetivorans" 
AND result_6.evalue IS NULL
AND result_6.query_sequence_id = query_seq.sequence_id
AND result_6.db_sequence_id = db_seq_6.sequence_id
AND db_seq_6.species_id = db_species_6.species_id
AND db_species_6.description = "Mus musculus" 
AND result_7.evalue IS NULL
AND result_7.query_sequence_id = query_seq.sequence_id
AND result_7.db_sequence_id = db_seq_7.sequence_id
AND db_seq_7.species_id = db_species_7.species_id
AND db_species_7.description = "Saccharomyces cerevisiae" 

Here is the EXPLAIN result (sorry it's going to be corrupted across the
email):

+---------------+--------+----------------------------------------------
------------+--------------+---------+----------------------------+-----
--+-------------+
| table         | type   | possible_keys
| key          | key_len | ref                        | rows  | Extra
|
+---------------+--------+----------------------------------------------
------------+--------------+---------+----------------------------+-----
--+-------------+
| query_species | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_0  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_1  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_2  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_3  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_4  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_5  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_6  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| db_species_7  | const  | PRIMARY,description
| description  |     255 | const                      |     1 |
|
| result_0      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | evalue_idx
|       9 | const                      | 66342 | Using where |
| query_seq     | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_0.query_sequence_id |     1 | Using
where |
| db_seq_0      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_0.db_sequence_id    |     1 | Using
where |
| result_1      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_1      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_1.db_sequence_id    |     1 | Using
where |
| result_2      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_2      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_2.db_sequence_id    |     1 | Using
where |
| result_3      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_3      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_3.db_sequence_id    |     1 | Using
where |
| result_4      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_4      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_4.db_sequence_id    |     1 | Using
where |
| result_5      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_5      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_5.db_sequence_id    |     1 | Using
where |
| result_6      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_6      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_6.db_sequence_id    |     1 | Using
where |
| result_7      | ref    |
query_db_seq,query_sequence_id,db_sequence_id,evalue_idx | query_db_seq
|       4 | query_seq.sequence_id      |     8 | Using where |
| db_seq_7      | eq_ref | PRIMARY,species_id
| PRIMARY      |       4 | result_7.db_sequence_id    |     1 | Using
where |
+---------------+--------+----------------------------------------------
------------+--------------+---------+----------------------------+-----
--+-------------+

Personally, I think a lot of the problem is that it takes 18 seconds
just to run the query optimizer pass on this (Apple OS X server, 2 x
1.3GHz G4).

> > RAM is cheap, so it seems to be a reasonable suggestion.
> 
> If RAM is cheap, why not add enough RAM to your server so 
> that your slow queries are less slow?

The entire database is in the buffer cache.  Key cache hit ratio is near
100% on repeat (though not identical) queries.

> Or are they CPU bound?

Yes.

--Michael


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to