Indeed, as you say, Brent, correlated subqueries are not well-optimized
in MySQL. The specific subquery (the IN() subquery) demonstrated in the
original post is, however, optimized in MySQL 6.0 :)
More comments inline.
Brent Baisley wrote:
You are using a correlated subquery, which MySQL is terrible at.
Whenever you find yourself doing a correlated subquery, see if you can
switch it to a derived table with a join, which MySQL is far better at.
A derived table is like a "virtual" table you create on the fly. It's
very simple, just assign a name to your query and then treat it as if it
is a regular table.
Actually, in this case, no need for a derived table. A simple join will
suffice:
SELECT * FROM projects p
JOIN project_tags pt ON p.project_id = pt.project_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE tags.name='foo';
Make sure you've got indexes on p (project_id), pt (project_id, tag_id),
t (name)
Cheers,
Jay
So your query would look something like this:
SELECT projects.* FROM projects
JOIN (SELECT projects_tags.project_id FROM tags, projects_tags WHERE
tags.name='foo' AND projects_tags.project_id=projects.id) AS ptagids
ON project.id=ptagids.project_id
Your IN has become a JOIN and mysql optimizes it far better.
On Oct 19, 2007, at 6:57 PM, Ryan Bates wrote:
I'm trying to determine why a subquery is slower than running two
separate queries. I have a simple many-to-many association using 3
tables: projects, tags and projects_tags. Here's the query I'm using
to find the projects with a given tag:
SELECT * FROM projects WHERE id IN (SELECT projects_tags.project_id
FROM tags, projects_tags WHERE tags.name='foo' AND
projects_tags.project_id=projects.id);
(0.36 sec)
Compare that with splitting it into two queries:
SELECT projects_tags.project_id FROM tags, projects_tags WHERE
tags.name='foo' AND projects_tags.project_id=projects.id
(0.00 sec) /* returns 1, 2, 3 */
SELECT * FROM projects WHERE id IN (1, 2, 3);
(0.00 sec)
Why is it so much faster? Looking at the explain statement (below) of
the one with the subquery, it appears it's not using the primary key
index on the projects table. Why is it that MySQL doesn't perform this
simple optimization? And is there a solution that will allow me to
still use a subquery?
I realize I can use a join instead of a subquery, but this is a
simplified example.
Here's the explain statement:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: projects
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 15433
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: tags
type: ref
possible_keys: PRIMARY,index_tags_on_name
key: index_tags_on_name
key_len: 258
ref: const
rows: 1
Extra: Using where; Using index
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: projects_tags
type: ref
possible_keys: tag_id
key: tag_id
key_len: 5
ref: my_database.tags.id
rows: 10
Extra: Using where
Here's the table dumps:
CREATE TABLE `projects` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `tags` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
PRIMARY KEY (`id`),
KEY `index_tags_on_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `projects_tags` (
`project_id` int(11) default NULL,
`tag_id` int(11) default NULL,
KEY `tag_id` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I'm using MySQL 5.0.37. Thanks in advance.
Ryan
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]