Thanks for the response Chris, although I can't seem to reproduce the
problem now, but I'm sure you're right.

There's something else strange that I've encountered while trying to
optimize this query.  I've got two machines, dev and production. After
adding the index to the title attribute on the dev machine, my query was
reduced from 2 minutes to virtually instantaneous (since the query is sorted
by title).  But when executing the query on the production machine, the
query doesn't use the indexed title attribute.  Here's what explain says:

DEV MACHINE (mysql 5.1.12-beta-log):

mysql> show index from phantom_products;
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table            | Non_unique | Key_name     | Seq_in_index | Column_name
| Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment
|
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| phantom_products |          0 | PRIMARY      |            1 | id
 | A         |      759168 |     NULL | NULL   |      | BTREE      |
|
| phantom_products |          1 | label_id_idx |            1 | label_id
 | A         |       18075 |     NULL | NULL   | YES  | BTREE      |
|
| phantom_products |          1 | title_idx    |            1 | title
| A         |      759168 |     NULL | NULL   | YES  | BTREE      |
|
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.07 sec)

mysql> explain SELECT phantom_products.id FROM phantom_products LEFT OUTER
JOIN phantom_labels ON phantom_labels.id = phantom_products.label_id where
phantom_labels.full_name like '%test%' order by title LIMIT 25, 25;
+----+-------------+------------------+--------+---------------+-----------+---------+--------------------------------------------+--------+-------------+
| id | select_type | table            | type   | possible_keys | key       |
key_len | ref                                        | rows   | Extra
|
+----+-------------+------------------+--------+---------------+-----------+---------+--------------------------------------------+--------+-------------+
|  1 | SIMPLE      | phantom_products | index  | label_id_idx  | title_idx |
258     | NULL                                       | 759168 |
|
|  1 | SIMPLE      | phantom_labels   | eq_ref | PRIMARY       | PRIMARY   |
4       | krad_development.phantom_products.label_id |      1 | Using where
|
+----+-------------+------------------+--------+---------------+-----------+---------+--------------------------------------------+--------+-------------+
2 rows in set (0.00 sec)



PRODUCTION MACHINE (mysql 5.1.22-rc-community)

mysql> show index from phantom_products;
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table            | Non_unique | Key_name     | Seq_in_index | Column_name
| Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment
|
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| phantom_products |          0 | PRIMARY      |            1 | id
 | A         |      781891 |     NULL | NULL   |      | BTREE      |
|
| phantom_products |          1 | label_id_idx |            1 | label_id
 | A         |       48868 |     NULL | NULL   | YES  | BTREE      |
|
| phantom_products |          1 | title_idx    |            1 | title
| A         |      781891 |     NULL | NULL   | YES  | BTREE      |
|
+------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.03 sec)

mysql> explain SELECT phantom_products.id FROM phantom_products LEFT OUTER
JOIN phantom_labels ON phantom_labels.id = phantom_products.label_id where
phantom_labels.full_name like '%test%' order by title LIMIT 25, 25;
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------------------+--------+----------------+
| id | select_type | table            | type   | possible_keys | key     |
key_len | ref                                       | rows   | Extra
 |
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------------------+--------+----------------+
|  1 | SIMPLE      | phantom_products | ALL    | label_id_idx  | NULL    |
NULL    | NULL                                      | 715038 | Using
filesort |
|  1 | SIMPLE      | phantom_labels   | eq_ref | PRIMARY       | PRIMARY | 4
      | krad_production.phantom_products.label_id |      1 | Using where
 |
+----+-------------+------------------+--------+---------------+---------+---------+-------------------------------------------+--------+----------------+
2 rows in set (0.00 sec)



Now, why is the production machine performing a filesort and including ALL
the entries?  It doesn't seem to be using the indexed title attribute at
all. Why would two different machines with the same indexes and schema
perform two different queries?  Could this be happening because I'm running
two slightly different versions of mysql?

And another concern:

I just added an index to the title attribute on the production server and
tried the same query again and got:

mysql> explain SELECT phantom_products.id FROM phantom_products LEFT OUTER
JOIN phantom_labels ON phantom_labels.id = phantom_products.label_id where
phantom_labels.full_name like '%test%' order by title LIMIT 25, 25;
+----+-------------+------------------+-------+---------------+---------------+---------+-----------------------------------+-------+-----------------------------------------------------------+
| id | select_type | table            | type  | possible_keys | key
  | key_len | ref                               | rows  | Extra
                                        |
+----+-------------+------------------+-------+---------------+---------------+---------+-----------------------------------+-------+-----------------------------------------------------------+
|  1 | SIMPLE      | phantom_labels   | index | PRIMARY       |
full_name_idx | 258     | NULL                              | 18713 | Using
where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | phantom_products | ref   | label_id_idx  | label_id_idx
 | 5       | krad_production.phantom_labels.id |    16 | Using where
                                      |
+----+-------------+------------------+-------+---------------+---------------+---------+-----------------------------------+-------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

now this is ideal.  It's first filtering on phantom_labels to retrieve only
those rows with the given full_name attribute, and then joining with the
phantom_products table, which executes extremely quickly.  However, when I
reboot mysql, the query goes back to the un-optimized one I encountered
previously on the production site, even though the index on the title
attribute exists.  Obviously this is very frustrating, since it makes it
extremely difficult to optimize my queries (especially since adding an index
on the production site takes 30 minutes for some reason - only takes about 2
minutes on the dev machine, the dev machine isn't _that_ much faster than
the production machine)

And my last question:


if I don't add the index on the title attribute, the following query takes 2
minutes:

query #1
SELECT phantom_products.id FROM phantom_products LEFT OUTER JOIN
phantom_labels ON phantom_labels.id = phantom_products.label_id where
phantom_labels.full_name like '%test%' order by title LIMIT 25, 25;

if I add the index, it reduces to a few seconds.  However, if I use a
subquery as follows:

query #2
SELECT phantom_products.id FROM phantom_products where
phantom_products.label_id in (select id from phantom_labels where
phantom_labels.full_name like '%test%') order by title LIMIT 25, 25;

now, shouldn't mysql be smart enough to do this type of thing in query #1?
 the reason query #2 is faster is because I'm narrowing down the number of
phantom_label records returned by filtering on the full_name first, and then
joining with the phantom_products table.  While in query #2, mysql seems to
be joining on all the rows, then filtering.. Am I right in assuming this?
 And if so, is there no way around this other than to write your own query
to avoid this situation?

Hopefully someone can give me some insight on this, since I'm getting a bit
lost.. Thanks!

Reply via email to