Hi all I have this query: SELECT n.ID, n.CatalogNumber, [...more...], d.ID, d.CatalogNumber, [...more...] FROM newdarwincoredata n INNER JOIN darwincoredata d ON n.CatalogNumber = d.CatalogNumber ORDER BY n.CatalogNumber; Both tables have exactly the same structure and indices: mysql> SHOW CREATE TABLE darwincoredata; CREATE TABLE `darwincoredata` ( `ID` int(10) NOT NULL auto_increment, `CatalogNumber` varchar(20) NOT NULL, [...more...], PRIMARY KEY (`ID`), UNIQUE KEY `CatalogNumber` (`CatalogNumber`), UNIQUE KEY `GlobalUniqueIdentifier` (`GlobalUniqueIdentifier`), KEY `DateLastModified` (`DateLastModified`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
There are 336983 rows in darwincoredata, and 337304 in newdarwincoredata. The plan for the query varies between using the CatalogNumber index and using a temporary table (which fails, once it fills up the disk). I get either this: mysql> EXPLAIN <that query> +----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+ | 1 | SIMPLE | d | ALL | CatalogNumber | NULL | NULL | NULL | 336238 | Using temporary; Using filesort | | 1 | SIMPLE | n | ref | CatalogNumber | CatalogNumber | 22 | GBIF_wrapper.d.CatalogNumber | 1 | | +----+-------------+-------+------+---------------+---------------+---------+------------------------------+--------+---------------------------------+ Or this: +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ | 1 | SIMPLE | d | index | CatalogNumber | CatalogNumber | 22 | NULL | 326508 | | | 1 | SIMPLE | n | ref | CatalogNumber | CatalogNumber | 22 | GBIF_wrapper.d.CatalogNumber | 1 | | +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ Executing SHOW INDEX IN newdarwincoredata; SHOW INDEX IN darwincoredata; And then doing the EXPLAIN again sometimes changes the plan. I assume this is because the cardinalities are recalculated. I expect the CatalogNumber unique keys to always be used for this query. Even if I do: SELECT [...] FROM newdarwincoredata n FORCE INDEX (CatalogNumber) INNER JOIN darwincoredata d FORCE INDEX (CatalogNumber) ON n.CatalogNumber = d.CatalogNumber ORDER BY n.CatalogNumber The query still sometimes uses a temporary table. EXPLAIN SELECT STRAIGHT_JOIN [...] FROM newdarwincoredata n INNER JOIN darwincoredata d ON n.CatalogNumber = d.CatalogNumber ORDER BY n.CatalogNumber +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ | 1 | SIMPLE | n | index | CatalogNumber | CatalogNumber | 22 | NULL | 336588 | | | 1 | SIMPLE | d | ref | CatalogNumber | CatalogNumber | 22 | GBIF_wrapper.n.CatalogNumber | 1 | | +----+-------------+-------+-------+---------------+---------------+---------+------------------------------+--------+-------+ This seems to always use the index, but I don't want to rely on this without knowing why -- might it be because n has more rows than d? MySQL version is 5.0.22-log, x86_64, redhat-linux-gnu. Thanks for any insights Matt Blissett