WHERE accounts.name = 'dave' AND when BETWEEN '2004-02-05' AND '2004-02-10'
The optimizer first considers the 2 WHERE conditions, looking for the one it believes will result in fewer rows. Presumably there is an index on accounts.name, but there is no usable index on `when`, as it doesn't come first in the multi-column index. Even if there were a usable index on `when`, I expect there would be fewer rows in accounts with the correct name than rows in transactions within the date range. So, the optimizer chooses accounts as the first table. For each row found in accounts with the right name, it matches that up with rows in transactions according to the JOIN condition, using the first part of the index. Finally, it applies the remaining WHERE condition on the results to filter the matching rows.
This may be a case where Shawn's usual advice is the way to go.
That is, change your join to an explicit join rather than an implicit join, and move all the relevant conditions to the ON clause. In other words, see if
SELECT * FROM accounts JOIN transactions ON accounts.account_id = transactions.account_id AND when BETWEEN '2004-02-05' AND '2004-02-10' WHERE accounts.name = 'dave';
does any better.
Michael
William M. Shubert wrote:
I have a performance/index usage problem, and I am hoping somebody can tell me why this problem exists, and whether there is a better solution than the workaround that I'm using now.
The problem: I have a table with a two column index, such as
CREATE TABLE transactions ( account_id INT NOT NULL, when DATETIME NOT NULL, INDEX(account_id, when) );
I frequently need to do selections like:
SELECT * FROM accounts, transactions WHERE accounts.name = 'dave' AND accounts.account_id = transactions.account_id AND when BETWEEN '2004-02-05' AND '2004-02-10';
If "dave" has a huge number of transactions (and some of the accounts do), this runs very slowly. EXPLAIN tells me that only the account_id part of the "transactions" index is being used, with the "ref" index usage type - apparently, every single one of dave's transactions is being examined, and the "when BETWEEN ..." part of the join is only being looked at after fetching the rows, even though the index itself had all the data needed!
If I do this query in two parts, like this:
SELECT account_id FROM accounts WHERE name = 'dave'; SELECT * FROM transactions WHERE account_id = <dave's account ID> AND when BETWEEN '2004-02-05' AND '2004-02-10';
Suddenly the performance is great, and the full index is used, with a "range" type of index usage. The MySQL documentation says that the "range" type can only be used on constants - but I'm wondering, why does it count this as non-constant? The dates I'm using are constant, so it seems that it should be very easy for mysql to do it fast all in one query. Only the first column of the index is variable.
I thought maybe the "range" index type didn't work when there could be multiple hits on the first half of the index, but "SELECT * FROM transactions WHERE account_id IN (3, 10, 50) AND when BETWEEN ..." also uses the "range" index type and runs very fast!
So, can anybody mysql can't do "the right thing" with the single-statement query? And is there a better option for me than to do it in two queries? I'm using MySql 4.0.x, I was curious about whether using subqueries in 4.1.x would help, but the point is moot right now since my production systems won't be using 4.1.x until it leaves gamma testing. Bill Shubert ([EMAIL PROTECTED])
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]