Re: two-column indexes and joins with ranges

Michael Stassen Fri, 01 Oct 2004 07:37:56 -0700

Well, you haven't posted the output of EXPLAIN, but I'll take a guess. I expect mysql sees your query as having a JOIN condition of accounts.account_id = transactions.account_id and two WHERE conditions:

  WHERE accounts.name = 'dave'
  AND when BETWEEN '2004-02-05' AND '2004-02-10'

The optimizer first considers the 2 WHERE conditions, looking for the one it believes will result in fewer rows. Presumably there is an index on accounts.name, but there is no usable index on `when`, as it doesn't come first in the multi-column index. Even if there were a usable index on `when`, I expect there would be fewer rows in accounts with the correct name than rows in transactions within the date range. So, the optimizer chooses accounts as the first table. For each row found in accounts with the right name, it matches that up with rows in transactions according to the JOIN condition, using the first part of the index. Finally, it applies the remaining WHERE condition on the results to filter the matching rows.

This may be a case where Shawn's usual advice is the way to go. That is, change your join to an explicit join rather than an implicit join, and move all the relevant conditions to the ON clause. In other words, see if

  SELECT *
  FROM accounts
  JOIN transactions ON accounts.account_id = transactions.account_id
                    AND when BETWEEN '2004-02-05' AND '2004-02-10'
  WHERE accounts.name = 'dave';

does any better.

Michael


William M. Shubert wrote:

I have a performance/index usage problem, and I am hoping somebody can
tell me why this problem exists, and whether there is a better solution
than the workaround that I'm using now.

The problem: I have a table with a two column index, such as

CREATE TABLE transactions (
  account_id INT NOT NULL,
  when DATETIME NOT NULL,
  INDEX(account_id, when)
);

I frequently need to do selections like:

SELECT * FROM accounts, transactions
  WHERE accounts.name = 'dave'
    AND accounts.account_id = transactions.account_id
    AND when BETWEEN '2004-02-05' AND '2004-02-10';

If "dave" has a huge number of transactions (and some of the accounts
do), this runs very slowly. EXPLAIN tells me that only the account_id
part of the "transactions" index is being used, with the "ref" index
usage type - apparently, every single one of dave's transactions is
being examined, and the "when BETWEEN ..." part of the join is only
being looked at after fetching the rows, even though the index itself
had all the data needed!

If I do this query in two parts, like this:

SELECT account_id FROM accounts WHERE name = 'dave';
SELECT * FROM transactions
  WHERE account_id = <dave's account ID>
    AND when BETWEEN '2004-02-05' AND '2004-02-10';

Suddenly the performance is great, and the full index is used, with a
"range" type of index usage. The MySQL documentation says that the
"range" type can only be used on constants - but I'm wondering, why does
it count this as non-constant? The dates I'm using are constant, so it
seems that it should be very easy for mysql to do it fast all in one
query. Only the first column of the index is variable.

I thought maybe the "range" index type didn't work when there could be
multiple hits on the first half of the index, but "SELECT * FROM
transactions WHERE account_id IN (3, 10, 50) AND when BETWEEN ..." also
uses the "range" index type and runs very fast!

So, can anybody mysql can't do "the right thing" with the
single-statement query? And is there a better option for me than to do
it in two queries? I'm using MySql 4.0.x, I was curious about whether
using subqueries in 4.1.x would help, but the point is moot right now
since my production systems won't be using 4.1.x until it leaves gamma
testing.
        Bill Shubert ([EMAIL PROTECTED])


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: two-column indexes and joins with ranges

Reply via email to