Mysql uses multiple-column indexes from left to right. Multiple-column indexes are most effective when the column with a range criteria comes as far to the right as possible. Think of the index as sorting your data into a book where the first column is the chapter, the second column is the page, and the third column gives the lines on the page. In your sample query, you have an exact stem_base in mind, an exact yearmonth in mind, but a range of days. With an index on (stem_base, yearmonth, day), you would turn to the stem_base='xxxx' chapter, then the yearmonth=200407 page, then read the lines for day 07 to 27. Similarly, this would also work with an index on (yearmonth, stem_base, day). With an index on (day, yearmonth, stem_base), however, you have to look at each of the day chapters from 7 to 27, find the stem_base page in each of those chapters, then find the yearmonth line on each of those pages. That will work, but it's relatively complicated.

So, I would expect either an index on (stem_base, yearmonth, day) or an index on (yearmonth, stem_base, day) to be better than your current indexes starting with day. Which one should you choose? I expect both should work equally well for the sample query you gave, but since a 3-column index can be used as an index on the 1st column (as well as an index on the first 2 columns), the choice could matter for other queries. That is, an index on (stem_base, yearmonth, day) could be used to select rows for

  SELECT * FROM traffic_boats WHERE stem_base = 'xxxx';

and an index on (yearmonth, stem_base, day) could be used to select rows for

  SELECT * FROM traffic_boats WHERE yearmonth = 200407;

Michael


David Griffiths wrote:

We have a table with 40 million rows. It has statistics on traffic from our website. Logs are processed once a night, and the data from those logs are added.

Our table (traffic_boats, InnoDB) has three columns of interest:

day INT
yearmonth INT
stem_base VARCHAR(100)

There is an index on day, an index on yearmonth, an index on stem_base, an index on (day, yearmonth), an index on (day, yearmonth and stem_base). I added the last two today to try to fix the performance issues we are having.

A typical query would like like,

SELECT * FROM traffic_boats WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;

An explain-plan shows that the optimizer is picking the index on stem_base. It *should* be picking the composite index on (day, yearmonth and stembase). The greater-than-less-than is throwing it off. I can add a "USE INDEX" to force it to use the index I want it to, but that's a little hokey (and it gives me flashbacks to the days that I managed one of those commercial-RDBMS where tuning was a nightmare).

I've tried "analyze table" and "optimize table" (it's InnoDB) without luck.

What's really weird is that optimizer comes up with a bad count of rows to be examined.

If I let the optimizer pick the index,

mysql> explain SELECT * FROM traffic_boats WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+


| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+


| traffic_boats | ref | idx_yearmonth,idx_day,idx_stem,ymd_stem_idx,ymd_idx | idx_stem | 100 | const | 42600 | Using where |
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+


1 row in set (0.02 sec)

It thinks it needs to examine 42600 rows.

If I force the correct index,

mysql> explain SELECT * FROM traffic_boats USE INDEX (ymd_stem_idx) WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;
+---------------+-------+---------------+--------------+---------+------+---------+-------------+


| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+-------+---------------+--------------+---------+------+---------+-------------+


| traffic_boats | range | ymd_stem_idx | ymd_stem_idx | 108 | NULL | 4019400 | Using where |
+---------------+-------+---------------+--------------+---------+------+---------+-------------+


1 row in set (0.00 sec)

It thinks it needs to examine 4,019,400 rows.

If I ran this query without the USE INDEX it would take a few minutes. If I force the index, it takes 20 seconds. You would think that using stem_base, day and yearmonth would be much more selective than using just stem_base.

Anyone got some insight into this?

David




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to