I think part of the slowness is the size of the table as well. We have data going back years in there. I am thinking about breaking the table up by yearmonth (ie all data for 200407 goes into a table by that name).
Most people obviously query for recent data, so most of the time just a single table would be in use. This would mean the indexes would be more efficient, not having to sort through 38 million rows that with out a doubt do not have data that the query requires. If someone requests data that spans a few months, a UNION would do the trick.
Thanks for the reply. BTW, where did you come across how MySQL uses indexes; this is pretty detailed info, and it would be great if it was documented somewhere.
David
Michael Stassen wrote:
Mysql uses multiple-column indexes from left to right. Multiple-column indexes are most effective when the column with a range criteria comes as far to the right as possible. Think of the index as sorting your data into a book where the first column is the chapter, the second column is the page, and the third column gives the lines on the page. In your sample query, you have an exact stem_base in mind, an exact yearmonth in mind, but a range of days. With an index on (stem_base, yearmonth, day), you would turn to the stem_base='xxxx' chapter, then the yearmonth=200407 page, then read the lines for day 07 to 27. Similarly, this would also work with an index on (yearmonth, stem_base, day). With an index on (day, yearmonth, stem_base), however, you have to look at each of the day chapters from 7 to 27, find the stem_base page in each of those chapters, then find the yearmonth line on each of those pages. That will work, but it's relatively complicated.
So, I would expect either an index on (stem_base, yearmonth, day) or an index on (yearmonth, stem_base, day) to be better than your current indexes starting with day. Which one should you choose? I expect both should work equally well for the sample query you gave, but since a 3-column index can be used as an index on the 1st column (as well as an index on the first 2 columns), the choice could matter for other queries. That is, an index on (stem_base, yearmonth, day) could be used to select rows for
SELECT * FROM traffic_boats WHERE stem_base = 'xxxx';
and an index on (yearmonth, stem_base, day) could be used to select rows for
SELECT * FROM traffic_boats WHERE yearmonth = 200407;
Michael
David Griffiths wrote:
We have a table with 40 million rows. It has statistics on traffic from our website. Logs are processed once a night, and the data from those logs are added.
Our table (traffic_boats, InnoDB) has three columns of interest:
day INT yearmonth INT stem_base VARCHAR(100)
There is an index on day, an index on yearmonth, an index on stem_base, an index on (day, yearmonth), an index on (day, yearmonth and stem_base). I added the last two today to try to fix the performance issues we are having.
A typical query would like like,
SELECT * FROM traffic_boats WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;
An explain-plan shows that the optimizer is picking the index on stem_base. It *should* be picking the composite index on (day, yearmonth and stembase). The greater-than-less-than is throwing it off. I can add a "USE INDEX" to force it to use the index I want it to, but that's a little hokey (and it gives me flashbacks to the days that I managed one of those commercial-RDBMS where tuning was a nightmare).
I've tried "analyze table" and "optimize table" (it's InnoDB) without luck.
What's really weird is that optimizer comes up with a bad count of rows to be examined.
If I let the optimizer pick the index,
mysql> explain SELECT * FROM traffic_boats WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+
| traffic_boats | ref | idx_yearmonth,idx_day,idx_stem,ymd_stem_idx,ymd_idx | idx_stem | 100 | const | 42600 | Using where |
+---------------+------+-----------------------------------------------------+----------+---------+-------+-------+-------------+
1 row in set (0.02 sec)
It thinks it needs to examine 42600 rows.
If I force the correct index,
mysql> explain SELECT * FROM traffic_boats USE INDEX (ymd_stem_idx) WHERE stem_base = 'xxxx' AND yearmonth = 200407 AND day >= 07 AND day <= 27;
+---------------+-------+---------------+--------------+---------+------+---------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+-------+---------------+--------------+---------+------+---------+-------------+
| traffic_boats | range | ymd_stem_idx | ymd_stem_idx | 108 | NULL | 4019400 | Using where |
+---------------+-------+---------------+--------------+---------+------+---------+-------------+
1 row in set (0.00 sec)
It thinks it needs to examine 4,019,400 rows.
If I ran this query without the USE INDEX it would take a few minutes. If I force the index, it takes 20 seconds. You would think that using stem_base, day and yearmonth would be much more selective than using just stem_base.
Anyone got some insight into this?
David
-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]