David Griffiths wrote:
Really? I had no idea. I am pretty sure that only Day will have a range. Our index is currently, (yearmonth, day, stem_base), so I need to drop and recreate it.

I think part of the slowness is the size of the table as well. We have data going back years in there. I am thinking about breaking the table up by yearmonth (ie all data for 200407 goes into a table by that name).

Most people obviously query for recent data, so most of the time just a single table would be in use. This would mean the indexes would be more efficient, not having to sort through 38 million rows that with out a doubt do not have data that the query requires. If someone requests data that spans a few months, a UNION would do the trick.

Thanks for the reply. BTW, where did you come across how MySQL uses indexes; this is pretty detailed info, and it would be great if it was documented somewhere.

David

How mysql uses indexes is documented in the manual. See

  <http://dev.mysql.com/doc/mysql/en/MySQL_indexes.html>
  <http://dev.mysql.com/doc/mysql/en/Multiple-column_indexes.html>
  <http://dev.mysql.com/doc/mysql/en/Query_Speed.html>

I first saw the book/chapter/page analogy, or something like it, in a long ago thread on this list.

Size may not matter that much so long as your queries can take full advantage of indexing. Unless you do a lot of deleting, rows with similar yearmonths will be more or less in the same place on disk. If most of your queries are for recent dates, I'd expect mysql to cache that part of the index and the OS to cache the disk where those rows live. With proper indexing and sufficient memory, then, you may find you get acceptable speed without breaking up the table.

There may be some other optimizations you could do, however. The manual recommends <http://dev.mysql.com/doc/mysql/en/Data_size.html> making your data as small as possible. According to your first message, you have:

day INT
yearmonth INT
stem_base VARCHAR(100)

If I understand correctly, day is in the range 1 to 31. If so, you could save 3 bytes per row by changing day from INT (4 bytes) to TINYINT (1 byte). Similarly, you could save 1 byte per row by changing yearmonth from INT to MEDIUMINT. 4 bytes per row * 38 million rows = about 150 Mb saved. Smaller rows make disk reads faster, and require less memory to process and cache. Also, smaller columns make for smaller indexes.

Finally, you should drop any unnecessary indexes. Whichever multi-column index you pick, you should drop the single column index on the column which comes first in the multi-column index. Unless you sometimes query by day without regard to other columns, you won't need an index on day. I suspect that an index on (stem_base, yearmonth, day) and one on (yearmonth, day) are the only ones you need. The first will work for your typical query, as well as for selects by just stem_base and selects by stem_base and yearmonth. The second will work for selects by yearmonth and selects by yearmonth and day, without regard to stem_base. If you never select just by date, you don't even need the second.

Michael


-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]



Reply via email to