Alec, Thanks for your response; however, I'm not convinced you are correct :) Let me try to explain in more detail my reasoning below.
Why should the two level lookup be Log A + Log B? Looking up a tablename is always an equilvalency check, not a range check. So there is no advantage to having a tree style structure for the lookup. The structure I would expect to be used is a hash table which would then make the two level lookup 1 + Log B, rather than Log(A*B) initially. As to index's, we ALWAYS include the monitor id in the query and query one monitor id at a time. We generally also include a time range. The reason I have structured the index monitor id first then timestamp on the old tables is that getting down to just one monitor id eliminates more rows then getting down to one timestamp. Second, you cannot have two separate index's and have both used in a single query when using InnoDB. MySQL will choose the best index and use it. Using this index as the primary key instead of a separate index improves disk space usage with InnoDB as the primary key doesn't require a separate index structure but will order the rows correctly in the actual storage. As to events 'close to one another' we have a set interval of 5 minutes for the timestamps, so that's not a concern, we do want the monitor_id, timestamp pairs to be unique. We insert many millions of these rows per day in our production systems, we also require historical storage for up to one year. Keeping that volume of rows in one table causes both inserts and selects to come to a crawl. Unless I'm missing something here not partitioning the data is not a remotely feasible option. Comments? John A. McCaskey Software Development Engineer Klir Technologies, Inc. [EMAIL PROTECTED] 206.902.2027 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, January 13, 2006 9:41 AM To: John McCaskey Cc: MySQL Subject: Re: Huge number of tables with InnoDB To reply to this, I think we have to understand why you have chosen to split the tables at all. It seems to me that this, by introducing a two-level lookup, is certain to be slower than any possible single table lookup. Generally, Log A + log B is bound to be larger than log (A*B). It appears that you are querying prediminantly by time. In this case, your index *must* start with the timestamp, not the monitor ID. I would suggest that you need an index on timestamp, and possible one on monitor ID - but not if, as you say, you never query by monitor ID at all. Do you need a PRIMARY KEY at all? In what way will your system break if there happen to be two entries with the same timestamp and monitor ID? Presumably this will reflect two events very close together: Wouldn't you rather store that fact rather than lose it? Generally, I would query your decision to have multiple tables by date to whatever. In my experience, whenever I have introduced such concepts into my early designs, they have disappeareed later into a better design. It looks to mee as if you are using a tool optimied to do fast searches on large databases, then crippling its ability to optimise. I would expect the use of thousands of tables effectively to disable MySQL's caching capability, which is one of the biggest performance boosters. Alec "John McCaskey" <[EMAIL PROTECTED]> 13/01/2006 17:20 To "MySQL" <mysql@lists.mysql.com> cc Subject Huge number of tables with InnoDB Hi everyone, I'm running MySQL 4.0.18 on Debian with a 2.6 linux kernel using ext3 as the underlying filesystem for the database storage. I currently have some InnoDB tables with the following structure: Log_20060101 { Monitor_id medium int, Timestamp timestamp, Avg float, PRIMARY KEY Monitor_id, Timestamp } We partition these tables by date as you can see as they grow very large and they get to be slow to insert and query to over time. We have the idea to change the partitioning so the tables are as follows: Log_[monitor_id] { Timestamp, Avg float, PRIMARY KEY Timestamp } This seems to have several key advantages: 1) Reduced disk space usage 2) Easier querying of data across time (but not across individual id's, it turns out that doesn't ever really happen in our data usage anyway though) 3) Smaller tables, resulting in faster reads/writes, also smaller data volume hopefully also resulting in faster reads/writes due to less disk io neccesary However, in our actual testing the 'faster writes' expectation is getting shot down. With 20,000 unique monitor id's and 8928 unique timestamps inserting to the old set of tables (20051201-20051231 in this case) is taking me about 4 hours 20 minutes. Inserting to the 20,000 new tables (Log_0-Log_20000) is taking about 10 hours. I expected this to be much faster as I hoped finding the right table for an insert would be a Hash type lookup taking linear time, while inserting into the large tree structure in the old tables which have a much higher volume of rows would be logarithmic time. Where did I go wrong? My only real thought so far is the disk subsystem of the OS being slow with large numbers of files, but I thought it wouldn't matter for InnoDB as the data storage is all one file. Does anyone know what would cause the inserts to be so much slower? John A. McCaskey Software Development Engineer Klir Technologies, Inc. [EMAIL PROTECTED] 206.902.2027 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]