Alec writes:

> ... Generally, log A + log B is bound to be larger than log(A*B) ...

Errm, log A + log B is exactly equal to log(A*B) :-)

PB

----- [EMAIL PROTECTED] wrote:

To reply to this, I think we have to understand why you have chosen to split the tables at all. It seems to me that this, by introducing a two-level lookup, is certain to be slower than any possible single-table lookup. Generally, log A + log B is bound to be larger than log(A*B).

It appears that you are querying predominantly by time. In that case, your index *must* start with the timestamp, not the monitor ID. I would suggest that you need an index on the timestamp, and possibly one on the monitor ID - but not if, as you say, you never query by monitor ID at all.

Do you need a PRIMARY KEY at all? In what way will your system break if there happen to be two entries with the same timestamp and monitor ID? Presumably this would reflect two events very close together: wouldn't you rather store that fact than lose it?

Generally, I would question your decision to have multiple tables split by date or by anything else. In my experience, whenever I have introduced such concepts into my early designs, they have disappeared later into a better design. It looks to me as if you are using a tool optimised to do fast searches on large databases, and then crippling its ability to optimise. I would expect the use of thousands of tables effectively to disable MySQL's caching capability, which is one of its biggest performance boosters.

Alec

"John McCaskey" <[EMAIL PROTECTED]>
13/01/2006 17:20

To: "MySQL" <mysql@lists.mysql.com>
cc:
Subject: Huge number of tables with InnoDB

Hi everyone,

I'm running MySQL 4.0.18 on Debian with a 2.6 Linux kernel, using ext3 as the underlying filesystem for the database storage. I currently have some InnoDB tables with the following structure:

    CREATE TABLE Log_20060101 (
        Monitor_id MEDIUMINT,
        Timestamp  TIMESTAMP,
        Avg        FLOAT,
        PRIMARY KEY (Monitor_id, Timestamp)
    ) TYPE = InnoDB;

As you can see, we partition these tables by date, because they grow very large and become slow to insert into and query against over time. Our idea is to change the partitioning so that the tables look like this:

    CREATE TABLE Log_[monitor_id] (
        Timestamp TIMESTAMP,
        Avg       FLOAT,
        PRIMARY KEY (Timestamp)
    ) TYPE = InnoDB;

This seems to have several key advantages:

1) Reduced disk space usage.
2) Easier querying of data across time (but not across individual IDs - it turns out that never really happens in our data usage anyway).
3) Smaller tables, resulting in faster reads/writes; the smaller data volume should also mean faster reads/writes because less disk I/O is necessary.

However, in our actual testing the 'faster writes' expectation is getting shot down. With 20,000 unique monitor IDs and 8,928 unique timestamps, inserting into the old set of tables (20051201-20051231 in this case) takes about 4 hours 20 minutes. Inserting into the 20,000 new tables (Log_0 to Log_20000) takes about 10 hours. I expected this to be much faster: I hoped that finding the right table for an insert would be a hash-type lookup taking constant time, while inserting into the much larger tree structures of the old tables would take logarithmic time. Where did I go wrong?

My only real thought so far is that the OS's disk subsystem is slow with large numbers of files, but I thought that wouldn't matter for InnoDB, since the data storage is all in one file. Does anyone know what would cause the inserts to be so much slower?

John A. McCaskey
Software Development Engineer
Klir Technologies, Inc.
[EMAIL PROTECTED]
206.902.2027
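
For concreteness, here is a sketch of the single-table design Alec is arguing for, adapted from the Log_20060101 definition above. Take it as an illustration rather than a tested schema: the index names are made up, the second index is only worth keeping if per-monitor queries ever appear, and TYPE = InnoDB is the 4.0-era spelling of the storage-engine clause.

    CREATE TABLE Log (
        Monitor_id MEDIUMINT UNSIGNED NOT NULL,
        Timestamp  TIMESTAMP NOT NULL,
        Avg        FLOAT,
        INDEX idx_time (Timestamp),                 -- queries are predominantly by
                                                    -- time, so lead with the timestamp
        INDEX idx_monitor (Monitor_id, Timestamp)   -- optional; drop it if you truly
                                                    -- never query by monitor ID
    ) TYPE = InnoDB;

    -- A typical time-range query can then use idx_time directly:
    SELECT Monitor_id, Avg
      FROM Log
     WHERE Timestamp >= '2005-12-01 00:00:00'
       AND Timestamp <  '2006-01-01 00:00:00';

There is deliberately no PRIMARY KEY, following Alec's point: two rows with the same (Monitor_id, Timestamp) would record two events that happened very close together, rather than collide.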
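
PB's correction is also easy to check with the numbers from John's test above: 20,000 tables of 8,928 rows each, versus one table of 20,000 x 8,928 rows. Assuming balanced trees and ignoring constant factors, the lookup depths come out identical; LOG2() should already be available on John's 4.0.18 server:

    SELECT LOG2(20000) + LOG2(8928) AS two_level,    -- pick the table, then the row
           LOG2(20000 * 8928)       AS single_table; -- one big index

    -- Both columns evaluate to roughly 27.4, so splitting the data
    -- buys nothing on tree depth alone - which is why PB's smiley is earned.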