I did some tests with 20GB tables and several million rows a few months back, and what improved performance the most was separating the (small) columns used for searching from the (large) columns containing data. My test table was from a messaging system, and I redesigned it so that the larger table held only a message-id and some TEXT fields, while the smaller table held all the usernames, timestamps, and other columns used for selecting and sorting.
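A minimal sketch of that kind of split (table and column names are my own invention here, not the actual schema; modern ENGINE= syntax shown):

```sql
-- Small "search" table: only the columns used in WHERE / ORDER BY.
CREATE TABLE message_info (
    message_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    sender      VARCHAR(64)  NOT NULL,
    recipient   VARCHAR(64)  NOT NULL,
    sent_at     INT UNSIGNED NOT NULL,   -- unix timestamp
    INDEX idx_recipient_time (recipient, sent_at)
) ENGINE=InnoDB;

-- Large "data" table: the bulky TEXT columns, fetched by primary key only.
CREATE TABLE message_data (
    message_id  INT UNSIGNED NOT NULL PRIMARY KEY,
    subject     TEXT,
    body        TEXT
) ENGINE=InnoDB;
```

The query pattern is then: search and sort against message_info, and fetch the matching bodies from message_data by primary key.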
With the older table design, I tried to choose the primary key as optimally as possible so that the most frequently run query could use it directly. However, since I used InnoDB, where the primary key *is* the table, index searches were rather slow simply because of the amount of disk you have to traverse to reach the index. Adding secondary indexes and forcing the queries to use them did not help.

When I changed the structure, the info table shrank to a few hundred MB, and searches on that smaller table and its index were considerably faster. Getting data from the data table was also very fast, since all access to it was reduced to primary-key lookups instead of index scans. All of this combined made my queries go ten(!) times faster.

I don't know whether you can do the same, but if you have large data columns you can split off, it won't hurt that much to try. :-) I also don't know how MyISAM compares to InnoDB in this specific case; maybe the gain is smaller for MyISAM because of the difference in how the primary key is created and used.

/Henrik

-----Original Message-----
From: Chad Attermann [mailto:[EMAIL PROTECTED]
Sent: 30 March 2004 19:42
To: [EMAIL PROTECTED]
Subject: Managing Very Large Tables

Hello,

I am trying to determine the best way to manage very large (MyISAM) tables, ensuring that they can be queried in reasonable amounts of time. One table in particular has over 18 million records (8GB of data) and is growing by more than 150K records per day, a rate that is itself increasing. Besides the obvious things like better hardware and load-balancing across multiple replicating databases, I am trying to determine how to keep these data sets optimized for the fastest queries. In my particular situation, the most recent data is queried most often, and data over 30-45 days old is queried rarely but must still remain accessible. Each record has an integer time column that is indexed for querying over periods of time.
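For illustration, a recent-data query constrained on such an indexed integer time column might look like this (table and column names are placeholders, not the actual schema):

```sql
-- Fetch the last 30 days of records via the indexed time column.
-- `events` and `created_at` are hypothetical names for illustration.
SELECT *
FROM events
WHERE created_at >= UNIX_TIMESTAMP() - 30*24*60*60
ORDER BY created_at DESC
LIMIT 100;
```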
Currently I run a script regularly that moves records older than 45 days from the tables in the main database into identical tables in another (archive) database running in the same server process. This seems to speed up the tables in the main database, but I realize that deleting records leaves holes in the tables, which slows inserts and makes it impossible to read from and write to these tables concurrently.

My question is: is it better to keep all of the data in the original tables to avoid holes, or is 'archiving' records to another database a wise approach? How much does the size of a table really affect performance when querying the more recent data? If archiving is reasonable, is there a way to optimize the tables to get rid of the holes without significantly impacting ongoing activity on these tables?

Thanks for your time!

Chad Attermann
[EMAIL PROTECTED]

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
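For reference, the archive-and-compact cycle described above can be sketched as follows (database, table, and column names are placeholders; note that OPTIMIZE TABLE locks a MyISAM table while it rebuilds it, so it is best run during a quiet period):

```sql
-- Copy records older than 45 days into the archive database,
-- then delete them from the main table. Names are hypothetical.
INSERT INTO archive_db.events
    SELECT * FROM main_db.events
    WHERE created_at < UNIX_TIMESTAMP() - 45*24*60*60;

DELETE FROM main_db.events
WHERE created_at < UNIX_TIMESTAMP() - 45*24*60*60;

-- Defragment the MyISAM table to reclaim the holes left by the DELETE
-- (takes a table lock for the duration of the rebuild).
OPTIMIZE TABLE main_db.events;
```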