Managing Very Large Tables

2004-03-30 Thread Chad Attermann
Hello,

I am trying to determine the best way to manage very large (MyISAM) tables, ensuring 
that they can be queried in reasonable amounts of time.  One table in particular has 
over 18 million records (8GB data) and is growing by more than 150K records per day, 
and that rate is increasing.  Besides the obvious things like better hardware and 
load-balancing across multiple replicating databases, I am trying to determine how to 
keep these data sets optimized for fastest queries.  In my particular situation, the 
most recent data is queried most often, and data over 30-45 days old is not queried 
much at all but still must remain accessible.  Each record has an integer time column 
that is indexed for querying over periods of time.  Currently I run a script regularly 
that moves records older than 45 days from tables in the main database into identical 
tables in another (archive) database running in the same server process.  This seems 
to speed up the tables in the main database, but I realize that deleting records 
leaves holes in the tables, and that this slows inserts as well as makes it impossible 
to read and write concurrently from these tables.  My question is, is it better to 
keep all of the data in the original tables to avoid holes, or is 'archiving' records 
to another database a wise approach?  How much does the size of a table really affect 
performance when querying the more recent data?  If archiving is reasonable, is there 
a way to optimize the tables to get rid of the holes without significantly impacting 
ongoing activity on these tables?
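
As a rough sketch of what that archiving pass does (the table and column names 
here are placeholders, and the integer time column is assumed to hold a UNIX 
timestamp):

  SET @cutoff = UNIX_TIMESTAMP() - 45 * 86400;

  -- Copy rows older than 45 days into the archive database ...
  INSERT INTO archive_db.events
  SELECT * FROM main_db.events WHERE event_time < @cutoff;

  -- ... then remove them from the main database, leaving holes behind.
  DELETE FROM main_db.events WHERE event_time < @cutoff;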

Thanks for your time!

Chad Attermann
[EMAIL PROTECTED]


Re: Managing Very Large Tables

2004-03-30 Thread Victor Medina
hi!
Chad Attermann wrote:
Hello,

I am trying to determine the best way to manage very large (MyISAM) tables, ensuring that they can be queried in reasonable amounts of time.  
--8<--
Why insist on using MyISAM instead of a table type that gives you some 
degree of crash recovery and transactional support, like InnoDB or BDB? 
150K inserts a day is quite a significant number; I don't think MyISAM is 
an optimal solution for such a database structure.

Just my two cents my friends! =)

Best Regards!
--
Victor Medina M.
Linux - Java - MySQL
Dpto. Sistemas - Ferreteria EPA
[EMAIL PROTECTED]
Tel: +58-241-8507325 - ext. 325
Cel: +58-412-8859934
geek by nature - linux by choice


RE: Managing Very Large Tables

2004-03-30 Thread Henrik Schröder
I did some tests with 20GB tables and several millions of rows a few months
back, and what helped improve the performance the most was to separate the
(small) columns used for searching from the (large) columns containing data.
My test table was from a messaging system, and I redesigned it so that the
larger table held only a message ID and some TEXT fields, and the smaller
table held all usernames, timestamps, and other columns used for selecting
and sorting.

Using the older table design, I tried to set the primary key as optimal as
possible so that the most frequently run query could use it directly.
However, since I used InnoDB, where the primary key *is* the table, doing
index searches was rather slow simply because of the amount of disk you have
to traverse to get to the index. Adding secondary indexes and forcing the
queries to use them did not help.

When I changed the structure, the info table shrank to a few hundred MB, and
searches in that smaller table and its index were considerably faster. Getting
data from the data table was also very fast, since all access to it was
reduced to primary key lookups instead of index scans. All of this combined
made my queries go ten(!) times faster.

I don't know whether you can do the same, i.e. whether you have large
data columns you can split off, but if you do, it won't hurt much to try. :-)
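
As an illustration only (the table and column names here are invented, not my
actual schema), the split looked roughly like this:

  CREATE TABLE message_info (
      message_id  INT UNSIGNED NOT NULL PRIMARY KEY,
      username    VARCHAR(64)  NOT NULL,
      created_at  DATETIME     NOT NULL,
      INDEX idx_user_time (username, created_at)
  ) TYPE=InnoDB;

  CREATE TABLE message_data (
      message_id  INT UNSIGNED NOT NULL PRIMARY KEY,  -- same id as in message_info
      subject     TEXT,
      body        TEXT
  ) TYPE=InnoDB;

  -- Search and sort on the narrow table, then fetch the wide rows by primary key:
  SELECT d.*
  FROM message_info i
  JOIN message_data d ON d.message_id = i.message_id
  WHERE i.username = 'someone'
  ORDER BY i.created_at DESC
  LIMIT 20;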

I also don't know how MyISAM compares to InnoDB in this specific case; maybe
the gain is smaller for MyISAM because of the difference in how the primary
key is created and used.


/Henrik 

-Original Message-
From: Chad Attermann [mailto:[EMAIL PROTECTED] 
Sent: den 30 mars 2004 19:42
To: [EMAIL PROTECTED]
Subject: Managing Very Large Tables


I am trying to determine the best way to manage very large (MyISAM) tables,
ensuring that they can be queried in reasonable amounts of time.  One table
in particular has over 18 million records (8GB data) and is growing by more
than 150K records per day, and that rate is increasing.  [...]




RE: Managing Very Large Tables

2004-03-30 Thread Dathan Vance Pattishall
Tips on managing very large tables for MyISAM:

1) Ensure that the row format is not Dynamic but Fixed.
   = Issue the SHOW TABLE STATUS command (see the sketch after this list).
   = Look at Row_format.
   = If Row_format != Dynamic then you're OK; otherwise get rid of VARCHAR-type
     columns.
   = Reason:
     Your MyISAM table can only grow to 4GB and will then run out of space,
     even if your file system allows files to grow past 4GB.

2) For selects avoid ranges, i.e. SELECT * FROM BLAH WHERE column > NOW() -
   INTERVAL 30 DAY
   = or increase the range_alloc setting in my.cnf.

3) For pruning as described below, run OPTIMIZE TABLE in a maintenance window, or
   myisamchk -r -S -a yourtable.MYI, to get rid of deleted blocks. This will
   help keep your query speed consistent and disk utilization lower.

4) Ensure that mysql_safe is off so you can get the benefits of simultaneous
   reads and writes.

5) Add --low-priority-updates to allow writes to happen in batches after
   reads have finished.
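
A minimal sketch of the check in (1) and the cleanup in (3), assuming a
placeholder table named 'events':

  -- Look at Row_format, and at Data_free (bytes occupied by deleted blocks):
  SHOW TABLE STATUS LIKE 'events';

  -- In a maintenance window, defragment the data file and sort/rebuild the
  -- indexes; the table is locked while this runs:
  OPTIMIZE TABLE events;

The offline equivalent, run while the server is down (or the table is flushed
and locked), would be roughly: myisamchk -r -S -a /path/to/datadir/maindb/events.MYI
(the path is an assumption).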



 -Original Message-
 From: Chad Attermann [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, March 30, 2004 9:42 AM
 To: [EMAIL PROTECTED]
 Subject: Managing Very Large Tables
 
 I am trying to determine the best way to manage very large (MyISAM)
 tables, ensuring that they can be queried in reasonable amounts of time.
 One table in particular has over 18 million records (8GB data) and is
 growing by more than 150K records per day, and that rate is increasing.
 [...]






RE: Managing Very Large Tables

2004-03-30 Thread Keith C. Ivey
On 30 Mar 2004 at 10:30, Dathan Vance Pattishall wrote:

 1) Ensure that the row format is not Dynamic but Fixed.
    = Issue the SHOW TABLE STATUS command.
    = Look at Row_format.
    = If Row_format != Dynamic then you're OK; otherwise get rid of VARCHAR-type
      columns.
    = Reason:
      Your MyISAM table can only grow to 4GB and will then run out of space,
      even if your file system allows files to grow past 4GB.

There are reasons for using fixed-length rather than dynamic records, 
but that's not one of them.  If you set MAX_ROWS high enough when 
creating your tables (or alter it later after they get big, though 
that could take a while), they'll be able to grow past 4 GB even if 
they're dynamic.
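
For example (the table name is a placeholder), something along these lines sets
a larger data pointer so a dynamic-row table can grow past 4 GB:

  CREATE TABLE events (
      id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      event_time  INT UNSIGNED NOT NULL,
      payload     VARCHAR(255)
  ) TYPE=MyISAM MAX_ROWS=1000000000 AVG_ROW_LENGTH=200;

  -- Or on an existing table (this rebuilds the table, so it can take a while):
  ALTER TABLE events MAX_ROWS=1000000000 AVG_ROW_LENGTH=200;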

-- 
Keith C. Ivey [EMAIL PROTECTED]
Tobacco Documents Online
http://tobaccodocuments.org





Re: Managing Very Large Tables

2004-03-30 Thread Jeremy Zawodny
On Tue, Mar 30, 2004 at 10:30:03AM -0800, Dathan Vance Pattishall wrote:
 Tips on managing very large tables for MyISAM:
 
 1) Ensure that the row format is not Dynamic but Fixed.
    = Issue the SHOW TABLE STATUS command.
    = Look at Row_format.
    = If Row_format != Dynamic then you're OK; otherwise get rid of VARCHAR-type
      columns.
    = Reason:
      Your MyISAM table can only grow to 4GB and will then run out of space,
      even if your file system allows files to grow past 4GB.

That's a common misconception:

  http://jeremy.zawodny.com/blog/archives/000796.html

Jeremy
-- 
Jeremy D. Zawodny |  Perl, Web, MySQL, Linux Magazine, Yahoo!
[EMAIL PROTECTED]  |  http://jeremy.zawodny.com/

MySQL 4.0.15-Yahoo-SMP: up 198 days, processed 3,392,217,207 queries (197/sec. avg)
