Kyle, Sounds like you've thought of everything.
How much disk space does your largest table take up? Probably too much for what I'm about to suggest. There are commercial external RAM disks that support multi-gigabyte volumes. You could create a temporary database on the RAM disk, file-copy the table there, and run the import against it. When you're done, file-copy the table back to your database directory. Keep in mind these hardware RAM disks are very, very expensive.
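The copy-out/copy-back step could look something like the sketch below. The paths and table name are made up, and a temp directory stands in for the RAM disk mount (the real mysqlimport run needs a live server, so it's only indicated in comments):

```shell
# Sketch, assuming a MyISAM table (three files: .frm, .MYD, .MYI).
# Paths are hypothetical; a mktemp dir stands in for the RAM disk mount.
DATADIR=$(mktemp -d)    # stand-in for /var/lib/mysql/mydb
RAMDISK=$(mktemp -d)    # stand-in for the RAM disk volume

# Fake table files so the copy steps can be shown end to end.
touch "$DATADIR/big_table.frm" "$DATADIR/big_table.MYD" "$DATADIR/big_table.MYI"

# 1. Quiesce the table in the server (e.g. FLUSH TABLES WITH READ LOCK),
#    then copy the three MyISAM files out to the RAM disk.
cp "$DATADIR"/big_table.* "$RAMDISK"/

# 2. Point a temporary database at the RAM disk and run the import there,
#    e.g.: mysqlimport ramdisk_db /path/to/new_rows.txt  (needs a server)

# 3. Copy the rebuilt files back and FLUSH TABLES again.
cp "$RAMDISK"/big_table.* "$DATADIR"/
```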
I assume you're using CHAR and not VARCHAR for your text fields.
Have you tweaked the bulk_insert_buffer_size variable?
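For reference, bumping that variable is a one-liner; the 256 MB figure below is just an assumption to tune against available RAM:

```sql
-- Sketch: enlarge the MyISAM bulk-insert cache before a big import.
-- 268435456 bytes = 256 MB -- an assumed value, not a recommendation.
SET GLOBAL bulk_insert_buffer_size = 268435456;

-- Or per connection, right before the LOAD DATA / INSERT run:
SET SESSION bulk_insert_buffer_size = 268435456;

SHOW VARIABLES LIKE 'bulk_insert_buffer_size';
```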
This is similar to what you're doing: http://www.mysql.com/doc/en/Insert_speed.html
And of course if you have a few bucks, there is paid support from MySQL AB. They've probably encountered this problem quite a few times.
Mike
At 11:22 PM 1/30/2004, you wrote:
mos wrote:
At 01:20 PM 1/30/2004, you wrote:
I routinely see people requesting information on how to efficiently
handle *large* tables in mysql. Hopefully, the following will be helpful to those individuals.
I have several tables that are 500M to 4B rows in size and require me to drop and re-create indices at regular intervals. Usually, I'm removing about 10% of the existing records and then adding about 15% new records back in. Doing this with indices turned on would take days if not weeks. With the proper sequence of operations, however, I can accomplish this in about 24 hours.
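A minimal sketch of that sequence, with made-up table, index, and column names (the real schema isn't shown in the thread):

```sql
-- Drop the index, modify the data unindexed, then rebuild once.
ALTER TABLE big_table DROP INDEX idx_lookup;

-- Remove the ~10% stale rows (predicate is illustrative only):
DELETE FROM big_table WHERE load_date < '2003-07-01';

-- Bulk-load the ~15% new rows while the table is unindexed:
LOAD DATA INFILE '/tmp/new_rows.txt' INTO TABLE big_table;

-- One sequential index rebuild instead of millions of incremental updates:
ALTER TABLE big_table ADD INDEX idx_lookup (lookup_col);
```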
I'm assuming you're using MyISAM tables and not InnoDB.
Have you tried using "Alter Table ... Disable Keys" to disable the non-unique indexes before you do your deletes and inserts?
Yes, we are using MyISAM. InnoDB tables have been too slow in our tests. Disabling the keys does work, but the process I outlined turns out to be more efficient for very large data sets.
When possible, we load data using mysqlimport. This is much faster than connecting to the server and executing a series of INSERT statements.
You can get similar speeds using "Load Data"
mysqlimport translates directly to "LOAD DATA". Scripting with mysqlimport
is much simpler than trying to write a series of sql statements and pipe them into a client.
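To make the equivalence concrete: mysqlimport derives the table name from the file name and issues a LOAD DATA statement for you. The database, path, and delimiter below are illustrative:

```sql
-- The command line (file big_table.txt maps to table big_table):
--   mysqlimport --local --fields-terminated-by='\t' mydb /data/big_table.txt
-- is roughly equivalent to running this statement yourself:
LOAD DATA LOCAL INFILE '/data/big_table.txt'
INTO TABLE big_table
FIELDS TERMINATED BY '\t';
```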
I have to ask, why are you deleting 10% of your rows and adding another 15% back in? Are you removing old data, data that is, say, 6 months old, and then importing this month's data? If so, can't you just store one month's data in separate tables and use a Merge Table for your queries? At the end of each month, just drop the oldest table and create a new table for the current month's data.
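The month-per-table rotation could be sketched like this, in MySQL 4.x syntax with illustrative table and column names:

```sql
-- One identical MyISAM table per month:
CREATE TABLE log_2004_01 (ts DATETIME, msg CHAR(200)) TYPE=MyISAM;
CREATE TABLE log_2004_02 (ts DATETIME, msg CHAR(200)) TYPE=MyISAM;

-- Queries go against a merge table spanning the live months:
CREATE TABLE log_all (ts DATETIME, msg CHAR(200))
    TYPE=MERGE UNION=(log_2004_01, log_2004_02) INSERT_METHOD=LAST;

-- Month-end rotation: repoint the union, then drop the oldest table.
ALTER TABLE log_all UNION=(log_2004_02, log_2004_03);
DROP TABLE log_2004_01;
```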
Wrong kind of data storage architecture for my problem...
I've actually done some testing of merge tables at their upper limits. The N^2 table actually contains more than 2^32 rows, so we were forced to split it across 2 tables. (FYI: Until last spring a bug existed in merge tables where the row count was returned as an int instead of a long.)
Kyle
***************************************************************************
Kyle J. Munn          Email: [EMAIL PROTECTED]
EraGen Biosciences    Phone: 608.662.9000 x351
http://www.EraGen.com Fax:   608.662.9003
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]