Re: Mysql growing pains, 4 days to create index on one table!
Split the myisam table into seperate tables. We will call each table a bucket. Create a MERGE table of all of them. For selecting the data. When inserting, use a hash function on your primary key values to determine which bucket to insert into. If you almost always select by primary key, then you can optimize your select queries to only look at the correct bucket as well, and to only use the merge table when you aren't selecting by primary key. This will speed your inserts because instead of 258M rows to search through for each insert there are only 8M if you use 32 buckets. The biggest benefit is that you could also insert in parallel using multiple mysql connections, because you could calculate the bucket as an additional column in your temporary table, then do the inserts for all the buckets the same time. I hit duplicate hashes for unique records, not sure why, I think I used aes_encrypt, how do you recomend creating a hash column via sql? I already split the data into separate tables, and use a merge, it really didnt speed things up that much -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
I load all the data into a table with no keys then I insert this data into a table with 225 million records, this large table has the primary key, this is what takes a LONG time Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Lachlan Mulcahy wrote: MySQL Version: 4.0.18 Server OS: windows 2000, or 2003 Memory 2 gig CPU(s) dual 2.6-3ghz xeon 500-2mb cache (cpu load is low) Disks (RAIDs and Independent disk speed/types) 8x72 gig 15,000 rpm scsi II u320 raid 5 dell perc setup -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Lopez David E-r9374c wrote: Since you have a temp table created (no keys I assume), use the command mysqldump -v -e -n -t dbname tablename filename.sql This creates a file that inserts the records back into the same table it also does not do an insert ignore I need the records to go into the historical table, with an insert ignore -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
mysql -i filename.sql matt ryan wrote: Lopez David E-r9374c wrote: Since you have a temp table created (no keys I assume), use the command mysqldump -v -e -n -t dbname tablename filename.sql This creates a file that inserts the records back into the same table it also does not do an insert ignore I need the records to go into the historical table, with an insert ignore -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
I don't see how using a multi value insert would be any faster than the insert between the tables. It would certainly be faster than one insert statement per row, but I don't think it would be faster than insert ... select ... The only reason I suggested an extended syntax insert earlier was because I wasn't aware that a temporary table was being loaded first. Do you ever delete from this table? Can you post the results from show variables for us? Have you removed the unecessary duplicate key on the first column of your primary key? Thanks, Justin --- gerald_clark [EMAIL PROTECTED] wrote: mysql -i filename.sql matt ryan wrote: Lopez David E-r9374c wrote: Since you have a temp table created (no keys I assume), use the command mysqldump -v -e -n -t dbname tablename filename.sql This creates a file that inserts the records back into the same table it also does not do an insert ignore I need the records to go into the historical table, with an insert ignore -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Do you ever delete from this table? Temp table is trunicated before the EBCDIC file is loaded Have you removed the unecessary duplicate key on the first column of your primary key? Have not touched the DIC index yet, I need a backup server to change indexes, it would take the main server down for too long, and it wont be that big an increase, it's only a 3 character index, I also do joines on that field to other tables, so I was hesitant on removing that index. Can you post the results from show variables for nope, the list wont let me send an email that big I did put it on the web though, here's the results from show variables http://www.geekopolis.com/Query_Result.txt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
--- matt ryan [EMAIL PROTECTED] wrote: Do you ever delete from this table? Temp table is trunicated before the EBCDIC file is loaded I meant the history table :) Have you removed the unecessary duplicate key on the first column of your primary key? Have not touched the DIC index yet, I need a backup server to change indexes, it would take the main server down for too long, and it wont be that big an increase, it's only a 3 character index, I also do joines on that field to other tables, so I was hesitant on removing that index. Even if it is only a 3 character index, you still need to do disk reads/writes to update the values in the index. With a huge table like yours one index can make a big difference. It will still work fine with joins. Since it is the leading column of another index, it will function just like a normal index. Having indexes on a and (a,b) is redundant for searching/joining only a. If you need to join/search on b, then a seperate index is required for b if you have indexed (a,b). The last option I can think of requires a lot more work on your part as far as inserting data, but it may be your only option at this point. This is why I asked if you delete data from the history table, as it makes deletes/updates more problematic as well. Split the myisam table into seperate tables. We will call each table a bucket. Create a MERGE table of all of them. For selecting the data. When inserting, use a hash function on your primary key values to determine which bucket to insert into. If you almost always select by primary key, then you can optimize your select queries to only look at the correct bucket as well, and to only use the merge table when you aren't selecting by primary key. This will speed your inserts because instead of 258M rows to search through for each insert there are only 8M if you use 32 buckets. The biggest benefit is that you could also insert in parallel using multiple mysql connections, because you could calculate the bucket as an additional column in your temporary table, then do the inserts for all the buckets the same time. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Donny Simonton wrote: Matt, I've been reading this thread for a while and at this point, I would say that you would need to provide the table structures and queries that you are running. For example, we have one table that has 8 billion rows in it and it close to 100 gigs and we can hammer it all day long without any problems. It really depends on how you are doing things. But as far as you mentioning about mysql not using multiple indexes, it does. You just have to create an index on multiple fields at one time. I've got tables with 10 fields in one index, now the trick with mysql is that you must use all top 10 fields in your where clause for mysql to really take advantage of the index. But I would definitely send the list your table structure with your indexes and some of your selects and inserts. You can always change the names of things if you don't want people to know the names of everything. Just my 2 cents. Original email has the table structure, query speed is not an issue (it is, but I will deal with that later) the issue is insert speed, I get 150k-1M records daily, of these, only 5-10 % are new records, I load the EBCDIC file into a temp table, and then do insert ignore into historytable select * from temp table and I cant create multiple field indexes, I would need 100 indexes on a table, the index side already exceeds the table size, I tried to add an index on date, but gave up because it ran for 2 days and was not done yet. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Here's another option to load without requiring a primary key (requires a LOT of extra disk space and fast CPU, and a batch window to run). Load the new daily data into the table without checking for dupes. Then create a new version of the table with distinct values. Something like this (assuming your table has 2 columns which are char (3) and char (5) for simplicity's sake) in pseudocode: (remember to pad all columns to the maximum width to make them uniform, right spaces on char, left zeroes on numerics) CREATE NEW_TABLE AS SELECT SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),1,3) AS COL1, SUBSTR (DISTINCT (RPAD(COL1,3) || RPAD(COL2,5)),4,5) AS COL2 FROM OLD_TABLE matt ryan [EMAIL PROTECTED] 07/16/2004 07:43 AM To: cc: [EMAIL PROTECTED] Subject:Re: Mysql growing pains, 4 days to create index on one table! Donny Simonton wrote: Matt, I've been reading this thread for a while and at this point, I would say that you would need to provide the table structures and queries that you are running. For example, we have one table that has 8 billion rows in it and it close to 100 gigs and we can hammer it all day long without any problems. It really depends on how you are doing things. But as far as you mentioning about mysql not using multiple indexes, it does. You just have to create an index on multiple fields at one time. I've got tables with 10 fields in one index, now the trick with mysql is that you must use all top 10 fields in your where clause for mysql to really take advantage of the index. But I would definitely send the list your table structure with your indexes and some of your selects and inserts. You can always change the names of things if you don't want people to know the names of everything. Just my 2 cents. Original email has the table structure, query speed is not an issue (it is, but I will deal with that later) the issue is insert speed, I get 150k-1M records daily, of these, only 5-10 % are new records, I load the EBCDIC file into a temp table, and then do insert ignore into historytable select * from temp table and I cant create multiple field indexes, I would need 100 indexes on a table, the index side already exceeds the table size, I tried to add an index on date, but gave up because it ran for 2 days and was not done yet. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Since you have a temp table created (no keys I assume), use the command mysqldump -v -e -n -t dbname tablename filename.sql This should create insert statements with many values in a single insert. Then use the client program to insert them to you db. mysql -u matt -p dbname filename.sql This is very fast way to insert rows. Speeds up insertion by at least 10x on my large tables (11 million rows). I noticed someone with 1000x more rows w/o problems. BTW, by forcing the table to have packed keys, the docs say it will slow your insertion. Maybe not that much, i don't know. David Isnt the text file it creates, going to insert the records back into the temp table when I load it back in? Does this do insert ignore or insert replace? I need to control that, on some tables I do insert ignore, on others i do insert replace. Almost all of the speed issue is read related, the disk writes are nearly 0, the reads are as fast as the drive can run, reading to see if the record violates the primary key I assume about 3 gig seems to be the magic number, less than that is lightning fast, more than that is extreemly slow -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Mysql growing pains, 4 days to create index on one table!
matt the issue is insert speed, I get 150k-1M records daily, of these, only 5-10 % are new records, I load the EBCDIC file into a temp table, and then do insert ignore into historytable select * from temp table Since you have a temp table created (no keys I assume), use the command mysqldump -v -e -n -t dbname tablename filename.sql This should create insert statements with many values in a single insert. Then use the client program to insert them to you db. mysql -u matt -p dbname filename.sql This is very fast way to insert rows. Speeds up insertion by at least 10x on my large tables (11 million rows). I noticed someone with 1000x more rows w/o problems. BTW, by forcing the table to have packed keys, the docs say it will slow your insertion. Maybe not that much, i don't know. David -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Justin Swanhart wrote: Indexes can generate vast amounts of random i/o. Because of rotational latency, random i/o can really slow you down, especially if you are using IDE or SATA disks because they can't do tagged queueing like SCSI disks can. If you have the budget for it, I would consider getting some solid state disks. Because they have extremely low latency you will be able to get full i/o bandwidth on your reads. If you can't afford those, consider adding more disks to your RAID array so that you can spread the reads over more spindles, which will help performance. Using 8x72 gig 15,000 rpm U320 scsi drives in raid 5 now, that should be a fast read raid config no more will fit in the server, and solid state are 70,000 $ it's out of our budget I optimize the tables every weekened any other sugestions? Would it help to defrag? The only way I can do it, is backup every file, wipe out the server, and then restore the files, there's not enough free space to do a proper defrag Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
I know it's naff but I've found it quicker to use myisamchk with row-sort than it is to get the MySQL daemon to regenerate keys (and if you know you're not changing the data file you can tell myisamchk not to copy the data), unless I've missed something in the MySQL config ... The only way I know to do this for PRIMARY KEYs is a process described by Kyle J. Munn (but watch out for free space :-): http://lists.mysql.com/mysql/158737 This boils down to moving the MYD file, truncating the table to create an empty MYD, modify your key(s), move the MYD back, and repair using myisamchk to re-create the keys. Otherwise you can use ALTER TABLE table DISABLE KEYS/ENABLE KEYS to disable keys while you make your inserts, then enable keys to update non-Primary keys in one go. I've never tried PACK_KEYS so can't suggest anything on that, other than it sounds like it adds more data to your index which is probably a bad thing with such short values. You may find that the 'dic' KEY isn't necessary, as it's the first part of your PRIMARY KEY. Depending on your data you may be better off storing integer IDs and the char values in a look-up table. I've found better performance for multi-column keys by putting the columns in order of least variance first, e.g. for a list of dates: 1979-04-23 1979-07-15 1980-02-04 1980-06-04 You want a key on (YEAR-MONTH-DAY) If you can you could put the index/data on different disks - not sure how you would do that in Windows (INDEX DIRECTORY option for CREATE TABLE?). You should definitely put the binary log file on another disk, but again not something I've used. I've found MySQL to be a royal pain working with multi-GB tables (my biggest is 12GB+13GB index). I've learnt that MySQL is a bit like a confused puppy - it doesn't know why it wet the floor, but it expects you to know that pained expression means you need to move it somewhere else ... All the best, Tim. - Original Message - From: matt ryan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, June 30, 2004 7:50 PM Subject: Mysql growing pains, 4 days to create index on one table! Rebuilding index takes 3 1/2 days!!! Growing pains with mysql.. I've got 2 dedicated servers, each with a slave, all run 32gig 15k rpm raid 5 on u320 perc raid cards, dell 2600/4600's with single channel backplanes (new ones will have dual channel) All have 2 gig of ram, but I've never seen mysql use more than 600mb of ram. The servers handle huge loads, each day there are 30 1-2 gig files loaded into large tables, total mysql data size is 96 gig, the large tables are 2-6 gig. Inserts are done on dupe key ignore, this takes hours on the large files, it barely keeps up with input files. At the bottom of this post I've got the mysql.ini config lines, any suggestions are welcome, I'm already beyond the mysql huge sample they used to include in the program. Sample table that I load is as follows. each day I get 40 % new records on the text file, the input file is normally 20mb, once a week I get one that's 1-2 gig, these take all day to load. I need more multiple column indexes, as some querys return millions of rows that must be scanned, but the index size already exceeds the table size, and the combinations I need would result in an myi that's 5x larger than the data itself. Here's an example of the speed problem, the index was corrupt so I dropped all and recreated, rather than a myisam repair. I think 3 days is a little excessive for a table that's only 3.428 gig, index is 2.729 gig. I cant remove the primary key, as it keeps duplicates out of the system, the input files are from old database's, we use mysql to store the data for the web frontend, mostly done in ASP, most queries take less than a second, unforuntatly we have big queries that take way more than the IIS timeout setting all the time, but no way around it, I cant add more indexes without making it even slower :( I cant tell if it's mysql that's the problem, or the hardware, Here's a screenshot of the disk IO, if I copy a file while mysql is doing the build index, the io shoots way up, which tells me, mysql is NOT maxing out the drives, and it's also not maxing out the memory. Unless it's doing lots and lots of seeks on the drive, which is harder to test using perfmon, are there any mysql test setups that would help identify where the bottleneck is? screenshot of disk io usage http://www.geekopolis.com/pics/diskio.jpg I'm all out of ideas, other than switching to another db, and the table indexes split across drives, maybe a 2 channel setup, 4 drives per channel, each 4 is a separate raid 5 setup, one holds data one holds indexes, cant do this with mysql though mysql alter table hood_stat add primary key (dic,niin,fr_ric,don,suf,dte_txn,sta) , add index `don` (`don`), add index `niin` (`niin`), add index `stor` (`stor`), add index `dic` (`dic`), add index `ctasc` (`ctasc`); Query OK, 45449534 rows
Re: Mysql growing pains, 4 days to create index on one table!
Tim Brody wrote: You may find that the 'dic' KEY isn't necessary, as it's the first part of your PRIMARY KEY. I've found better performance for multi-column keys by putting the columns in order of least variance first, e.g. for a list of dates: 1979-04-23 1979-07-15 1980-02-04 1980-06-04 You want a key on (YEAR-MONTH-DAY) If you can you could put the index/data on different disks - not sure how you would do that in Windows (INDEX DIRECTORY option for CREATE TABLE?). You should definitely put the binary log file on another disk, but again not something I've used. I've found MySQL to be a royal pain working with multi-GB tables (my biggest is 12GB+13GB index). I've learnt that MySQL is a bit like a confused puppy - it doesn't know why it wet the floor, but it expects you to know that pained expression means you need to move it somewhere else ... I need the DIC in the key to keep the record unique, I have thousands with everything identical except the DIC. I was confused on the multi key index issue, I thought it would seek faster if I put the most unique field up front, which I do on most tables, I did not on this one though. I have one large raid array now, so I cant split the data, or put the binary log on another disk. I found mysql was great up to about 3 gig, then everything hit the brakes and got really really really slow I'm scared of joines, every time I do a join in mysql on indexed fields in mysql, the performance is horrible, because the where clause is not a field that's in the join, performance is poopy I wish mysql could use multiple indexes like oracle, to narrow down the results, I've got some simple queries that take hours due to single index use, but every query field is indexed. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Consider replicating to some slave servers and dividing reads among them. I already replicate to slaves, and sites will do read only queries off these slaves 99.9 % of the tables are read only anyway, the only tables we update or insert into, are very very small and fast. These big tables are daily extracts from IBM DB2 sites, in ebcdic format, we archive the data and users then query our site which is faster, unless they start doing multiple query options, then things get slow. If you query only one feild its FAST, but if you query two feilds, its slow, very slow, need multiple key per query support in mysql. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
- Original Message - From: matt ryan [EMAIL PROTECTED] I need the DIC in the key to keep the record unique, I have thousands with everything identical except the DIC. In your schema you had DIC in the PRIMARY KEY and an additional (unnecessary?) KEY on DIC. I was confused on the multi key index issue, I thought it would seek faster if I put the most unique field up front, which I do on most tables, I did not on this one though. I have one large raid array now, so I cant split the data, or put the binary log on another disk. Having performed a couple of tests on a multi-key index I can't find any significant difference between different orders on the columns (logical I guess), although there may be certain uses that might work better in either situation ... Perhaps a better point to make would be to say change the order of your PRIMARY KEY to reflect the queries that you're performing. e.g. as I said above MySQL will efficiently use the first part of any key. Certainly, if you're performing a WHERE or ORDER BY on any of the columns in the primary key put those at the front. I found mysql was great up to about 3 gig, then everything hit the brakes and got really really really slow I'm scared of joines, every time I do a join in mysql on indexed fields in mysql, the performance is horrible, because the where clause is not a field that's in the join, performance is poopy Hmm, how about (sorry if I'm pointing out the obvious) e.g.: _left (col1 int, col2 int, col3 int, unique(col3,col1,col2)); _right (col1 int, col2 int, col3 int, unique(col1,col2,col3)); _left inner join _right using(col1,col2) where _left.col3=42 and _right.col3=35; The analyzer (explain) shows a const at the end of the ref columns for _right and const for _left. All the best, Tim. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
You might be out of luck with MySQL ... sorry. You may need to switch to a database that has a parallel query facility. Then - every query becomes a massive table scan but gets divided into multiple concurrent subqueries - and overall the job finishes in a reasonable amount of time. The epitomy of brute force. It's hard to rationalize initially but after a while you see it's the only way to go. Remember - indexes are no longer required. We have a billion row 100GB table the users search any and every way. Response time is less than a minute. We are anxiously waiting to see this technology added to MySQL. Maybe one day we'll have some money to contribute to the effort. Parallel query is not trivial. That's why these databases are expensive. I can send you more details privately if you are interested. I've used it, with oracle, but oracles index searches are better, hit the best one first, then 2nd best, then 3rd, but I really dont want to go to oracle, it's too complicated for my tech's vs mysql, hit the best one first, and use no other Query time is a non issue at this point, it's load time, load daily file into temp table, then insert ignore into main table, on key violation the violating record is ignored load time is the issue, the server loads files 15 hours a day, that big primary key makes loading any table over 2-3 gig VERY slow I thought it was a bug, everything was great untill you get up to about 3 or 4 gig, then it gets SLOW -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
That's the whole point. Eliminate your indexes and your load problems are solved. Especially given the fact that you insert ignore and don't use the primary key to validate uniqueness. matt ryan [EMAIL PROTECTED] 07/15/2004 11:38 AM To: cc: [EMAIL PROTECTED] Subject:Re: Mysql growing pains, 4 days to create index on one table! You might be out of luck with MySQL ... sorry. You may need to switch to a database that has a parallel query facility. Then - every query becomes a massive table scan but gets divided into multiple concurrent subqueries - and overall the job finishes in a reasonable amount of time. The epitomy of brute force. It's hard to rationalize initially but after a while you see it's the only way to go. Remember - indexes are no longer required. We have a billion row 100GB table the users search any and every way. Response time is less than a minute. We are anxiously waiting to see this technology added to MySQL. Maybe one day we'll have some money to contribute to the effort. Parallel query is not trivial. That's why these databases are expensive. I can send you more details privately if you are interested. I've used it, with oracle, but oracles index searches are better, hit the best one first, then 2nd best, then 3rd, but I really dont want to go to oracle, it's too complicated for my tech's vs mysql, hit the best one first, and use no other Query time is a non issue at this point, it's load time, load daily file into temp table, then insert ignore into main table, on key violation the violating record is ignored load time is the issue, the server loads files 15 hours a day, that big primary key makes loading any table over 2-3 gig VERY slow I thought it was a bug, everything was great untill you get up to about 3 or 4 gig, then it gets SLOW -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Reordering the primary key wouldn't necessarily speed up the key check. The reason for selecting a particular sequence within a primary key is to put the columns that are available the most often - upfront - so that the index will have at least something to bite on. Can you parittion further? You want to shoot for reading no more than a 1% slice for a given query. If possible. Can you partition by ranges of a column? Values 01 to 100 in partition 1, 11 to 200 in partition 2, etc? Anything that's specified consistently in every query? Then again - if your table is huge... and your queries are all over the place... and the user can specify any search criteria... and there's no predictable pattern you can partition on... and you frequently add and/or remove lots of data from your table... and your queries pull lots of rows... You might be out of luck with MySQL ... sorry. You may need to switch to a database that has a parallel query facility. Then - every query becomes a massive table scan but gets divided into multiple concurrent subqueries - and overall the job finishes in a reasonable amount of time. The epitomy of brute force. It's hard to rationalize initially but after a while you see it's the only way to go. Remember - indexes are no longer required. We have a billion row 100GB table the users search any and every way. Response time is less than a minute. We are anxiously waiting to see this technology added to MySQL. Maybe one day we'll have some money to contribute to the effort. Parallel query is not trivial. That's why these databases are expensive. I can send you more details privately if you are interested. matt ryan [EMAIL PROTECTED] 07/14/2004 12:27 PM To: [EMAIL PROTECTED] cc: Subject:Re: Mysql growing pains, 4 days to create index on one table! [EMAIL PROTECTED] wrote: You may want more indexes but you might be getting killed because you already have too many. To test - try loading into a table without indexes and see if it makes a difference. At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous. If write speed is a bottleneck you might consider Raid-1 instead of Raid-5. Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table. These tables are merged, the total table size is huge, on this particular table, it's , 45,449,534 rows, however, all the merge tables combined are 258,840,305 records perhaps I should reorder the pimary key, putting the longest most unique record up front, and the least unique at the end, would that speed up the key check? I can tell that almost everything is read IO, very little write IO -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
a few suggestions... Your slow inserts could be a concurrancy issue. If lots of users are doing selects that could be interfering with your inserts, especially if they use a n odbc/jdbc app that locks the table for the entire read. Jdbc reads do that when you tell it to stream the contents of the query, because they wouldn't all fit in memory. -- Does your import script do an INSERT for each line, or does it combine lines into multi-value inserts? doing an insert into ... values (...),(...),(...) will be much faster than doing one insert for each row. since your max packet size is pretty big, you should be able to consolidate a fairly large number of rows into one insert. -- What settings are you using on the Percs? What stripe size? What write cache are you using (back or thru)? Are you using read-ahead cacheing (that can hurt index performance)? -- Defragging the filessytem probably won't have any major impact on the speed of your application. --- Is there another way you can approach the duplicate problem? For instance, if duplicate data can only be generated in the last few days worth of imports, you might not have to rely on your primary key on the main table for importing. This only works if you can drop the primary key because it is only used for duplicate checking and isn't used to speed queries. You could instead create another table that you do all your importing to, taking care of dupes with a primary key on that table, then insert from that table into the main one. Keep a timestamp in that table and purge the older records periodically. the last thing i could think of would be a dupe checker table. Create an innodb table that consists of only the columns from your big table and make all the columns the primary key. Essentially you have just created an index only table. Insert your new data into a temporary heap table, then delete from the heap table where the key is in your dupe table. Then insert everything from the heap table into the big table. Once again, this only works if you don't need the primary key on the big table. This will use more CPU/memory but it may get around your read problems. Justin --- matt ryan [EMAIL PROTECTED] wrote: You might be out of luck with MySQL ... sorry. You may need to switch to a database that has a parallel query facility. Then - every query becomes a massive table scan but gets divided into multiple concurrent subqueries - and overall the job finishes in a reasonable amount of time. The epitomy of brute force. It's hard to rationalize initially but after a while you see it's the only way to go. Remember - indexes are no longer required. We have a billion row 100GB table the users search any and every way. Response time is less than a minute. We are anxiously waiting to see this technology added to MySQL. Maybe one day we'll have some money to contribute to the effort. Parallel query is not trivial. That's why these databases are expensive. I can send you more details privately if you are interested. I've used it, with oracle, but oracles index searches are better, hit the best one first, then 2nd best, then 3rd, but I really dont want to go to oracle, it's too complicated for my tech's vs mysql, hit the best one first, and use no other Query time is a non issue at this point, it's load time, load daily file into temp table, then insert ignore into main table, on key violation the violating record is ignored load time is the issue, the server loads files 15 hours a day, that big primary key makes loading any table over 2-3 gig VERY slow I thought it was a bug, everything was great untill you get up to about 3 or 4 gig, then it gets SLOW -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Insert ignore doesn't insert the record if there is a duplicate. It simply doesn't insert the row. Without the IGNORE clause, the query would generate an error insert of silenty ignoring the insert. --- [EMAIL PROTECTED] wrote: That's the whole point. Eliminate your indexes and your load problems are solved. Especially given the fact that you insert ignore and don't use the primary key to validate uniqueness. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Mysql growing pains, 4 days to create index on one table!
Matt, I've been reading this thread for a while and at this point, I would say that you would need to provide the table structures and queries that you are running. For example, we have one table that has 8 billion rows in it and it close to 100 gigs and we can hammer it all day long without any problems. It really depends on how you are doing things. But as far as you mentioning about mysql not using multiple indexes, it does. You just have to create an index on multiple fields at one time. I've got tables with 10 fields in one index, now the trick with mysql is that you must use all top 10 fields in your where clause for mysql to really take advantage of the index. But I would definitely send the list your table structure with your indexes and some of your selects and inserts. You can always change the names of things if you don't want people to know the names of everything. Just my 2 cents. Donny -Original Message- From: matt ryan [mailto:[EMAIL PROTECTED] Sent: Thursday, July 15, 2004 9:32 AM To: [EMAIL PROTECTED] Subject: Re: Mysql growing pains, 4 days to create index on one table! Tim Brody wrote: You may find that the 'dic' KEY isn't necessary, as it's the first part of your PRIMARY KEY. I've found better performance for multi-column keys by putting the columns in order of least variance first, e.g. for a list of dates: 1979-04-23 1979-07-15 1980-02-04 1980-06-04 You want a key on (YEAR-MONTH-DAY) If you can you could put the index/data on different disks - not sure how you would do that in Windows (INDEX DIRECTORY option for CREATE TABLE?). You should definitely put the binary log file on another disk, but again not something I've used. I've found MySQL to be a royal pain working with multi-GB tables (my biggest is 12GB+13GB index). I've learnt that MySQL is a bit like a confused puppy - it doesn't know why it wet the floor, but it expects you to know that pained expression means you need to move it somewhere else ... I need the DIC in the key to keep the record unique, I have thousands with everything identical except the DIC. I was confused on the multi key index issue, I thought it would seek faster if I put the most unique field up front, which I do on most tables, I did not on this one though. I have one large raid array now, so I cant split the data, or put the binary log on another disk. I found mysql was great up to about 3 gig, then everything hit the brakes and got really really really slow I'm scared of joines, every time I do a join in mysql on indexed fields in mysql, the performance is horrible, because the where clause is not a field that's in the join, performance is poopy I wish mysql could use multiple indexes like oracle, to narrow down the results, I've got some simple queries that take hours due to single index use, but every query field is indexed. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
I've used it, with oracle, but oracles index searches are better, hit the best one first, then 2nd best, then 3rd, but I really dont want to go to oracle, it's too complicated for my tech's Oracle rarely performs index merges, but it does have the ability to do, which mysql lacks. Query time is a non issue at this point, it's load time, load daily file into temp table, then insert ignore into main table, on key violation the violating record is ignored I know you don't want to go with Oracle, but I will just add that it could help you here too, because it can do parallel DML. This is especially useful if you have access to the partitioning option, because you could then partition your data by hash and get a number of bonuses. #1 your index updates will be much faster because there are a lot less rows to look through. #2 parallel DML can insert into multiple partitions at once. I am unsure if MaxDB supports any of those features, though it may. You may want to look into it to see if they are. I don't want to sound like I'm pushing Oracle. I'm not an Oracle sales rep, or anything like that. I am a professional Oracle DBA that happens to also use mySQL a lot and I like both databases. Sometimes one is better than the other for solving a problem. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Mysql growing pains, 4 days to create index on one table!
Matt, I think I might have missed the start of this thread, as I can't seem to find it. Could you please post the following info (I don't mind if you just mail it directly to me to save the list): MySQL Version: Server OS: Server Hardware configuration: - Memory - CPU(s) - Disks (RAIDs and Independent disk speed/types) Database Structure dump and some pointers on where you are having the issues. Lachlan -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Lopez David E-r9374c wrote: matt 1) inserts using this format is much faster: INSERT INTO table (col1, col2) VALUES (val1,val2), (val3,val4) is much faster then single row insert. My experience is 2.5 hrs vs.. 36 hrs. 2) The PACK_KEYS=1 may be hurting you. I've never used it. 3) There may be a cache somewhere that's to small. You'll have to do some digging in this area. 4) dup key ignore - what does that mean exactly? 5) what is your OS rev, mysql rev. Please post any suggestions that you find valuable so we can all learn.. david If I understand it, pack_keys should help if your disk IO limited vs cpu limited, cpu is ususally near idle. I increased the cache sizes, helped a little, but not much delay_key_write=ALL key_buffer_size=1000M read_buffer_size=512M record_buffer=512M What would the syntax for that type of insert be? I have a table with 30,000 records, I need to insert them into the main table with millions of records, I thought that insert into table select * from 2nd table would be the fastest way. insert ignore will not insert a record, if it violates the primary key, I do this to keep duplicate records out of the system windows 2003 and 2000 servers, mysql-nt 4.0.16 I tried the newer versions, but found bugs on all of them, I submitted them to the bug system. I believe my bottleneck is reading the data to ensure the primary key is not violated, I see lots of read IO, but little write IO Matt -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Egor Egorov wrote: Are you running this under Microsoft Windows? Yes, windows 2k and 2003, mysql-nt 4.0.16 -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
[EMAIL PROTECTED] wrote: You may want more indexes but you might be getting killed because you already have too many. To test - try loading into a table without indexes and see if it makes a difference. At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous. If write speed is a bottleneck you might consider Raid-1 instead of Raid-5. Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table. These tables are merged, the total table size is huge, on this particular table, it's , 45,449,534 rows, however, all the merge tables combined are 258,840,305 records perhaps I should reorder the pimary key, putting the longest most unique record up front, and the least unique at the end, would that speed up the key check? I can tell that almost everything is read IO, very little write IO -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Indexes can generate vast amounts of random i/o. Because of rotational latency, random i/o can really slow you down, especially if you are using IDE or SATA disks because they can't do tagged queueing like SCSI disks can. If you have the budget for it, I would consider getting some solid state disks. Because they have extremely low latency you will be able to get full i/o bandwidth on your reads. If you can't afford those, consider adding more disks to your RAID array so that you can spread the reads over more spindles, which will help performance. --- matt ryan [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote: You may want more indexes but you might be getting killed because you already have too many. To test - try loading into a table without indexes and see if it makes a difference. At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous. If write speed is a bottleneck you might consider Raid-1 instead of Raid-5. Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table. These tables are merged, the total table size is huge, on this particular table, it's , 45,449,534 rows, however, all the merge tables combined are 258,840,305 records perhaps I should reorder the pimary key, putting the longest most unique record up front, and the least unique at the end, would that speed up the key check? I can tell that almost everything is read IO, very little write IO -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED] -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
Re: Mysql growing pains, 4 days to create index on one table!
Are you running this under Microsoft Windows? -- For technical support contracts, goto https://order.mysql.com/?ref=ensita This email is sponsored by Ensita.net http://www.ensita.net/ __ ___ ___ __ / |/ /_ __/ __/ __ \/ /Egor Egorov / /|_/ / // /\ \/ /_/ / /__ [EMAIL PROTECTED] /_/ /_/\_, /___/\___\_\___/ MySQL AB / Ensita.net ___/ www.mysql.com -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]
RE: Mysql growing pains, 4 days to create index on one table!
matt 1) inserts using this format is much faster: INSERT INTO table (col1, col2) VALUES (val1,val2), (val3,val4) is much faster then single row insert. My experience is 2.5 hrs vs.. 36 hrs. 2) The PACK_KEYS=1 may be hurting you. I've never used it. 3) There may be a cache somewhere that's to small. You'll have to do some digging in this area. 4) dup key ignore - what does that mean exactly? 5) what is your OS rev, mysql rev. Please post any suggestions that you find valuable so we can all learn.. david -Original Message- From: matt ryan [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 30, 2004 11:51 AM To: [EMAIL PROTECTED] Subject: Mysql growing pains, 4 days to create index on one table! Rebuilding index takes 3 1/2 days!!! Growing pains with mysql.. I've got 2 dedicated servers, each with a slave, all run 32gig 15k rpm raid 5 on u320 perc raid cards, dell 2600/4600's with single channel backplanes (new ones will have dual channel) All have 2 gig of ram, but I've never seen mysql use more than 600mb of ram. The servers handle huge loads, each day there are 30 1-2 gig files loaded into large tables, total mysql data size is 96 gig, the large tables are 2-6 gig. Inserts are done on dupe key ignore, this takes hours on the large files, it barely keeps up with input files. At the bottom of this post I've got the mysql.ini config lines, any suggestions are welcome, I'm already beyond the mysql huge sample they used to include in the program. Sample table that I load is as follows. each day I get 40 % new records on the text file, the input file is normally 20mb, once a week I get one that's 1-2 gig, these take all day to load. I need more multiple column indexes, as some querys return millions of rows that must be scanned, but the index size already exceeds the table size, and the combinations I need would result in an myi that's 5x larger than the data itself. Here's an example of the speed problem, the index was corrupt so I dropped all and recreated, rather than a myisam repair. I think 3 days is a little excessive for a table that's only 3.428 gig, index is 2.729 gig. I cant remove the primary key, as it keeps duplicates out of the system, the input files are from old database's, we use mysql to store the data for the web frontend, mostly done in ASP, most queries take less than a second, unforuntatly we have big queries that take way more than the IIS timeout setting all the time, but no way around it, I cant add more indexes without making it even slower :( I cant tell if it's mysql that's the problem, or the hardware, Here's a screenshot of the disk IO, if I copy a file while mysql is doing the build index, the io shoots way up, which tells me, mysql is NOT maxing out the drives, and it's also not maxing out the memory. Unless it's doing lots and lots of seeks on the drive, which is harder to test using perfmon, are there any mysql test setups that would help identify where the bottleneck is? screenshot of disk io usage http://www.geekopolis.com/pics/diskio.jpg I'm all out of ideas, other than switching to another db, and the table indexes split across drives, maybe a 2 channel setup, 4 drives per channel, each 4 is a separate raid 5 setup, one holds data one holds indexes, cant do this with mysql though mysql alter table hood_stat add primary key (dic,niin,fr_ric,don,suf,dte_txn,sta) , add index `don` (`don`), add index `niin` (`niin`), add index `stor` (`stor`), add index `dic` (`dic`), add index `ctasc` (`ctasc`); Query OK, 45449534 rows affected (3 days 19 hours 6 min 34.94 seconds Records: 45449534 Duplicates: 0 Warnings: 0 CREATE TABLE `hood_stat` ( `dic` char(3) NOT NULL default '', `fr_ric` char(3) NOT NULL default '', `niin` char(11) NOT NULL default '', `ui` char(2) NOT NULL default '', `qty` char(5) NOT NULL default '', `don` char(14) NOT NULL default '', `suf` char(1) NOT NULL default '', `dte_txn` char(5) NOT NULL default '', `ship_to` char(3) NOT NULL default '', `sta` char(2) NOT NULL default '', `lst_sos` char(3) NOT NULL default '', `esd` char(4) NOT NULL default '', `stor` char(3) NOT NULL default '', `d_t` char(4) NOT NULL default '', `ctasc` char(10) NOT NULL default '', PRIMARY KEY (`dic`,`niin`,`fr_ric`,`don`,`suf`,`dte_txn`,`sta` ), KEY `don` (`don`), KEY `niin` (`niin`), KEY `stor` (`stor`), KEY `dic` (`dic`), KEY `ctasc` (`ctasc`) ) TYPE=MyISAM MAX_ROWS=10 PACK_KEYS=1 skip-locking set-variable=delay_key_write=ALL set-variable= key_buffer_size=1500M set-variable=join_buffer=512M set-variable= max_allowed_packet=256M set-variable= table_cache=512 set-variable= sort_buffer=256M set-variable=tmp_table_size=400M set-variable= record_buffer=512M set-variable= thread_cache=8 set-variable=myisam_sort_buffer_size=256M myisam-recover=BACKUP,FORCE set-variable=read_buffer_size=512M
Re: Mysql growing pains, 4 days to create index on one table!
You may want more indexes but you might be getting killed because you already have too many. To test - try loading into a table without indexes and see if it makes a difference. At the very least - check to see if the primary index which starts with 'dic' can make your special 'dic' index superfluous. If write speed is a bottleneck you might consider Raid-1 instead of Raid-5. Reading lots of rows via index is a killer. Depending on your hardware it may be cheaper to table scan 50 rows than to read 1 via index. However, this requires partitioning of the data based on some column which appears in every query and acts as an initial filter. If you are lucky enough to be in that situation - consider a MERGE table. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/[EMAIL PROTECTED]