Re: very large HEAP-tables in 4.1.3
Harrison, thanks for your mail.

>> I think mysql uses way too much memory (overhead) to store my data.
>
> How much overhead do you think it is using? Each row is 61 bytes in
> geldbrief, which is *exactly* the amount needed for the datatypes you
> have. [...]
>
> Now if you take 61 * 2449755 (number of rows) = 149435055 bytes used:
>
>   157468096   Real amount
>   149435055   Data size
>   ---------
>     8033041   Overhead
>
>        5.1%   Total overhead of data
>
> I don't see how you could get it to be much smaller than that. Even
> with zero overhead it would only be 5% smaller, which still would
> require *a lot* of memory to store it all. The primary key itself is
> only using about 8 bytes of memory per row (because it is a hashed
> index; a btree would be much larger), which is also very compact.
>
> With your own in-memory database, do you use some sort of compression
> algorithm? That is the only way that I could see it taking up much
> less space. MySQL is pretty close to as efficient as you can get
> without compression.

I did that calculation after my last post, too. I should have done it
earlier ;) There is compression in our old solution, but I never
thought it was very efficient. I was definitely wrong: we fit around
20% more data into our old database. That is why I assumed the
difference had to be mysql's overhead; I didn't take the compression
seriously enough.

> With all of that being said, I would just go with InnoDB, which can
> buffer the data in memory as well. In a later email you mention that
> you need to delete a lot of rows per hour. HEAP wouldn't work all
> that well for that since it uses table-level locks. If it took 5
> seconds to delete a large portion of rows, then the table would be
> locked for the duration of that. InnoDB, with its row-level locking,
> would be much better for that purge process. If you turn off
> innodb_flush_log_at_trx_commit (which would most likely be OK in
> your case) then most of the inserting would be done in memory anyway,
> and only written to disk in batches.
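For reference, the InnoDB settings described above could be sketched in my.cnf roughly like this (the buffer pool size is a hypothetical example for a dedicated 64-bit box, not a value from the thread):

```ini
[mysqld]
# Cache as much table and index data in memory as the machine allows
# (hypothetical value for a dedicated 6GB Opteron box).
innodb_buffer_pool_size = 4096M

# 0 = flush the InnoDB log to disk roughly once per second instead of
# at every commit; a crash can lose about the last second of inserts,
# but bulk insert throughput improves considerably.
innodb_flush_log_at_trx_commit = 0
```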
> The other option, as you said before, is using MyISAM MERGE tables.
> That will make dropping old records easier if you partition the data
> into separate days. The only thing I would be worried about in that
> scenario is if inserts are coming from multiple threads. In that case
> you could still run into locking issues with the inserts locking each
> other.

I'm away next week, but I guess I'll just give all the options a try
on our development system afterwards. I guess if something works with
2GB of RAM, it should also work with 6GB (Opteron). Or does mysql's
behaviour change with very big HEAP tables or key_buffers?

Jan

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
very large HEAP-tables in 4.1.3
I was just wondering if anybody has been using very large HEAP tables
and if there are ways to have mysql use the memory more efficiently.
(I have no experience with HEAP tables other than using them as
temporary tables...)

I just started testing with 2 HEAP tables on a development system
(P4 3.2GHz, 2GB RAM) to get an idea of what's possible and what's not:
I think mysql uses way too much memory (overhead) to store my data.
I've fed the database with realtime data for 1 hour now, and I need at
least 30 times as much in the tables (200 times would be perfect, of
course on some other machine with more memory). Right now top tells me
that mysql is using around 10% of the memory. I already increased
max_heap_table_size and I will have to increase it much more, but
right now it seems that I would need 6GB of RAM to get my minimum
amount of data into those tables. Which means I'd need a 64bit system.
But can mysql deal with 6GB HEAP tables???

So, has anybody tried something like this yet? We are currently using
our own, self-written databases for this, but we are thinking about
either using an SQL database, in order to be able to make joins to
other tables, or adding features to our own software (64bit support,
SQL-like interface etc.). If it works with mysql, we'd probably prefer
that, since it's much less work for us and easier to handle. It
doesn't matter if mysql uses a little more memory, but right now it
seems like mysql is wasting way too much memory :(

thanks for any help!
Jan

| max_heap_table_size | 49744 |

mysql> show table status\G
*************************** 1. row ***************************
           Name: geldbrief
         Engine: HEAP
        Version: 9
     Row_format: Fixed
           Rows: 2449755
 Avg_row_length: 61
    Data_length: 157468096
Max_data_length: 60634
   Index_length: 19690688
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: max_rows=2000
        Comment:
*************************** 2. row ***************************
           Name: umsaetze
         Engine: HEAP
        Version: 9
     Row_format: Fixed
           Rows: 236425
 Avg_row_length: 45
    Data_length: 11402880
Max_data_length: 535713975
   Index_length: 1942648
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: max_rows=2000
        Comment:
2 rows in set (0.00 sec)

CREATE TABLE `geldbrief` (
  `symbol` char(12) NOT NULL default '',
  `quelle` int(10) unsigned NOT NULL default '0',
  `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
  `ticknumber` int(10) unsigned NOT NULL default '0',
  `bid` double(16,4) default NULL,
  `bidsize` double(16,4) default NULL,
  `ask` double(16,4) default NULL,
  `asksize` double(16,4) default NULL,
  PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

CREATE TABLE `umsaetze` (
  `symbol` char(12) NOT NULL default '',
  `quelle` int(10) unsigned NOT NULL default '0',
  `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
  `ticknumber` int(10) unsigned NOT NULL default '0',
  `kurs` double(16,4) default NULL,
  `umsatz` double(16,4) default NULL,
  PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000
Re: very large HEAP-tables in 4.1.3
Maybe you should try to normalize your table: 'symbol' could have its
own table; that would reduce both data and index size. And then try to
reduce the size of your rows: bidsize and asksize should be integers,
I think, and maybe 'float' would be enough. What does the 'quelle'
column represent? Is kurszeit necessary in your primary key?

CREATE TABLE `geldbrief` (
  `symbol` char(12) NOT NULL default '',
  `quelle` int(10) unsigned NOT NULL default '0',
  `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
  `ticknumber` int(10) unsigned NOT NULL default '0',
  `bid` double(16,4) default NULL,
  `bidsize` double(16,4) default NULL,
  `ask` double(16,4) default NULL,
  `asksize` double(16,4) default NULL,
  PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

CREATE TABLE `umsaetze` (
  `symbol` char(12) NOT NULL default '',
  `quelle` int(10) unsigned NOT NULL default '0',
  `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
  `ticknumber` int(10) unsigned NOT NULL default '0',
  `kurs` double(16,4) default NULL,
  `umsatz` double(16,4) default NULL,
  PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

--
Philippe Poelvoorde
COS Trading Ltd.
Re: very large HEAP-tables in 4.1.3
Philippe Poelvoorde wrote:

> Maybe you should try to normalize your table: 'symbol' could have
> its own table; that would reduce both data and index size. And then
> try to reduce the size of your rows: bidsize and asksize should be
> integers, I think, and maybe 'float' would be enough. What does the
> 'quelle' column represent? Is kurszeit necessary in your primary
> key?

I changed a few columns: bidsize and asksize are integer now, and I
changed ticknumber to smallint unsigned. At first I used the
ticknumbers sent by the feedserver; now I count up to 65,000 and then
reset the counter back to 0. I need that additional column to handle
multiple ticks within one second. Now I have a row length of 41
instead of 61 on the geldbrief table, but there is still just way too
much memory usage.

quelle is the stock exchange (source). The table should store trades
and bids/asks from stock exchanges, so the primary key has to include:

  symbol      (e.g. IBM)
  quelle      (numeric code for the stock exchange)
  kurszeit    (date and time)
  ticknumber  (to be able to handle multiple ticks per second)

Any more suggestions?

Maybe I'll test how an InnoDB table with a huge
innodb_buffer_pool_size will work. But since I'll have to do big
deletes once every hour (kicking out old records), I have no idea if
that would work out on a table with much more than 100,000,000 rows
and inserts coming in all the time...

Another idea is to use a bunch of MyISAM tables (4 or more for each
day) plus a MERGE table. I could then do a

  flush tables with read lock;
  truncate table big_merge_table;
  unlock tables;

on the MyISAM tables to delete the old rows. I don't think that the
disc-based table engines can respond quickly enough and handle all the
inserts at the same time... but I might give it a try next week. Does
anybody have comments on those two ideas, in case my in-memory concept
doesn't work?

thanks
Jan
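The MERGE-table rotation described above could look roughly like this in 4.1 (an untested sketch; table names and the two-day split are illustrative, not from the thread):

```sql
-- One MyISAM table per period (illustrative names); all share one
-- schema.
CREATE TABLE geldbrief_day1 (
  symbol     CHAR(12) NOT NULL DEFAULT '',
  quelle     INT UNSIGNED NOT NULL DEFAULT 0,
  kurszeit   DATETIME NOT NULL,
  ticknumber SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  bid        DOUBLE(16,4) DEFAULT NULL,
  bidsize    INT DEFAULT NULL,
  ask        DOUBLE(16,4) DEFAULT NULL,
  asksize    INT DEFAULT NULL,
  PRIMARY KEY (symbol, quelle, kurszeit, ticknumber)
) ENGINE=MyISAM;

CREATE TABLE geldbrief_day2 LIKE geldbrief_day1;

-- The MERGE table presents the pieces as one table; new rows go into
-- the last table of the UNION list (INSERT_METHOD=LAST). A MERGE
-- table cannot enforce a unique key across its pieces, so only a
-- plain KEY is declared here.
CREATE TABLE geldbrief_all (
  symbol     CHAR(12) NOT NULL DEFAULT '',
  quelle     INT UNSIGNED NOT NULL DEFAULT 0,
  kurszeit   DATETIME NOT NULL,
  ticknumber SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  bid        DOUBLE(16,4) DEFAULT NULL,
  bidsize    INT DEFAULT NULL,
  ask        DOUBLE(16,4) DEFAULT NULL,
  asksize    INT DEFAULT NULL,
  KEY (symbol, quelle, kurszeit, ticknumber)
) ENGINE=MERGE UNION=(geldbrief_day1, geldbrief_day2)
  INSERT_METHOD=LAST;

-- Purge: empty the oldest piece in one cheap call (after the
-- flush/lock step discussed above), instead of a slow DELETE.
TRUNCATE TABLE geldbrief_day1;
```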
Re: very large HEAP-tables in 4.1.3
Hi,

> I changed a few columns: bidsize and asksize are integer now, and I
> changed ticknumber to smallint unsigned. At first I used the
> ticknumbers sent by the feedserver; now I count up to 65,000 and then
> reset the counter back to 0. I need that additional column to handle
> multiple ticks within one second. Now I have a row length of 41
> instead of 61 on the geldbrief table, but there is still just way too
> much memory usage. quelle is the stock exchange (source).

Normally a contract is traded on one principal exchange, not two, so I
would suggest doing something like this:

CREATE TABLE instrument (
  id integer NOT NULL auto_increment,
  symbol varchar(12) NOT NULL default '',
  quelle int(10) NOT NULL default 0,
  PRIMARY KEY (id)
);

That would save you around 9 bytes per record (13 bytes minus the
4-byte foreign key). Since stocks are normally quoted in integers, you
could even, with a multiplier, go for an integer instead of a double
for your bid/ask; that would save you 4 extra bytes (that's what we do
on our side). By using an extra 'instrument' table, your primary key
will be really small and you would be able to use less memory. (Try
PRIMARY KEY (instrument_id, kurszeit), even dropping the ticknumber;
it will dramatically reduce the memory usage for the index.)

> I don't think that the disc-based table engines can respond quickly
> enough and handle all the inserts at the same time... but I might
> give it a try next week.

Or one table per symbol, maybe?

--
Philippe Poelvoorde
COS Trading Ltd.
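Put together, the normalized layout suggested above might look like this (a sketch; the tick-table columns and the 10000 price multiplier are illustrative assumptions, not from the thread):

```sql
-- Tick table keyed on the surrogate instrument id instead of
-- (symbol, quelle); the row and the hashed index both shrink.
CREATE TABLE geldbrief (
  instrument_id INT UNSIGNED NOT NULL,
  kurszeit      DATETIME NOT NULL,
  ticknumber    SMALLINT UNSIGNED NOT NULL DEFAULT 0,
  bid           INT DEFAULT NULL,  -- price * 10000 (integer multiplier)
  bidsize       INT DEFAULT NULL,
  ask           INT DEFAULT NULL,  -- price * 10000
  asksize       INT DEFAULT NULL,
  PRIMARY KEY (instrument_id, kurszeit, ticknumber)
) ENGINE=HEAP;

-- Resolving symbols back through the lookup table:
SELECT i.symbol, i.quelle, g.kurszeit,
       g.bid / 10000 AS bid, g.ask / 10000 AS ask
FROM geldbrief g
JOIN instrument i ON i.id = g.instrument_id
WHERE i.symbol = 'IBM';
```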
Re: very large HEAP-tables in 4.1.3
Philippe Poelvoorde wrote:

>> I changed a few columns: bidsize and asksize are integer now, and I
>> changed ticknumber to smallint unsigned. At first I used the
>> ticknumbers sent by the feedserver; now I count up to 65,000 and
>> then reset the counter back to 0. I need that additional column to
>> handle multiple ticks within one second. Now I have a row length of
>> 41 instead of 61 on the geldbrief table, but there is still just
>> way too much memory usage. quelle is the stock exchange (source).
>
> Normally a contract is traded on one principal exchange, not two, so
> I would suggest doing something like this:

Hi Philippe,

That might be right if you only watch one country, but we currently
have 35 international exchanges. So we don't have just one major
exchange for IBM; we have NYSE, LSE, Xetra (German) etc. It looks like
we'll try something disc-based and have a memory database only with
the ask and asksize of the important exchanges. We want to get rid of
our old solution, and it seems like the massive amount of data just
doesn't fit into memory with mysql because of the overhead mysql has.

The idea of introducing a numeric code instead of the char(12) symbols
and having a translation table might be interesting. It makes
everything a little less comfortable but saves a few bytes... I could
also split the data into individual tables for each exchange... That
would save another 2 bytes for the exchange ID... but the applications
would have to choose the right table... I guess we'll have some
discussion on that here in the company next week.

>> I don't think that the disc-based table engines can respond quickly
>> enough and handle all the inserts at the same time... but I might
>> give it a try next week.
>
> Or one table per symbol, maybe?

ehmmm.. 30 tables? not a good idea ;) I'd rather split it into tables
by time, since that makes cleaning up much easier (truncate table is
much faster than delete from table where
datefield < date_sub(now(), interval 5 day)).

thanks for your help!
Jan
Re: very large HEAP-tables in 4.1.3
Hi,

A few questions for you.

On Friday, August 6, 2004, at 06:17 AM, Jan Kirchhoff wrote:

> I was just wondering if anybody has been using very large HEAP
> tables and if there are ways to have mysql use the memory more
> efficiently. (I have no experience with HEAP tables other than using
> them as temporary tables...) I just started testing with 2 HEAP
> tables on a development system (P4 3.2GHz, 2GB RAM) to get an idea
> of what's possible and what's not: I think mysql uses way too much
> memory (overhead) to store my data.

How much overhead do you think it is using? Each row is 61 bytes in
geldbrief, which is *exactly* the amount needed for the datatypes you
have:

  char(12)     = 12 bytes
  int(10)      =  4 bytes
  datetime     =  8 bytes
  int(10)      =  4 bytes
  double(16,4) =  8 bytes
  double(16,4) =  8 bytes
  double(16,4) =  8 bytes
  double(16,4) =  8 bytes
  NULL values  =  1 byte
  ----------------------
  Total          61 bytes

Now if you take 61 * 2449755 (number of rows) = 149435055 bytes used:

  157468096   Real amount
  149435055   Data size
  ---------
    8033041   Overhead

       5.1%   Total overhead of data

I don't see how you could get it to be much smaller than that. Even
with zero overhead it would only be 5% smaller, which still would
require *a lot* of memory to store it all. The primary key itself is
only using about 8 bytes of memory per row (because it is a hashed
index; a btree would be much larger), which is also very compact.

With your own in-memory database, do you use some sort of compression
algorithm? That is the only way that I could see it taking up much
less space. MySQL is pretty close to as efficient as you can get
without compression.

> I've fed the database with realtime data for 1 hour now, and I need
> at least 30 times as much in the tables. (200 times would be
> perfect, of course on some other machine with more memory.) Right
> now top tells me that mysql is using around 10% of the memory.
> I already increased max_heap_table_size and I will have to increase
> it much more, but right now it seems that I would need 6GB of RAM to
> get my minimum amount of data into those tables. Which means I'd
> need a 64bit system. But can mysql deal with 6GB HEAP tables???
>
> So has anybody tried something like this yet? We are currently using
> our own, self-written databases for this, but we are thinking about
> either using an SQL database, in order to be able to make joins to
> other tables, or adding features to our own software (64bit support,
> SQL-like interface etc.). If it works with mysql, we'd probably
> prefer that, since it's much less work for us and easier to handle.
> It doesn't matter if mysql uses a little more memory, but right now
> it seems like mysql is wasting way too much memory :(
>
> thanks for any help!
> Jan
>
> | max_heap_table_size | 49744 |
>
> mysql> show table status\G
> *************************** 1. row ***************************
>            Name: geldbrief
>          Engine: HEAP
>         Version: 9
>      Row_format: Fixed
>            Rows: 2449755
>  Avg_row_length: 61
>     Data_length: 157468096
> Max_data_length: 60634
>    Index_length: 19690688
>       Data_free: 0
>  Auto_increment: NULL
>     Create_time: NULL
>     Update_time: NULL
>      Check_time: NULL
>       Collation: latin1_swedish_ci
>        Checksum: NULL
>  Create_options: max_rows=2000
>         Comment:
> *************************** 2. row ***************************
>            Name: umsaetze
>          Engine: HEAP
>         Version: 9
>      Row_format: Fixed
>            Rows: 236425
>  Avg_row_length: 45
>     Data_length: 11402880
> Max_data_length: 535713975
>    Index_length: 1942648
>       Data_free: 0
>  Auto_increment: NULL
>     Create_time: NULL
>     Update_time: NULL
>      Check_time: NULL
>       Collation: latin1_swedish_ci
>        Checksum: NULL
>  Create_options: max_rows=2000
>         Comment:
> 2 rows in set (0.00 sec)
>
> CREATE TABLE `geldbrief` (
>   `symbol` char(12) NOT NULL default '',
>   `quelle` int(10) unsigned NOT NULL default '0',
>   `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
>   `ticknumber` int(10) unsigned NOT NULL default '0',
>   `bid` double(16,4) default NULL,
>   `bidsize` double(16,4) default NULL,
>   `ask` double(16,4) default NULL,
>   `asksize` double(16,4) default NULL,
>   PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
> ) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000
>
> CREATE TABLE `umsaetze` (
>   `symbol` char(12) NOT NULL default '',
>   `quelle` int(10) unsigned NOT NULL default '0',
>   `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
>   `ticknumber` int(10) unsigned NOT NULL default '0',
>   `kurs` double(16,4) default NULL,
>   `umsatz` double(16,4) default NULL,
>   PRIMARY KEY (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
> ) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

With all of that being said, I would just go with InnoDB, which can
buffer the data in memory as well. In a later email you mention that
you need to delete a lot of rows per hour. HEAP wouldn't work all
that well