Re: very large HEAP-tables in 4.1.3

2004-08-07 Thread Jan Kirchhoff
Harrison, thanks for your mail,
I think mysql uses way too much memory (overhead) to store my data.

How much overhead do you think it is using?  Each row is 61 bytes in 
geldbrief, which is *exactly* the amount needed for the datatypes you 
have.
[...]
Now if you take 61 * 2449755 (number of rows) = 149435055 bytes used
157468096 Real amount
149435055 Data size
-
  8033041 Overhead
 5.1% Total overhead of data
I don't see how you could get it to be much smaller than that.  Even 
with zero overhead it would only be 5% smaller, which still would 
require *a lot* of memory to store it all.  The primary key itself is 
only using about 8 bytes of memory per row (because it is a hashed 
index, btree would be much larger), which is also very compact.
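(That matches the SHOW TABLE STATUS output quoted below:
19690688 Index_length / 2449755 rows = ~8.04 bytes per row.)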

With your own in-memory database, do you use some sort of compression 
algorithm?  That is the only way that I could see it taking up much 
less space.  MySQL is pretty close to as efficient as you can get 
without compression.
I did that calculation after my last post, too. I should have done it 
earlier ;) There is compression in our old solution, but I never thought 
it was very efficient. I was definitely wrong: we get around 20% more 
data into our old database. That was what I assumed mysql's overhead had 
to be, since I didn't take the compression too seriously.

With all of that being said, I would just go with InnoDB, which can 
buffer the data in memory as well.  In a later email you mention that 
you need to delete a lot of rows per hour.  HEAP wouldn't work all 
that well for that since it uses table level locks.  If it took 5 
seconds to delete a large portion of rows, then the table would be 
locked for the duration of that.  InnoDB, with its row level locking, 
would be much better for that purge process.  If you turn off 
innodb_flush_log_at_trx_commit (which would most likely be OK in your 
case) then most of the inserting would be done in memory anyway, and 
only written to disk in batches.
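For example, something like this in my.cnf (a sketch only; the sizes are 
placeholders to tune to your RAM):

[mysqld]
# keep as much of the working set as possible cached in memory
innodb_buffer_pool_size = 1200M
# 0 = don't flush the log to disk at every commit; you could lose the
# last second or so of ticks on a crash, usually acceptable for feed data
innodb_flush_log_at_trx_commit = 0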

The other option is, as you said before, using myisam merge tables.  
That will make dropping old records easier if you partition the data 
into separate days.  The only thing I would be worried about in 
that scenario is if inserts are coming from multiple threads.  In that 
case you could still possibly run into locking issues with the inserts 
locking each other.
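A sketch of that layout, with hypothetical per-day table names (columns 
abbreviated; note that a MERGE table does not enforce uniqueness across 
its underlying tables):

CREATE TABLE ticks_mon (
 `symbol` char(12) NOT NULL default '',
 `kurszeit` datetime NOT NULL,
 KEY (`symbol`,`kurszeit`)
) ENGINE=MyISAM;
-- ticks_tue, ticks_wed, ... created the same way

-- the MERGE table unions the day tables; new rows go into the last one
CREATE TABLE ticks_all (
 `symbol` char(12) NOT NULL default '',
 `kurszeit` datetime NOT NULL,
 KEY (`symbol`,`kurszeit`)
) ENGINE=MERGE UNION=(ticks_mon,ticks_tue) INSERT_METHOD=LAST;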

I'm away next week, but I guess I'll just give all the options a try on 
our development-system afterwards. I assume that if something works with 
2GB of RAM, it should also work with 6GB (Opteron). Or does mysql's 
behaviour change with very big heap-tables or key_buffers?

Jan


very large HEAP-tables in 4.1.3

2004-08-06 Thread Jan Kirchhoff
I was just wondering if anybody has been using very large HEAP-tables 
and if there are ways to have mysql use the memory more efficiently:
(I have no experience with HEAP-tables other than using them as 
temporary tables...)

I just started testing with 2 heap-tables on a development-system (p4 
3.2ghz, 2GB RAM) to get an idea of what's possible and what's not:

I think mysql uses way too much memory (overhead) to store my data. I've 
fed the database with realtime-data for 1 hour now and I need at least 
30 times as much in the tables. (200 times would be perfect - of course 
on some other machine with more memory)
Right now top tells me that mysql is using around 10% of the memory.

I already increased max_heap_table_size and I will have to increase it 
much more, but right now it seems that I would need 6GB of RAM to get my 
minimum amount of data in those tables. That means I'd need a 
64bit-system. But can mysql deal with 6GB HEAP-tables???
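For reference, the variable can also be raised at runtime (the 1GB value 
is just an example):

SET max_heap_table_size = 1073741824;         -- 1GB, this session only
SET GLOBAL max_heap_table_size = 1073741824;  -- for new connections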

So has anybody tried something like this, yet? We are currently using 
our own, self-written databases for this, but we are thinking about 
either using a sql-database in order to be able to make joins to other 
tables or adding features to our own software (64bit-support, sql-like 
interface etc.). If it works with mysql, we'd probably prefer that since 
it's much less work for us and easier to handle. It doesn't matter if 
mysql uses a little more memory, but right now it seems like mysql is 
wasting way too much memory :(

thanks for any help!
Jan
| max_heap_table_size | 49744 |


mysql> show table status\G
*** 1. row ***
  Name: geldbrief
Engine: HEAP
   Version: 9
Row_format: Fixed
  Rows: 2449755
Avg_row_length: 61
   Data_length: 157468096
Max_data_length: 60634
  Index_length: 19690688
 Data_free: 0
Auto_increment: NULL
   Create_time: NULL
   Update_time: NULL
Check_time: NULL
 Collation: latin1_swedish_ci
  Checksum: NULL
Create_options: max_rows=2000
   Comment:
*** 2. row ***
  Name: umsaetze
Engine: HEAP
   Version: 9
Row_format: Fixed
  Rows: 236425
Avg_row_length: 45
   Data_length: 11402880
Max_data_length: 535713975
  Index_length: 1942648
 Data_free: 0
Auto_increment: NULL
   Create_time: NULL
   Update_time: NULL
Check_time: NULL
 Collation: latin1_swedish_ci
  Checksum: NULL
Create_options: max_rows=2000
   Comment:
2 rows in set (0.00 sec)
CREATE TABLE `geldbrief` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `bid` double(16,4) default NULL,
 `bidsize` double(16,4) default NULL,
 `ask` double(16,4) default NULL,
 `asksize` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000
CREATE TABLE `umsaetze` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `kurs` double(16,4) default NULL,
 `umsatz` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000


Re: very large HEAP-tables in 4.1.3

2004-08-06 Thread Philippe Poelvoorde
Maybe you should try to normalize your table:
'symbol' could have its own table; that would reduce both data and index.
Then try to reduce the size of your rows; bidsize and asksize should 
be integers, I think. Maybe 'float' would be enough.
What does the 'quelle' column represent ?
Is kurszeit necessary in your primary key ?
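For the sizes, something like this (just a sketch; whether the values 
really fit into an integer is for you to check):

ALTER TABLE geldbrief
 MODIFY `bidsize` int unsigned default NULL,
 MODIFY `asksize` int unsigned default NULL;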


CREATE TABLE `geldbrief` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `bid` double(16,4) default NULL,
 `bidsize` double(16,4) default NULL,
 `ask` double(16,4) default NULL,
 `asksize` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000
CREATE TABLE `umsaetze` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `kurs` double(16,4) default NULL,
 `umsatz` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

--
Philippe Poelvoorde
COS Trading Ltd.


Re: very large HEAP-tables in 4.1.3

2004-08-06 Thread Jan Kirchhoff
Philippe Poelvoorde wrote:
Maybe you should try to normalize your table:
'symbol' could have its own table; that would reduce both data and index.
Then try to reduce the size of your rows; bidsize and asksize should 
be integers, I think. Maybe 'float' would be enough.
What does the 'quelle' column represent ?
Is kurszeit necessary in your primary key ?

I changed a few columns: bidsize and asksize are integers now, and I 
changed ticknumber to smallint unsigned.
At first I used the ticknumbers from the feedserver; now I count up to 
65,000 and then reset the counter back to 0. I need that additional 
column to handle multiple ticks within one second.
Now I have a row_length of 41 instead of 61 on the geldbrief-table, 
but there is still just way too much memory-usage.

quelle is the stock-exchange (source).
That table stores trades and bid/asks of stock-exchanges, so the 
primary key has to include:

symbol (i.e. IBM)
quelle (numeric code for the stock-exchange)
date and time
ticknumber (in order to be able to handle multiple ticks per second)

Any more suggestions?
Maybe I'll test how an InnoDB-table with a huge innodb_buffer_pool_size 
will work. But since I'll have to do big deletes once every hour (kicking 
out old records), I have no idea if that would work out on a table with 
much more than 100,000,000 rows and inserts coming in all the time...
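One way to keep such hourly deletes bearable might be to purge in slices 
instead of one huge statement (a sketch; the table name is hypothetical):

-- repeat until no rows are affected; many short deletes instead of one big one
DELETE FROM ticks WHERE kurszeit < date_sub(now(), interval 5 day) LIMIT 10000;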
Another idea is to use a bunch of myisam-tables (4 or more for each day) 
and a merge-table. I could then do a "flush tables with write lock; 
truncate table big_merge_table; unlock tables;" on the myisam-tables to 
delete the old rows.
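The rotation could look something like this (a sketch; hypothetical 
per-day table names, truncating the underlying day table rather than the 
merge table itself):

-- take the oldest day out of the union, then empty it for reuse
ALTER TABLE big_merge_table UNION=(ticks_tue, ticks_wed, ticks_thu, ticks_fri);
TRUNCATE TABLE ticks_mon;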
I don't think that the disc-based table engines can respond quickly 
enough and handle all the inserts at the same time... but I might give 
it a try next week.

Does anybody have comments on those two ideas, in case my 
in-memory-concept doesn't work out?

thanks
Jan



Re: very large HEAP-tables in 4.1.3

2004-08-06 Thread Philippe Poelvoorde
Hi,
I changed a few columns: bidsize and asksize are integers now, and I 
changed ticknumber to smallint unsigned.
At first I used the ticknumbers from the feedserver; now I count up to 
65,000 and then reset the counter back to 0. I need that additional 
column to handle multiple ticks within one second.
Now I have a row_length of 41 instead of 61 on the geldbrief-table, 
but there is still just way too much memory-usage.

quelle is the stock-exchange (source).
Normally a contract is traded on one principal exchange, not two, so I 
would perhaps suggest doing this:
CREATE TABLE instrument (
	id integer NOT NULL auto_increment,
	symbol varchar(12) NOT NULL default '',
	quelle int(10) NOT NULL default 0,
	PRIMARY KEY(id)
);
That would save you around 9 bytes per record (the 13 bytes, minus the 
4-byte foreign key).

Since stocks are normally quoted in integers, you could, with a 
multiplier, go for an integer instead of a double for your bid/ask; that 
would save you 4 extra bytes per price (that's what we do on our side).
By using an extra table 'instrument', your primary key will be really 
much smaller and you would be able to use less memory.
(Try PRIMARY KEY(instrument_id, kurszeit), perhaps even dropping the 
ticknumber; it will dramatically reduce the memory usage for the index.)
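Putting both ideas together, the tick table could look something like 
this (a sketch only; names and the 10000 multiplier are just examples):

CREATE TABLE `geldbrief2` (
 `instrument_id` int unsigned NOT NULL default '0',  -- references instrument.id
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` smallint unsigned NOT NULL default '0',
 `bid` int unsigned default NULL,      -- price * 10000
 `bidsize` int unsigned default NULL,
 `ask` int unsigned default NULL,      -- price * 10000
 `asksize` int unsigned default NULL,
 PRIMARY KEY (`instrument_id`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1;

That is roughly 31 bytes per row against the current 41.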


I don't think that the disc-based table engines can respond quickly 
enough and handle all the inserts at the same time... but I might give 
it a try next week.
Or one table per symbol, maybe ?
--
Philippe Poelvoorde
COS Trading Ltd.


Re: very large HEAP-tables in 4.1.3

2004-08-06 Thread Jan Kirchhoff
Philippe Poelvoorde wrote:
Hi,
I changed a few columns: bidsize and asksize are integers now, and I 
changed ticknumber to smallint unsigned.
At first I used the ticknumbers from the feedserver; now I count up to 
65,000 and then reset the counter back to 0. I need that additional 
column to handle multiple ticks within one second.
Now I have a row_length of 41 instead of 61 on the geldbrief-table, 
but there is still just way too much memory-usage.

quelle is the stock-exchange (source).

Normally a contract is traded on one principal exchange, not two, so I 
would perhaps suggest doing this:
Hi Philippe,
That might be right if you only watch one country, but we currently have 
35 international exchanges. So we don't have just one major exchange for 
IBM; we have NYSE, LSE, Xetra (German) etc.

It looks like we'll try something disc-based and have a memory-database 
only with ask- and asksize for the important exchanges. We want to get 
rid of our old solution and it seems like the massive amount of data 
just doesn't fit into memory with mysql because of the overhead mysql has.
The idea of introducing a numeric code instead of the char(12)-symbols 
and having a translation-table might be interesting. It makes everything 
a little less comfortable but saves a few bytes... I could split the data 
into individual tables for each exchange... That would save another 2 
bytes for the exchange-ID... but the applications would have to choose the 
right table... I guess we'll have some discussion on that here in the 
company next week.

I don't think that the disc-based table engines can respond quickly 
enough and handle all the inserts at the same time... but I might 
give it a try next week.

Or one table per symbol, maybe ?
ehmmm.. 30 tables? Not a good idea ;) I'd split it into tables by 
time, since that makes cleaning up much easier (truncate table is much 
faster than delete from table where datefield < date_sub(now(), interval 
5 day))

thanks for your help!
Jan


Re: very large HEAP-tables in 4.1.3

2004-08-06 Thread Harrison
Hi,
A few questions for you.
On Friday, August 6, 2004, at 06:17  AM, Jan Kirchhoff wrote:
I was just wondering if anybody has been using very large HEAP-tables 
and if there are ways to have mysql use the memory more efficiently:
(I have no experience with HEAP-tables other than using them as 
temporary tables...)

I just started testing with 2 heap-tables on a development-system (p4 
3.2ghz, 2GB RAM) to get an idea of what's possible and what's not:

I think mysql uses way too much memory (overhead) to store my data.
How much overhead do you think it is using?  Each row is 61 bytes in 
geldbrief, which is *exactly* the amount needed for the datatypes you 
have.
 char(12) 	= 12 bytes
 int(10)  	=  4 bytes
 datetime 	=  8 bytes
 int(10)  	=  4 bytes
 double(16,4) =  8 bytes
 double(16,4) =  8 bytes
 double(16,4) =  8 bytes
 double(16,4) =  8 bytes
 NULL values	=  1 byte
---
 Total		  61 bytes

Now if you take 61 * 2449755 (number of rows) = 149435055 bytes used
157468096 Real amount
149435055 Data size
-
  8033041 Overhead
 5.1% Total overhead of data
I don't see how you could get it to be much smaller than that.  Even 
with zero overhead it would only be 5% smaller, which still would 
require *a lot* of memory to store it all.  The primary key itself is 
only using about 8 bytes of memory per row (because it is a hashed 
index, btree would be much larger), which is also very compact.

With your own in-memory database, do you use some sort of compression 
algorithm?  That is the only way that I could see it taking up much 
less space.  MySQL is pretty close to as efficient as you can get 
without compression.


 I've fed the database with realtime-data for 1 hour now and I need at 
least 30 times as much in the tables. (200 times would be perfect - of 
course on some other machine with more memory)
Right now top tells me that mysql is using around 10% of the memory.

I already increased max_heap_table_size and I will have to increase 
it much more, but right now it seems that I would need 6GB of RAM to 
get my minimum amount of data in those tables. That means I'd need a 
64bit-system. But can mysql deal with 6GB HEAP-tables???

So has anybody tried something like this, yet? We are currently using 
our own, self-written databases for this, but we are thinking about 
either using a sql-database in order to be able to make joins to other 
tables or adding features to our own software (64bit-support, sql-like 
interface etc.). If it works with mysql, we'd probably prefer that 
since it's much less work for us and easier to handle. It doesn't 
matter if mysql uses a little more memory, but right now it seems like 
mysql is wasting way too much memory :(

thanks for any help!
Jan
| max_heap_table_size | 49744 |


mysql> show table status\G
*** 1. row ***
  Name: geldbrief
Engine: HEAP
   Version: 9
Row_format: Fixed
  Rows: 2449755
Avg_row_length: 61
   Data_length: 157468096
Max_data_length: 60634
  Index_length: 19690688
 Data_free: 0
Auto_increment: NULL
   Create_time: NULL
   Update_time: NULL
Check_time: NULL
 Collation: latin1_swedish_ci
  Checksum: NULL
Create_options: max_rows=2000
   Comment:
*** 2. row ***
  Name: umsaetze
Engine: HEAP
   Version: 9
Row_format: Fixed
  Rows: 236425
Avg_row_length: 45
   Data_length: 11402880
Max_data_length: 535713975
  Index_length: 1942648
 Data_free: 0
Auto_increment: NULL
   Create_time: NULL
   Update_time: NULL
Check_time: NULL
 Collation: latin1_swedish_ci
  Checksum: NULL
Create_options: max_rows=2000
   Comment:
2 rows in set (0.00 sec)
CREATE TABLE `geldbrief` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `bid` double(16,4) default NULL,
 `bidsize` double(16,4) default NULL,
 `ask` double(16,4) default NULL,
 `asksize` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000
CREATE TABLE `umsaetze` (
 `symbol` char(12) NOT NULL default '',
 `quelle` int(10) unsigned NOT NULL default '0',
 `kurszeit` datetime NOT NULL default '0000-00-00 00:00:00',
 `ticknumber` int(10) unsigned NOT NULL default '0',
 `kurs` double(16,4) default NULL,
 `umsatz` double(16,4) default NULL,
 PRIMARY KEY  (`symbol`,`quelle`,`kurszeit`,`ticknumber`)
) ENGINE=HEAP DEFAULT CHARSET=latin1 MAX_ROWS=2000

With all of that being said, I would just go with InnoDB, which can 
buffer the data in memory as well.  In a later email you mention that 
you need to delete a lot of rows per hour.  HEAP wouldn't work all that 
well