Hey all, I've been experimenting with the best way to store a large amount (~100GB) of data for retrieval. Essentially, I'm storing 9 variables for approximately 1,000,000 locations a day for the last ten years. This works out to around 4MB per variable per day - but not all variables are always present (some could be null).
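(Back-of-the-envelope, assuming 4-byte floats: 1,000,000 locations x 4 bytes is ~4MB per variable per day, and 9 variables x ~3,650 days comes to roughly 130GB fully populated - so ~100GB once the nulls are factored in.)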
(If you don't care about the details, I'll summarize at the end of this email.)

Inserting into and retrieving from a MyISAM table seemed to be approaching impossible (not totally surprising). I originally had a table along the lines of:

create table basic_daily_report(
    date DATE NOT NULL,
    location_id MEDIUMINT UNSIGNED NOT NULL,
    variable1 float,
    variable2 float,
    ...
    variable9 float,
    primary key (date, location_id)
);

(Just a summary of the actual table.) With this I had a maximum table size of around 100GB - just barely enough to do it. I expected I would end up segmenting by year or something similar, as ugly as that is. I tested InnoDB as an alternative to this, but we'll get to that in a second. Basically, inserting a day's worth of data would take ages, and pretty much required an analyze table run for a couple of hours every morning. Selecting was getting to be pretty slow as well.

Eventually, I hit on the idea of storing one row per day:

create table basic_daily_grid(
    date DATE NOT NULL PRIMARY KEY,
    variable1 MEDIUMBLOB,
    variable2 MEDIUMBLOB,
    ...
    variable9 MEDIUMBLOB
);

I then wrote a UDF such that you pass it the variable and a location, and it returns the exact value. This works well because every day has a constant number of locations in a grid format, so it's simply an array lookup. So,

    select grid_point(location_id, variable1) from basic_daily_grid where date=20030101

would return the right value for that location_id.
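To give an idea of what the lookup amounts to, here's a minimal sketch of how such a UDF could be written - this isn't my actual code, and the names and the assumption that the blob is a packed array of 4-byte floats in location order are just for illustration:

#include <mysql.h>
#include <string.h>

/* grid_point(location_id, variable_blob)
 * Returns the float stored at index location_id inside the blob,
 * assuming the blob is a packed array of 4-byte floats in location order.
 * Built as a shared object and registered with something like:
 *   CREATE FUNCTION grid_point RETURNS REAL SONAME 'grid_point.so';
 */

my_bool grid_point_init(UDF_INIT *initid, UDF_ARGS *args, char *message)
{
    if (args->arg_count != 2) {
        strcpy(message, "grid_point() requires (location_id, variable_blob)");
        return 1;
    }
    args->arg_type[0] = INT_RESULT;     /* coerce first arg to an integer   */
    args->arg_type[1] = STRING_RESULT;  /* the blob arrives as a byte string */
    initid->maybe_null = 1;             /* missing data or short blob => NULL */
    return 0;
}

double grid_point(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)
{
    long long idx;
    float value;

    if (args->args[0] == NULL || args->args[1] == NULL) {
        *is_null = 1;
        return 0.0;
    }
    idx = *((long long *) args->args[0]);

    /* Bounds-check against the blob length before indexing into it. */
    if (idx < 0 || (unsigned long) ((idx + 1) * sizeof(float)) > args->lengths[1]) {
        *is_null = 1;
        return 0.0;
    }
    memcpy(&value, args->args[1] + idx * sizeof(float), sizeof(float));
    return (double) value;
}

void grid_point_deinit(UDF_INIT *initid) { }

The core of it is just pointer arithmetic into the blob, which is why it ends up nearly as fast as reading a plain column.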
It turns out this is almost (95%) as fast as the first version for selects, but it has the added bonus that inserts now take only ~5 seconds per day! Alas, after inserting 260 days (less than a year) I hit the MyISAM table size limit of 4.2GB - because a BLOB is a variable-length field.

-----------------------------------------------------------------

I mention all the above in case someone has an alternative solution I'm overlooking. Possible solutions I've found are: a) use InnoDB instead, b) work with the source to create a new field type, BLOBARRAY, of a constant width instead of dynamic, or c) work with the source to somehow overcome the 4.2GB limit on a dynamic table.

c) seems unlikely - if the actual developers can't do it, I probably can't.

b) seems possible - I assume no one saw a need for a constant-width column of 4MB, so hopefully it's not too difficult.

a) was my first try. Inserting takes about twice as long as MyISAM... sure, I can deal with that. Selecting a specific date is in the same ballpark as well, so little problem there. What I'm having severe performance issues with are queries that group, or do a count(*). For example:

mysql> explain select date from basic_daily_grid_innodb;
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+

mysql> select date from basic_daily_grid_innodb;
...
317 rows in set (0.00 sec)

mysql> explain select date, count(*) from basic_daily_grid_innodb group by date;
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+
| basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
+-------------------------+-------+---------------+---------+---------+------+------+-------------+

mysql> select date, count(*) from basic_daily_grid_innodb group by date;
...
317 rows in set (2 min 54.95 sec)

I assume this is due to versioning or some other transactional feature. Or is this a bug, or am I doing something wrong? I don't quite see why grouping items that are all unique should be that much slower than not grouping. I need InnoDB for the unlimited table size, but I don't (really) need transactions, commit/rollback, or checkpoints.

Any suggestions on solving this last hurdle? It's entirely likely I'll need to group by year and average the results, or something similar - and at 3 minutes a pop, that's a little high for my liking. Perhaps I'm expecting too much?

Thanks,
Nick Elliott