Hi Nicholas,

How about storing the BLOBs outside of the DB and referring to them?
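Something along these lines - just a rough sketch, with all names made
up, and assuming one flat file per variable per day:

create table basic_daily_grid_ref(
    date        DATE NOT NULL,
    variable_id TINYINT UNSIGNED NOT NULL,
    -- path to a flat file holding that day's packed array of
    -- floats for one variable, e.g. '/data/2003/01/01/var1.dat'
    file_path   VARCHAR(255) NOT NULL,
    primary key (date, variable_id)
);

The table itself then stays at a few hundred bytes per day, so the
4.2GB MyISAM limit never comes into play, and your application can
read (or mmap) the file and do the same array lookup your UDF does
now.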
Best regards

Nils Valentin
Tokyo/Japan

On Saturday 12 July 2003 00:06, Nicholas Elliott wrote:
> Hey all,
>
> I've been experimenting with the best way to store a large amount
> (~100GB) of data for retrieval. Essentially, I'm storing 9 variables
> for approximately 1,000,000 locations a day for the last ten years.
> This can work out at around 4MB a variable a day - but not all
> variables are always present (could be null).
>
> (If you don't care about the details, I'll summarize at the end of
> this email.)
>
> Inserting and retrieving from a MyISAM table seemed to be approaching
> impossible. (Not totally surprising.) I originally had a table along
> the lines of:
>
> create table basic_daily_report(
>     date DATE NOT NULL,
>     location_id MEDIUMINT UNSIGNED NOT NULL,
>     variable1 float,
>     variable2 float,
>     ...
>     variable9 float,
>     primary key (date, location_id)
> );
>
> (Just a summary of the actual table.)
>
> With this I had a maximum table size of around 100GB - just barely
> enough to do it. I expected I would end up segmenting by year, or
> something similar, as ugly as that is. I tested InnoDB as an
> alternative to this, but we'll get to that in a second. Basically,
> inserting a day's worth of data would take ages, and pretty much
> require an ANALYZE TABLE for a couple of hours every morning.
> Selecting was getting to be pretty slow as well. Eventually, I hit on
> the idea of including one row per day:
>
> create table basic_daily_grid(
>     date DATE NOT NULL PRIMARY KEY,
>     variable1 MEDIUMBLOB,
>     variable2 MEDIUMBLOB,
>     ...
>     variable9 MEDIUMBLOB
> );
>
> And I wrote a UDF such that you pass it the variable and a location,
> and it'll return the exact value. This works well because every day
> has a constant number of locations in a grid format, so it's simply
> an array lookup. So,
>
> select grid_point(location_id, variable1) from basic_daily_grid
> where date=20030101
>
> would return the right value for location_id. It turns out this is
> almost (95%) as fast as the first version at selecting, but it has
> the added bonus that inserts now take only ~5 seconds per day!
>
> Alas, after inserting 260 days (less than a year) I hit the MyISAM
> table size limit of 4.2GB - because a BLOB is a variable-length
> field.
>
> -----------------------------------------------------------------
>
> I mention all the above in case someone has an alternative solution
> I'm overlooking. Possible solutions I've found are: a) use InnoDB
> instead, b) work with the source to create a new field type BLOBARRAY
> of a constant width instead of dynamic, c) work with the source to
> somehow overcome the 4.2GB limit on a dynamic table.
>
> c) Seems unlikely - if the actual developers can't do it, I probably
>    can't.
> b) Seems possible; I assume no one saw a need for a constant-width
>    column of 4MB, so hopefully it's not too difficult.
> a) Was my first try. Inserting takes about twice as long as MyISAM...
>    sure, I can deal with that. Selecting a specific date is in the
>    same ballpark as well, so little problem there. What I'm having
>    severe performance issues with are queries that group, or do a
>    COUNT(*).
> For example:
>
> mysql> explain select date from basic_daily_grid_innodb;
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
>
> mysql> select date from basic_daily_grid_innodb;
> ...
> 317 rows in set (0.00 sec)
>
> mysql> explain select date, count(*) from basic_daily_grid_innodb group by date;
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
>
> mysql> select date, count(*) from basic_daily_grid_innodb group by date;
> ...
> 317 rows in set (2 min 54.95 sec)
>
> I assume this is due to versioning or some other transactional
> feature. Or is this a bug, or am I doing something wrong? I don't
> quite see why grouping items that are all unique should be that much
> slower than not grouping. I need InnoDB for the unlimited table size,
> but I don't (really) need transactions, commit/rollback, or
> checkpoints.
>
> Any suggestions on solving this last hurdle? It's entirely likely
> I'll need to group by year and average the results, or something
> similar - and at 3 minutes a pop, that's a little high for my liking.
> Perhaps I'm expecting too much?
>
> Thanks,
> Nick Elliott

--
---
Valentin Nils
Internet Technology
E-Mail: [EMAIL PROTECTED]
URL: http://www.knowd.co.jp
Personal URL: http://www.knowd.co.jp/staff/nils
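P.S. Regarding the slow GROUP BY / COUNT(*): unlike MyISAM, InnoDB does
not keep a live row count, so those queries end up scanning the
clustered index, and with several MB of BLOB data per row that gets
expensive. Since you will mostly be aggregating per day or per year
anyway, you could precompute per-day aggregates once at load time and
keep them in a small, narrow summary table, so the grouping queries
never touch the BLOB rows at all. Just a sketch - all names are made
up, and I am assuming a mean over the non-null grid points is what you
are after:

create table daily_summary(
    date            DATE NOT NULL PRIMARY KEY,
    variable1_sum   DOUBLE,             -- sum over non-null grid points
    variable1_count MEDIUMINT UNSIGNED  -- how many were non-null
    -- ... one sum/count pair per variable, up to variable9
);

-- filled in once per day at load time; a yearly mean then only
-- scans ~365 narrow rows instead of the multi-MB BLOB rows:
select year(date) as yr,
       sum(variable1_sum) / sum(variable1_count) as variable1_avg
from daily_summary
group by yr;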