Hi Nicholas,

How about storing the BLOBs outside of the DB and referring to them?
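
For example (a minimal sketch - the table name and file layout here are made
up for illustration):

create table daily_grid_files(
    date DATE NOT NULL PRIMARY KEY,
    variable1_path VARCHAR(255),
    variable2_path VARCHAR(255) ...
    variable9_path VARCHAR(255)
);

The 4MB arrays would sit as flat files on disk (say
/data/2003/01/01/variable1.bin), and the application reads and indexes into
them directly. The table itself stays tiny, so the 4.2GB MyISAM limit never
comes into play.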

Best regards

Nils Valentin
Tokyo/Japan


On Saturday, July 12, 2003 at 00:06, Nicholas Elliott wrote:
> Hey all,
>
> I've been experimenting with the best way to store a large amount (~100GB) of data
> for retrieval. Essentially, I'm storing 9 variables for approximately
> 1,000,000 locations a day for the last ten years. This can work out at
> around 4MB a variable a day - but not all variables are always present
> (could be null).
>
> (If you don't care about the details, I'll summarize at the end of this
> email).
>
>
> Inserting into and retrieving from a MyISAM table was approaching
> impossible. (Not totally surprising.) I originally had a table along the
> lines of:
>
> create table basic_daily_report(
> date DATE NOT NULL,
> location_id MEDIUMINT UNSIGNED NOT NULL,
> variable1 float,
> variable2 float....
> variable9 float,
> primary key (date, location_id)
> );
>
> (Just a summary of the actual table)
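>
> A typical retrieval is then a straight primary-key lookup, along the lines
> of (values illustrative):
>
> select variable1 from basic_daily_report
> where date='2003-01-01' and location_id=12345;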
>
> With this I had a maximum table size of around 100GB - just barely enough
> to do it. I expected I would end up segmenting by year, or something
> similar, as ugly as that is. I tested InnoDB as an alternative to this, but
> we'll get to that in a second. Basically, inserting a day's worth of data
> would take ages, and pretty much require an analyze table for a couple
> hours every morning. Selecting was getting to be pretty slow, as well.
> Eventually, I hit on the idea of storing just one row per day:
>
>
> create table basic_daily_grid(
> date DATE NOT NULL PRIMARY KEY,
> variable1 MEDIUMBLOB,
> variable2 MEDIUMBLOB ...
> variable9 MEDIUMBLOB
> );
>
> Then I wrote a UDF: you pass it a location and a variable, and it returns
> the exact value. This works well because every day has a constant number
> of locations in a grid format, so it's simply an array lookup. So,
>
> select grid_point(location_id, variable1) from basic_daily_grid where
> date=20030101
>
> would return the right value for that location_id. It turns out this is
> almost (95%) as fast as the first version at selecting, but it has the
> added bonus that inserts now take only ~5 seconds per day!
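>
> (In case the details matter: the UDF itself is loaded with plain SQL - the
> shared object name here is illustrative -
>
> CREATE FUNCTION grid_point RETURNS REAL SONAME 'grid_point.so';
>
> and internally the lookup is just an offset into the packed float array.)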
>
>
>
>
> Alas, after inserting 260 days (less than a year) I hit the MyISAM table
> size limit of 4.2GB - because a BLOB is a variable-length field.
>
>
> -----------------------------------------------------------------
>
>
> I mention all the above in case someone has an alternative solution I'm
> overlooking. Possible solutions I've found are:
>
> a) use InnoDB instead,
> b) work with the source to create a new constant-width field type
>    BLOBARRAY instead of a dynamic one,
> c) work with the source to somehow overcome the 4.2GB limit on a dynamic
>    table.
>
>
> c) seems unlikely - if the actual developers can't do it, I probably can't.
> b) seems possible; I assume no one saw a need for a constant-width column
>    of 4MB, so hopefully it's not too difficult.
> a) was my first try. Inserting takes about twice as long as MyISAM... sure,
>    I can deal with that. Selecting a specific date is in the same ballpark
>    as well, so little problem there. What I'm having severe performance
>    issues with are queries that group, or do a count(*). For example:
>
>
> mysql> explain select date from basic_daily_grid_innodb;
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
>
>
>
>
> mysql> select date from basic_daily_grid_innodb;
> ...
> 317 rows in set (0.00 sec)
>
> mysql> explain select date, count(*) from basic_daily_grid_innodb group by
> date;
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | table                   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
> | basic_daily_grid_innodb | index | NULL          | PRIMARY | 3       | NULL |  301 | Using index |
> +-------------------------+-------+---------------+---------+---------+------+------+-------------+
>
>
> mysql> select date, count(*) from basic_daily_grid_innodb group by date;
> ...
> 317 rows in set (2 min 54.95 sec)
>
>
> I assume this is due to versioning or some other transactional feature. Or,
> is this a bug, or am I doing something wrong? I don't quite see why
> grouping items that are all unique should be that much slower than not
> grouping. I need InnoDB for the unlimited table size, but I don't (really)
> need transactions, commit/rollback, or checkpoints.
>
>
> Any suggestions on solving this last hurdle? It's entirely likely I'll
> need to group by year and average the results, or something similar - and
> at 3 minutes a pop, that's a little high for my liking. Perhaps I'm
> expecting too much?
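>
> For concreteness, the kind of query I mean would be roughly (location id
> illustrative):
>
> select year(date), avg(grid_point(12345, variable1))
> from basic_daily_grid_innodb
> group by year(date);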
>
>
> Thanks,
> Nick Elliott

-- 
Valentin Nils
Internet Technology

 E-Mail: [EMAIL PROTECTED]
 URL: http://www.knowd.co.jp
 Personal URL: http://www.knowd.co.jp/staff/nils

