Hi all,

I would like to propose that MySQL add compress_*()/decompress_*() functions, similar to the existing des_encrypt()/des_decrypt().
My idea would be to have functions available like:

  compress_gz()/decompress_gz()
  compress_bz()/decompress_bz()
  ...etc.

I have read of people doing this in their applications:

  http://marc.theaimsgroup.com/?l=mysql&m=100823186816632&w=2

...but I could really use the feature directly in SQL -- the ability to do something like:

  mysql> SELECT event_id, decompress_gz(event_text)
      -> FROM events
      -> WHERE event_date BETWEEN 20011201 AND 20011210;

assuming that event_text is a *char and that records are always inserted using the compress_gz() function. Doing it this way (as a function) would still allow people to use compression algorithms outside of MySQL if they wanted, when appropriate. The disadvantage is that you must be consistent in your use of the fields, and MySQL will need a defined behavior for when that fails (i.e., when one tries to decompress_gz() a field that is not gz-compressed).

The other option is to have extended string types, defined like:

  event_text varchar(255) not null default '' compress_gz,
  event_blob text not null default '' compress_bz,

and then have MySQL handle the compression/decompression internally, and disallow using those fields in WHERE clauses. I personally prefer the former option (compress/decompress functions), because I prefer having the flexibility in exchange for taking the responsibility, but either method would be useful to me.

Here is what prompted this request:

  root@# ls -l events.* events_raw.* | cut -b30-500
  63571071 Dec 18 12:29 events.MYD
  63397888 Dec 18 12:29 events.MYI
      9119 Dec  6 18:15 events.frm
  92184576 Dec 18 12:29 events_raw.MYD
  12948480 Dec 18 12:29 events_raw.MYI
      8602 Nov 21 00:19 events_raw.frm

I have been working on a project where this events table, which has ~1.5 million records and grows by ~100k per day, started out at about 600MB. I rearranged the data, moved large text strings out into associated tables with numeric IDs and joins, etc. Now the events table is much more scalable, and the system it supports is much faster.
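As an aside on mechanics, the application-side approach (and what a compress_gz()/decompress_gz() pair would presumably do internally) can be sketched with Python's zlib as a stand-in -- the exact stream framing a real implementation would use is an assumption here, and the sample data is hypothetical:

```python
import zlib

# What compress_gz() would do before an INSERT: shrink a repetitive
# raw event message down to a compressed byte string.
original = b"raw event text received from device 42 " * 100
compressed = zlib.compress(original)
assert len(compressed) < len(original)  # repetitive text compresses well

# What decompress_gz() would do on SELECT: an exact round-trip.
assert zlib.decompress(compressed) == original

# The failure mode noted above: decompressing a field that was never
# compressed is an error at the library level; the SQL function would
# need a defined behavior (e.g. return NULL) for this case.
try:
    zlib.decompress(b"plain, uncompressed event text")
except zlib.error:
    pass  # MySQL would have to map this to NULL or raise an error
```

This also shows why consistency matters: nothing in the table itself marks a column value as compressed, so mixing compressed and uncompressed rows in one field hands the error handling back to the caller.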
The events_raw table, which has two fields (event_id, event_text), holds the raw event message that was received (before it was parsed and stored), which is extremely useful for debugging. The events_raw table is now the scalability problem, as it will hit a file size limit long before events does. The events_raw table is only searched using a join on events.event_id=events_raw.event_id, so if I could compress the events_raw.event_text field in the table, the problem would be solved.

So, my example above would be more accurate for my case if written as:

  mysql> SELECT e.event_id, decompress_gz(r.event_text) AS original_event_text
      -> FROM events AS e
      -> LEFT JOIN events_raw AS r ON e.event_id=r.event_id
      -> WHERE e.event_date BETWEEN 20011201 AND 20011210;

I think we could benefit from this feature in our MARC system (marc.theaimsgroup.com) as well.

Thanks for your time and consideration.

Sincerely,

--
Lester Hightower
TheAIMSGroup.com