Hi all,

I would like to propose that mysql have compress_*()/decompress_*()
functions added that would be similar to des_encrypt()/des_decrypt().

My idea would be to have functions available like:

compress_gz()/decompress_gz()
compress_bz()/decompress_bz()
...etc.

I have read of people doing this in their applications:
        http://marc.theaimsgroup.com/?l=mysql&m=100823186816632&w=2
...but I could really use the feature directly in SQL.

The ability to do something like:

mysql> SELECT event_id,decompress_gz(event_text) from events where event_date between 
20011201 and 20011210;

assuming that event_text is a *char and that records are always inserted
using the compress_gz() function.  Doing it this way (as a function) would
allow people to still use software algorithms outside of MySQL if they
wanted/when appropriate.  The disadvantage is that you must be consistent
in your use of the fields, and Mysql will need a defined behavior for when
that fails (when one tries to decompress_gz a non-gz-compressed field).

The other option is to have extended string types that are defined like:

  event_text varchar(255) not null default '' compress_gz,
  event_blob text not null default '' compress_bz,

and then have Mysql handle the compression/decompression internally, and
disallow using those fields in WHERE clauses.  I personally prefer the
former option (compress/decompress functions) because I prefer having the
flexibility in exchange for taking the responsibility, but either method
would be useful to me.


Here is what prompted this request:

root@# ls -l events.* events_raw.* | cut -b30-500
    63571071 Dec 18 12:29 events.MYD
    63397888 Dec 18 12:29 events.MYI
        9119 Dec  6 18:15 events.frm
    92184576 Dec 18 12:29 events_raw.MYD
    12948480 Dec 18 12:29 events_raw.MYI
        8602 Nov 21 00:19 events_raw.frm

I have been working on a project where this events table, which has ~1.5
million records and grows by ~100k/per day, started out at about 600MB.  
I rearranged the data, moved large text strings out into associated tables
with numeric IDs and joins, etc.  Now the events table is much more
scalable, and the system it supports much faster.

The events_raw table, which has two fields (event_id,event_text), holds
the raw event message that was received (before it was parsed and stored)
which is extremely useful for debugging.  The events_raw table is now the
scalability problem, as it will hit a file size limit long before events.

The events_raw table is only searched using a join on
events.event_id=events_raw.event_id, so if I could compress the
events_raw.event_text field in the table, problem solved.

So, my example above would be more accurate for my case if written as:

mysql> SELECT e.event_id, decompress_gz(r.event_text) as original_event_text from 
events as e left join events_raw as r on e.event_id=r.event_id where e.event_date 
between 20011201 and 20011210;


I think we could benefit from this feature in our MARC system, as well.
marc.theaimsgroup.com


Thanks for your time and consideration.


Sincerely,

-- 
Lester Hightower                                         TheAIMSGroup.com


---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Reply via email to