Hi!

On Feb 27, Lester Hightower wrote:
> To whom it may concern at Mysql AB:
> 
> I see from the Mysql 4.1.1 CHANGELOG that new COMPRESS(), UNCOMPRESS(),
> and UNCOMPRESSED_LENGTH() functions were added.  That is great news, and
> something I have been very interested in for a long time, as evidenced by
> this mysql mailing list thread, dating back to 12/18/2001:
> 
>       http://marc.10east.com/?t=100869803000005&r=1&w=2
> 
> The MARC system (marc.10east.com) was one of the primary reasons for me
> requesting that this feature be added to Mysql.  There is one short-coming
> in the new COMPRESS()/UNCOMPRESS() functionality that I would like to
> point out, and request that you address.
> 
> Here is the background:  Anyone that is running a huge system like MARC
> that has millions of uncompressed blob records in huge tables, needs to be
> able to migrate, in real-time and without down-time, to compressed blobs.
> Therefore, we need a way to know if a given field is compressed or not.
> 
> Running alter table on our tables, to add an am_i_compressed boolean, not
> only takes an excruciating amount of time and resources, but adds bits to
> each record that, from a disk-space perspective, we cannot afford.
> 
> Instead, I would like to be able to run a query like:
> 
>   select IFCOMPRESSED(msg_body, UNCOMPRESS(msg_body), msg_body)
>       from msg_bodies_200402
>       where <clause>
> 
> Note that the IFCOMPRESSED() function is the key, and what I am requesting
> be added to future versions of Mysql.  That function has to be possible.
> 
> Hopefully you are storing a header with your compressed data, and if so,
> then the IFCOMPRESSED() is trivial to implement.  If you are not storing a
> header with your compressed data, then this might be more complicated.
> Adding a header might be a possibility -- which is why I am trying to
> point this out _EARLY_ in the process before lots of people start using
> COMPRESS()/UNCOMPRESS(), or maybe zlib, lzo, or whatever library you are
> using can let you know if the data is compressed -- maybe they store a
> small header themselves.
> 
> Anyway, that is the issue that I want to point out and ask for assistance
> on.  Thank you very much for listening to the user community and adding
> COMPRESS()/UNCOMPRESS(), and please seriously consider this request.

There is a header - but it only stores the length of the uncompressed
data. And technically, no header can guarantee that the data are
compressed. The check you can use is something like

 1. test that UNCOMPRESSED_LENGTH returns something realistic, as you
    know how large a blob in your table can be - it should catch most of
    the uncompressed rows.
 2. try to uncompress the rest - zlib puts crc in the compressed stream,
    and UNCOMPRESS returns NULL if uncompression fails.

so the query could be something like

 select IF(UNCOMPRESSED_LENGTH(msg_body) > 102400, msg_body,
        IFNULL(UNCOMPRESS(msg_body), msg_body)) ...

Regards,
Sergei

-- 
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /   Sergei Golubchik <[EMAIL PROTECTED]>
 / /|_/ / // /\ \/ /_/ / /__  MySQL AB, Senior Software Developer
/_/  /_/\_, /___/\___\_\___/  Osnabrueck, Germany
       <___/  www.mysql.com

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to