> RLE is good for record compression due to the following facts:
>
> 1) The in-memory record has all fields expanded to their maximum
> size. Shorter values are padded with zeroes (which are easy to compress).
>
> 2) The in-memory record stores the NULL bitmap and at the same time
> zaps the space of the NULL values with zeroes (which are easy to
> compress).
>
> 3) The in-memory record has all fields aligned to their respective
> boundaries. Unused gaps are also zapped with zeroes (which are easy
> to compress).
>
> Of course, it also works well for small numbers stored in INT/BIGINT
> fields and for repeating character sequences in CHAR/VARCHAR fields. But
> I'd say that 95% of the RLE effect is covered by the aforementioned
> three cases, especially (1) and (2). Longish VARCHAR fields in UTF8 are
> an extreme example.
>
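To make points (1)-(3) concrete, here is a hypothetical illustration
(not the actual engine buffer layout) of a CHAR(10) field holding "abc":

/* Hypothetical expanded in-memory image of a CHAR(10) holding "abc".
 * The value itself is 3 bytes; the remaining 7 bytes are zero padding.
 * NULL fields and alignment gaps contribute similar zero runs, and
 * long zero runs are exactly what RLE compresses best. */
const unsigned char char10_abc[10] = {
    0x61, 0x62, 0x63,                         /* 'a', 'b', 'c' */
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00  /* padding -> one zero run */
};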
> The legacy algorithm compresses up to 128 bytes into 2 bytes, i.e. it
> has a maximum compression ratio of 64x. That means 10KB compresses into
> 156 bytes, while theoretically it could be compressed into three or
> four bytes. Quite a huge difference. This is what the suggested new
> algorithm (actually, a clever modification of the old one) is expected
> to solve. So far so good.
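For reference, a minimal sketch in C of the control-byte scheme described
above (an illustration of the idea, not the actual engine code): a signed
control byte is followed either by that many literal bytes (positive
count) or by a single byte to be repeated (negative count), so a run of
128 identical bytes packs into 2 bytes, which gives the 64x ceiling.

#include <stddef.h>

/* Sketch of a legacy-style RLE: positive control byte n means n literal
 * bytes follow; negative control byte -n means repeat the next byte n
 * times, n <= 128.  A run can therefore never save more than 128 -> 2. */
size_t rle_compress(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < len) {
        /* measure the run of identical bytes starting at i */
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 128)
            run++;
        if (run >= 3) {
            out[o++] = (unsigned char)-(int)run;   /* repeat control byte */
            out[o++] = in[i];                      /* the byte to repeat  */
            i += run;
        } else {
            /* gather literal bytes until the next run of >= 3 starts */
            size_t lit = 0;
            while (i + lit < len && lit < 127) {
                if (i + lit + 2 < len &&
                    in[i + lit] == in[i + lit + 1] &&
                    in[i + lit + 1] == in[i + lit + 2])
                    break;
                lit++;
            }
            out[o++] = (unsigned char)lit;         /* literal control byte */
            for (size_t k = 0; k < lit; k++)
                out[o++] = in[i + k];
            i += lit;
        }
    }
    return o;
}

On a 10KB zero run this emits roughly 80 two-byte control pairs, i.e.
about 160 bytes, in line with the figure quoted above; letting the run
counter grow beyond 128 is what allows the new algorithm to collapse
such a run into just a few bytes.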
>
> My question is: would RLE compression still be useful if points (1) -
> (3) disappeared? For example, if the record were packed the following way:
>
> - NULL bitmap
> - alignment padding is skipped
> - NULL fields are skipped
> - VARCHARs are stored at their actual length
> - probably the same for CHARs, we just need to spend some CPU cycles
> to calculate the actual length (*)
This can be solved by the value encoding Jim proposed.
It is very simple: just read all fields in the record and encode them
into a binary buffer. Decoding then reads the data back from the buffer
and stores it into the fields.
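A minimal sketch of that idea, assuming a hypothetical field descriptor
(the real engine would drive this from the record format metadata): the
NULL bitmap is written first, NULL fields and alignment padding are never
written at all, and each remaining field is stored at its actual length.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor, for illustration only. */
typedef struct {
    const unsigned char *data;  /* field value, NULL if SQL NULL      */
    uint16_t             len;   /* actual length (VARCHAR: used part) */
} field_t;

/* Value-encode a record: NULL bitmap first, then each non-NULL field
 * as <uint16 length><bytes>.  Padding, NULL values and unused VARCHAR
 * tails simply never reach the buffer. */
size_t record_encode(const field_t *fields, size_t nfields,
                     unsigned char *out)
{
    size_t bitmap_bytes = (nfields + 7) / 8;
    size_t o = bitmap_bytes;
    memset(out, 0, bitmap_bytes);

    for (size_t f = 0; f < nfields; f++) {
        if (fields[f].data == NULL) {
            out[f / 8] |= (unsigned char)(1u << (f % 8)); /* mark NULL */
            continue;                                     /* store nothing */
        }
        out[o++] = (unsigned char)(fields[f].len & 0xFF); /* length, LE */
        out[o++] = (unsigned char)(fields[f].len >> 8);
        memcpy(out + o, fields[f].data, fields[f].len);   /* actual bytes */
        o += fields[f].len;
    }
    return o;
}

Decoding is just the mirror loop, and each byte of the record is touched
exactly once in either direction, which is the access-count point below.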
CHARs will be a little slower, but each byte will be accessed only once.
Today, each byte is accessed at least once (from the second byte of a
run of identical bytes), but it can be accessed approximately 3+ times
when record fragmentation occurs.

The speed and compression ratio depend heavily on the data, but both
should be better than with the current scheme.
Compared with the Elekt Labs RLE, the final effect will be similar,
because there is a lot of other overhead elsewhere.

When I tested LZ4, the database size decreased by only ~1%, while the
record length decreased by ~10%.

But value encoding cannot be implemented until we switch from fragment
compression to true record-level compression.
Once that is done, it will be easy to improve the record encoding.
And because RLE or value encoding is really fast, it can be used for
any other purpose as well.

Slavek


