Hi hackers!

There are a lot of compression discussions nowadays. And that's cool!
Recently, in a private discussion, Naresh Chainani shared with me the idea of 
compressing temporary files on disk.
I was thrilled to find no evidence of an existing implementation of this 
interesting idea.

I've prototyped a Random Access Compressed File for fun[0]. The code is a very 
dirty proof of concept.
I compress the BufFile one block at a time. Directory pages store the size of 
each compressed block. If any byte of a block changes, the whole block is 
recompressed. Wasted space is never reused. If a compressed block turns out 
larger than BLCKSZ, unknown bad things will happen :)
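To make the layout concrete, here is a toy, standalone sketch of the scheme 
(not the actual patch): each logical block is compressed on its own, and a 
directory remembers where each compressed block lives and how long it is, so 
any block can be re-read without touching its neighbours. zlib, the function 
names and the in-memory directory array are all illustrative assumptions of 
mine; the PoC keeps the directory in pages of the temp file itself. Compile 
with -lz.

/*
 * Toy illustration of a random-access compressed file: fixed-size logical
 * blocks are compressed individually, and a directory records the offset and
 * compressed length of each block so any block can be read back without
 * decompressing its neighbours.
 */
#include <stdio.h>
#include <zlib.h>

#define BLCKSZ 8192                             /* logical block size, as in PostgreSQL */
#define CBUFSZ (BLCKSZ + BLCKSZ / 1000 + 64)    /* covers zlib's worst-case expansion */

typedef struct BlockDirEntry
{
    long        offset;             /* where the compressed block starts */
    uLong       compressed_size;    /* its length on disk */
} BlockDirEntry;

/* Append one logical block in compressed form and record it in the directory. */
static int
write_compressed_block(FILE *f, BlockDirEntry *dir, int blockno,
                       const char *block)
{
    Bytef       cbuf[CBUFSZ];
    uLongf      clen = sizeof(cbuf);

    if (compress2(cbuf, &clen, (const Bytef *) block, BLCKSZ,
                  Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* Append-only: rewriting a block abandons the old space, as in the PoC. */
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    dir[blockno].offset = ftell(f);
    dir[blockno].compressed_size = clen;
    return fwrite(cbuf, 1, clen, f) == clen ? 0 : -1;
}

/* Random access: look up the directory entry, then decompress just that block. */
static int
read_compressed_block(FILE *f, const BlockDirEntry *dir, int blockno,
                      char *block)
{
    Bytef       cbuf[CBUFSZ];
    uLongf      rawlen = BLCKSZ;

    if (fseek(f, dir[blockno].offset, SEEK_SET) != 0)
        return -1;
    if (fread(cbuf, 1, dir[blockno].compressed_size, f)
        != dir[blockno].compressed_size)
        return -1;
    return uncompress((Bytef *) block, &rawlen, cbuf,
                      dir[blockno].compressed_size) == Z_OK ? 0 : -1;
}

/* Minimal round-trip demo of the two helpers above. */
int
main(void)
{
    FILE       *f = tmpfile();
    BlockDirEntry dir[1];
    char        in[BLCKSZ] = "hello, compressed temp file";
    char        out[BLCKSZ];

    if (write_compressed_block(f, dir, 0, in) == 0 &&
        read_compressed_block(f, dir, 0, out) == 0)
        printf("round trip ok: %s\n", out);
    fclose(f);
    return 0;
}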

Here are some of my observations.

0. The idea seems feasible. The fd.c API used by buffile.c can easily be 
abstracted for compressed temporary files. Seeks are necessary, but they are 
not very frequent. It's easy to make temp file compression GUC-controlled 
(sketched after these observations).

1. The temp file footprint can easily be reduced. For example, the query
create unlogged table y as select random()::text t from
generate_series(0,9999999) g;
uses 140000000 bytes of temp file space for the TOAST index build. With the 
patch this value is reduced to 40841704 bytes (3.42x smaller).

2. I have not found any evidence of a performance improvement. I've only 
benchmarked the patch on my laptop, where RAM (the page cache) diminished any 
difference between writing compressed and uncompressed blocks.
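To back up the GUC point in observation 0, the knob could look roughly like 
the fragment below. The name temp_file_compression, its category, and the 
file->compress field are my assumptions, not part of the prototype.

/* Hypothetical flag, e.g. exported from buffile.c */
bool        temp_file_compression = false;

/* Entry for ConfigureNamesBool[] in guc.c (name and category are assumptions) */
{
    {"temp_file_compression", PGC_USERSET, RESOURCES_DISK,
        gettext_noop("Compresses temporary files written to disk."),
        NULL
    },
    &temp_file_compression,
    false,
    NULL, NULL, NULL
},

/* In buffile.c, when a temp file is created (hypothetical field) */
if (temp_file_compression)
    file->compress = true;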

What do you think: is this idea worth pursuing? OLTP systems rarely rely on 
data spilled to disk.
Are there any known good random-access compressed file libraries we could use, 
so we avoid reinventing the wheel?
Has anyone tried this approach before?

Thanks!

Best regards, Andrey Borodin.

[0] 
https://github.com/x4m/postgres_g/commit/426cd767694b88e64f5e6bee99fc653c45eb5abd
