Re: [HACKERS] Custom compression methods

Mark Dilger Tue, 01 Sep 2020 16:27:50 -0700

> On Aug 13, 2020, at 4:48 AM, Dilip Kumar <[email protected]> wrote:
> 
> v1-0001: As suggested by Robert, it provides the syntax support for
> setting the compression method for a column while creating a table and
> adding columns.  However, we don't support changing the compression
> method for the existing column.  As part of this patch, there is only
> one built-in compression method that can be set (pglz).  In this, we
> have one in-build am (pglz) and the compressed attributes will directly
> store the oid of the AM.  In this patch, I have removed the
> pg_attr_compresion as we don't support changing the compression
> for the existing column so we don't need to preserve the old
> compressions.

I do not like the way pglz compression is handled in this patch.  After 
upgrading PostgreSQL to the first version with this patch included, 
pre-existing on-disk compressed data will not include any custom compression 
Oid in the header, and toast_decompress_datum will notice that and decompress 
the data directly using pglz_decompress.   If the same data were then written 
back out, perhaps to another table, into a column with no custom compression 
method defined, it will get compressed by toast_compress_datum using 
DefaultCompressionOid, which is defined as PGLZ_COMPRESSION_AM_OID.  That isn't 
a proper round-trip for the data, as when it gets re-compressed, the 
PGLZ_COMPRESSION_AM_OID gets written into the header, which makes the data a 
bit longer, but also means that it is not byte-for-byte the same as it was, 
which is counter-intuitive.  Given that any given pglz compressed datum now has 
two totally different formats that might occur on disk, code may have to 
consider both of them, which increases code complexity, and regression tests 
will need to be written with coverage for both of them, which increases test 
complexity.  It's also not easy to write the extra tests, as there isn't any 
way (that I see) to intentionally write out the traditional shorter form from a 
newer database server; you'd have to do something like a pg_upgrade test where 
you install an older server to write the older format, upgrade, and then check 
that the new server can handle it.

The cleanest solution to this would seem to be removal of the compression am's 
Oid from the header for all compression ams, so that pre-patch written data and 
post-patch written data look exactly the same.  The other solution is to give 
pglz pride-of-place as the original compression mechanism, and just say that 
when pglz is the compression method, no Oid gets written to the header, and 
only when other compression methods are used does the Oid get written.  This 
second option seems closer to the implementation that you already have, because 
you already handle the decompression of data where the Oid is lacking, so all 
you have to do is intentionally not write the Oid when compressing using pglz.

Or did I misunderstand your implementation?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Custom compression methods

Reply via email to