robertmu opened a new issue, #1254:
URL: https://github.com/apache/cloudberry/issues/1254
### Issue Description
There appears to be an inconsistency in how Cloudberry handles the default
`checksum` storage option for append-optimized (AO) tables.
The system configuration `gp_default_storage_options` correctly shows that
`checksum=true` is part of the default settings. However, when an AO table is
created without an explicit `checksum` clause, this default value is not
persisted to the table's metadata in the `pg_class.reloptions` column.
This behavior differs from Greenplum, which persists the `checksum=true`
default to `pg_class.reloptions`. Because `gp_default_storage_options` is a
GUC that can be changed at any time, the storage options in effect at
creation time need to be recorded explicitly in the metadata. Otherwise, if
the GUC is changed later, `pg_class.reloptions` no longer shows which
`checksum` setting the table was created with.
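To make the expectation concrete, here is a minimal Python sketch of the merge I would expect at creation time: explicit `WITH` options take precedence, and any option present only in `gp_default_storage_options` is appended. The function names are hypothetical and this is not the actual Cloudberry or Greenplum code path; `appendonly` and `orientation` are left out because neither session shows them in `reloptions`.

```python
def parse_options(text):
    """Split 'k1=v1,k2=v2' into an ordered dict of option name -> value."""
    options = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def effective_reloptions(explicit, guc_defaults):
    """Explicit WITH options win; defaults fill in whatever was not specified.

    Hypothetical helper for illustration only, not a real server function.
    """
    merged = dict(parse_options(explicit))
    for key, value in parse_options(guc_defaults).items():
        merged.setdefault(key, value)
    return [f"{k}={v}" for k, v in merged.items()]


# Values copied from the sessions in this report.
with_clause = "compresstype=zlib,blocksize=32768,compresslevel=1"
guc = "blocksize=32768,compresstype=none,checksum=true"
print(effective_reloptions(with_clause, guc))
```

With the values from this report, the merged list ends with `checksum=true`, which matches what the Greenplum session records in `pg_class.reloptions`.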
### Reproduction and Evidence
The following raw `psql` session logs demonstrate the issue. The session on
Cloudberry shows that `checksum=true` is the configured default but is not
persisted to `pg_class.reloptions`. The session on Greenplum shows the
expected, consistent behavior.
```text
cbdb@robertmu-VirtualBox:~/Projects/cloudberry$ psql
psql (14.4, server 14.4)
Type "help" for help.
cbdb=# select version();
version
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 14.4 (Apache Cloudberry 2.1.0-devel+dev.2019.g1cc76495e18 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit compiled on Jul 18 2025 11:29:40
(1 row)
cbdb=#
cbdb=# create table tab_ao(a int, b int) with(appendonly=true,
orientation=column, compresstype=zlib, blocksize=32768, compresslevel=1);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
cbdb=#
cbdb=# select oid, relname, reloptions from pg_class where relname =
'tab_ao';
oid | relname | reloptions
-------+---------+-----------------------------------------------------
20763 | tab_ao | {compresstype=zlib,blocksize=32768,compresslevel=1}
(1 row)
cbdb=#
cbdb=# show gp_default_storage_options;
gp_default_storage_options
-------------------------------------------------
blocksize=32768,compresstype=none,checksum=true
(1 row)
cbdb=#
cbdb=#
gpdb7@robertmu-VirtualBox:~/Projects/gpdb-archive$ psql
psql (12.12)
Type "help" for help.
gpdb7=# select version();
version
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 12.12 (Greenplum Database 7.0.0-beta.0+482967c1b4 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit compiled on Nov 8 2024 23:43:47 Bhuvnesh C.
(1 row)
gpdb7=#
gpdb7=# create table tab_ao(a int, b int) with(appendonly=true,
orientation=column, compresstype=zlib, blocksize=32768, compresslevel=1);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
gpdb7=#
gpdb7=# select oid, relname, reloptions from pg_class where relname =
'tab_ao';
oid | relname | reloptions
-------+---------+-------------------------------------------------------------------
18293 | tab_ao | {compresstype=zlib,blocksize=32768,compresslevel=1,checksum=true}
(1 row)
gpdb7=#
gpdb7=# show gp_default_storage_options;
gp_default_storage_options
-------------------------------------------------
blocksize=32768,compresstype=none,checksum=true
(1 row)
gpdb7=#
gpdb7=#
```
### Expected Behavior
The `reloptions` column in `pg_class` for the `tab_ao` table should contain
`checksum=true`, as this is the effective default set by
`gp_default_storage_options` at the time of creation.
### Actual Behavior
The `reloptions` column in `pg_class` for the `tab_ao` table **does not**
contain the `checksum=true` option, even though it is part of the system
default.
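
The gap can also be checked mechanically. The following is a hedged Python sketch (the helper name `missing_defaults` is hypothetical) that takes a `reloptions` array and a `gp_default_storage_options` value, both as they appear in the `psql` output above, and lists the default option names that were never written to the catalog.

```python
def missing_defaults(reloptions, guc_defaults):
    """Return default option names that do not appear in reloptions at all.

    Hypothetical helper for illustration; it compares option names only,
    so an explicit override such as compresstype=zlib counts as present.
    """
    stored = {item.partition("=")[0].strip() for item in reloptions}
    default_names = [
        item.partition("=")[0].strip()
        for item in guc_defaults.split(",")
        if item.strip()
    ]
    return [name for name in default_names if name not in stored]


# Values copied from the two sessions in this report.
guc_value = "blocksize=32768,compresstype=none,checksum=true"
cloudberry = ["compresstype=zlib", "blocksize=32768", "compresslevel=1"]
greenplum = cloudberry + ["checksum=true"]

print(missing_defaults(cloudberry, guc_value))
print(missing_defaults(greenplum, guc_value))
```

For the Cloudberry row only `checksum` is reported as missing, while the Greenplum row reports nothing.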
### Environment
- **Cloudberry Version:** `PostgreSQL 14.4 (Apache Cloudberry
2.1.0-devel+dev.2019.g1cc76495e18 build dev)`
- **Greenplum Version:** `PostgreSQL 12.12 (Greenplum Database
7.0.0-beta.0+482967c1b4 build dev)`