Here are some review comments for the patch v10-0002.
======
Commit Message
1.
Note that we don't copy columns when the subscriber-side column is also
generated. Those will be filled as normal with the subscriber-side computed or
default data.
~
Now this patch also introduced some errors etc, so I think that patch
comment should be written differently to explicitly spell out
behaviour of every combination, something like the below:
Summary
when (include_generated_column = true)
* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).
* publisher not-generated column => subscriber generated column: This
will give ERROR.
* publisher generated column => subscriber not-generated column: The
publisher generated column value is copied.
* publisher generated column => subscriber generated column: The
publisher generated column value is not copied. The subscriber
generated column will be filled with the subscriber-side computed or
default data.
when (include_generated_columns = false)
* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).
* publisher not-generated column => subscriber generated column: This
will give ERROR.
* publisher generated column => subscriber not-generated column: This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber column will be filled with the subscriber-side default
data.
* publisher generated column => subscriber generated column: This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber generated column will be filled with the
subscriber-side computed or default data.
======
src/backend/replication/logical/relation.c
2.
logicalrep_rel_open:
I tested some of the "missing column" logic, and got the following results:
Scenario A:
PUB
test_pub=# create table t2(a int, b int);
test_pub=# create publication pub2 for table t2;
SUB
test_sub=# create table t2(a int, b int generated always as (a*2) stored);
test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub2 with (include_generated_columns = false);
Result:
ERROR: logical replication target relation "public.t2" is missing
replicated column: "b"
~
Scenario B:
PUB/SUB identical to above, but subscription sub2 created "with
(include_generated_columns = true);"
Result:
ERROR: logical replication target relation "public.t2" has a
generated column "b" but corresponding column on source relation is
not a generated column
~~~
2a. Question
Why should we get 2 different error messages for what is essentially
the same problem according to whether the 'include_generated_columns'
is false or true? Isn't the 2nd error message the more correct and
useful one for scenarios like this involving generated columns?
Thoughts?
~
2b. Missing tests?
I also noticed there seems no TAP test for the current "missing
replicated column" message. IMO there should be a new test introduced
for this because the loop involved too much bms logic to go
untested...
======
src/backend/replication/logical/tablesync.c
make_copy_attnamelist:
NITPICK - minor comment tweak
NITPICK - add some spaces after "if" code
3.
Should you pfree the gencollist at the bottom of this function when
you no longer need it, for tidiness?
~~~
4.
static void
-fetch_remote_table_info(char *nspname, char *relname,
+fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
LogicalRepRelation *lrel, List **qual)
{
WalRcvExecResult *res;
StringInfoData cmd;
TupleTableSlot *slot;
Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
- Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
+ Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
Oid qualRow[] = {TEXTOID};
bool isnull;
+ bool *remotegenlist_res;
IMO the names 'remotegenlist' and 'remotegenlist_res' should be
swapped the other way around, because it is the function parameter
that is the "result", whereas the 'remotegenlist_res' is just the
local working var for it.
~~~
5. fetch_remote_table_info
Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
places, I think it will be better to assign this to a 'server_version'
variable to be used everywhere instead of having multiple function
calls.
~~~
6.
"SELECT a.attnum,"
" a.attname,"
" a.atttypid,"
- " a.attnum = ANY(i.indkey)"
+ " a.attnum = ANY(i.indkey),"
+ " a.attgenerated != ''"
" FROM pg_catalog.pg_attribute a"
" LEFT JOIN pg_catalog.pg_index i"
" ON (i.indexrelid = pg_get_replica_identity_index(%u))"
" WHERE a.attnum > 0::pg_catalog.int2"
- " AND NOT a.attisdropped %s"
+ " AND NOT a.attisdropped", lrel->remoteid);
+
+ if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 &&
+ walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) ||
+ !MySubscription->includegencols)
+ appendStringInfo(&cmd, " AND a.attgenerated = ''");
+
If the server version is < PG12 then AFAIK there was no such thing as
"a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''"
part also be guarded by some version checking condition like in the
WHERE? Otherwise won't it cause an ERROR for old servers?
~~~
7.
/*
- * For non-tables and tables with row filters, we need to do COPY
- * (SELECT ...), but we can't just do SELECT * because we need to not
- * copy generated columns. For tables with any row filters, build a
- * SELECT query with OR'ed row filters for COPY.
+ * For non-tables and tables with row filters and when
+ * 'include_generated_columns' is specified as 'true', we need to do
+ * COPY (SELECT ...), as normal COPY of generated column is not
+ * supported. For tables with any row filters, build a SELECT query
+ * with OR'ed row filters for COPY.
*/
NITPICK. I felt this was not quite right. AFAIK the reasons for using
this COPY (SELECT ...) syntax is different for row-filters and
generated-columns. Anyway, I updated the comment slightly in my
nitpicks attachment. Please have a look at it to see if you agree with
the suggestions. Maybe I am wrong.
~~~
8.
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(String, att_name, attnamelist)
I'm not 100% sure, but isn't foreach_node the macro to use here,
rather than foreach_ptr?
======
src/test/subscription/t/011_generated.pl
9.
Please discuss with Shubham how to make all the tab1, tab2, tab3,
tab4, tab5, tab6 comments use the same kind of style/wording.
Currently, the patches 0001 and 0002 test comments are a bit
inconsistent.
~~~
10.
Related to above -- now that patch 0002 supports copy_data=true I
don't see why we need to test generated columns *both* for
copy_data=false and also for copy_data=true. IOW, is it really
necessary to have so many tables/tests? For example, I am thinking
some of those tests from patch 0001 can be re-used or just removed now
that copy_data=true works.
~~~
NITPICK - minor comment tweak
~~~
11.
For tab4 and tab6 I saw the initial sync and normal replication data
tests are all merged together, but I had expected to see the initial
sync and normal replication data tests separated so it would be
consistent with the earlier tab1, tab2, tab3 tests.
======
99.
Also, I have attached a nitpicks diff for some of the cosmetic review
comments mentioned above. Please apply whatever of these that you
agree with.
======
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c
b/src/backend/replication/logical/tablesync.c
index b3fde6a..1bca40c 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -725,7 +725,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool
*remotegenlist)
* name as a non-generated column in the corresponding
* publication table.
*/
- if(!remotegenlist[attnum])
+ if (!remotegenlist[attnum])
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("logical replication
target relation \"%s.%s\" has a generated column \"%s\" "
@@ -742,11 +742,12 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool
*remotegenlist)
}
/*
- * Construct column list for COPY.
+ * Construct column list for COPY, excluding columns that are
+ * subscription table generated columns.
*/
for (int i = 0; i < rel->remoterel.natts; i++)
{
- if(!gencollist[i])
+ if (!gencollist[i])
attnamelist = lappend(attnamelist,
makeString(rel->remoterel.attnames[i]));
}
@@ -1231,11 +1232,14 @@ copy_table(Relation rel)
else
{
/*
- * For non-tables and tables with row filters and when
- * 'include_generated_columns' is specified as 'true', we need
to do
- * COPY (SELECT ...), as normal COPY of generated column is not
- * supported. For tables with any row filters, build a SELECT
query
- * with OR'ed row filters for COPY.
+ * For non-tables and tables with row filters, we need to do
COPY
+ * (SELECT ...), but we can't just do SELECT * because we need
to not
+ * copy generated columns. For tables with any row filters,
build a
+ * SELECT query with OR'ed row filters for COPY.
+ *
+ * We also need to use this same COPY (SELECT ...) syntax when
+ * 'include_generated_columns' is specified as true, because
copy
+ * of generated columns is not supported by the normal COPY.
*/
int i = 0;
diff --git a/src/test/subscription/t/011_generated.pl
b/src/test/subscription/t/011_generated.pl
index c47eaf5..0a3026c 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -63,8 +63,8 @@ $node_publisher->safe_psql('postgres', "CREATE TABLE tab5 (a
int, b int)");
$node_subscriber->safe_psql('postgres',
"CREATE TABLE tab5 (a int, b int GENERATED ALWAYS AS (a * 22) STORED)");
-# tab6: publisher-side generated col 'b' and 'c' --> subscriber-side
non-generated col 'b', and generated-col 'c'
-# columns on subscriber in different order
+# tab6: publisher-side generated col 'b' and 'c' --> subscriber-side
non-generated col 'b', and generated-col 'c',
+# where columns on the subscriber are in a different order
$node_publisher->safe_psql('postgres',
"CREATE TABLE tab6 (a int, b int GENERATED ALWAYS AS (a * 2) STORED, c
int GENERATED ALWAYS AS (a * 2) STORED)");