Hi Peter,

Thanks for the review and sorry for taking so long.

I've addressed most of the comments in the patch I sent a couple minutes ago. More comments in-line:


On 1/28/22 09:39, Peter Smith wrote:
Here are some review comments for the v17-0001 patch.

~~~

1. Commit message

If no column list is specified, all the columns are replicated, as
previously

Missing period (.) at the end of that sentence.


I plan to reword that anyway.

~~~

2. doc/src/sgml/catalogs.sgml

+      <para>
+       This is an array of values that indicates which table columns are
+       part of the publication.  For example a value of <literal>1 3</literal>
+       would mean that the first and the third table columns are published.
+       A null value indicates that all attributes are published.
+      </para></entry>

Missing comma:
"For example" --> "For example,"


Fixed.

Terms:
The text seems to jump between "columns" and "attributes". Perhaps,
for consistency, that last sentence should say: "A null value
indicates that all columns are published."


Yeah, but that's a pre-existing problem. I've modified the parts added by the patch to use "columns" though.

~~~

3. doc/src/sgml/protocol.sgml

  </variablelist>
-        Next, the following message part appears for each column
(except generated columns):
+        Next, the following message part appears for each column (except
+        generated columns and other columns that don't appear in the column
+        filter list, for tables that have one):
  <variablelist>

Perhaps that can be expressed more simply, like:

Next, the following message part appears for each column (except
generated columns and other columns not present in the optional column
filter list):


Not sure. I'll think about it.

~~~

4. doc/src/sgml/ref/alter_publication.sgml

+ALTER PUBLICATION <replaceable class="parameter">name</replaceable>
ALTER TABLE <replaceable
class="parameter">publication_object</replaceable> SET COLUMNS { (
<replaceable class="parameter">name</replaceable> [, ...] ) | ALL }

The syntax chart looks strange because there is already a "TABLE" and
a column_name list within the "publication_object" definition, so do
ALTER TABLE and publication_object co-exist?
According to the current documentation it suggests nonsense like below is valid:
ALTER PUBLICATION mypublication ALTER TABLE TABLE t1 (a,b,c) SET
COLUMNS (a,b,c);


Yeah, I think that's wrong. I think "publication_object" is wrong in this place, so I've used "table_name".

--

But more fundamentally, I don't see why any new syntax is even needed at all.

Instead of:
ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS
(user_id, firstname, lastname);
Why not just:
ALTER PUBLICATION mypublication ALTER TABLE users (user_id, firstname,
lastname);


I haven't modified the grammar yet, but I agree SET COLUMNS seems a bit unnecessary. It also seems a bit inconsistent with ADD TABLE which simply lists the columns right adter the table name.

Then, if the altered table defines a *different* column list then it
would be functionally equivalent to whatever your SET COLUMNS is doing
now. AFAIK this is how the Row-Filter [1] works, so that altering an
existing table to have a different Row-Filter just overwrites that
table's filter. IMO the Col-Filter behaviour should work the same as
that - "SET COLUMNS" is redundant.


I'm sorry, I don't understand what this is saying :-(

~~~

5. doc/src/sgml/ref/alter_publication.sgml

-    TABLE [ ONLY ] <replaceable
class="parameter">table_name</replaceable> [ * ] [, ... ]
+    TABLE [ ONLY ] <replaceable
class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]

That extra comma after the "column_name" seems wrong because there is
one already in "[, ... ]".


Fixed.

~~~

6. doc/src/sgml/ref/create_publication.sgml

-    TABLE [ ONLY ] <replaceable
class="parameter">table_name</replaceable> [ * ] [, ... ]
+    TABLE [ ONLY ] <replaceable
class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]

(Same as comment #5).
That extra comma after the "column_name" seems wrong because there is
one already in "[, ... ]".


Fixed.

~~~

7. doc/src/sgml/ref/create_publication.sgml

+     <para>
+      When a column list is specified, only the listed columns are replicated;
+      any other columns are ignored for the purpose of replication through
+      this publication.  If no column list is specified, all columns of the
+      table are replicated through this publication, including any columns
+      added later.  If a column list is specified, it must include the replica
+      identity columns.
+     </para>

Suggest to re-word this a bit simpler:

e.g.
- "listed columns" --> "named columns"
- I don't think it is necessary to say the unlisted columns are ignored.
- I didn't think it is necessary to say "though this publication"

AFTER
When a column list is specified, only the named columns are replicated.
If no column list is specified, all columns of the table are replicated,
including any columns added later. If a column list is specified, it must
include the replica identity columns.


Fixed, seems reasonable.

~~~

8. doc/src/sgml/ref/create_publication.sgml

Consider adding another example showing a CREATE PUBLICATION which has
a column list.


Added.

~~~

9. src/backend/catalog/pg_publication.c - check_publication_add_relation

  /*
- * Check if relation can be in given publication and throws appropriate
- * error if not.
+ * Check if relation can be in given publication and that the column
+ * filter is sensible, and throws appropriate error if not.
+ *
+ * targetcols is the bitmapset of attribute numbers given in the column list,
+ * or NULL if it was not specified.
   */

Typo: "targetcols" --> "columns" ??


Right, I noticed that too.

~~~

10. src/backend/catalog/pg_publication.c - check_publication_add_relation

+
+ /* Make sure the column list checks out */
+ if (columns != NULL)
+ {

Perhaps "checks out" could be worded better.


Right, I expanded that in my review.

~~~

11. src/backend/catalog/pg_publication.c - check_publication_add_relation

+ /* Make sure the column list checks out */
+ if (columns != NULL)
+ {
+ /*
+ * Even if the user listed all columns in the column list, we cannot
+ * allow a column list to be specified when REPLICA IDENTITY is FULL;
+ * that would cause problems if a new column is added later, because
+ * the new column would have to be included (because of being part of
+ * the replica identity) but it's technically not allowed (because of
+ * not being in the publication's column list yet).  So reject this
+ * case altogether.
+ */
+ if (replidentfull)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("invalid column list for publishing relation \"%s\"",
+    RelationGetRelationName(targetrel)),
+ errdetail("Cannot specify a column list on relations with REPLICA
IDENTITY FULL."));
+
+ check_publication_columns(pub, targetrel, columns);
+ }

IIUC almost all of the above comment and code is redundant because by
calling the check_publication_columns function it will do exactly the
same check...

So,  that entire slab might be replaced by 2 lines:

if (columns != NULL)
check_publication_columns(pub, targetrel, columns);


You're right. But I think we can make that even simpler by moving even the (columns!=NULL) check into the function.

~~~

12. src/backend/catalog/pg_publication.c - publication_set_table_columns

+publication_set_table_columns(Relation pubrel, HeapTuple pubreltup,
+   Relation targetrel, List *columns)
+{
+ Bitmapset  *attset;
+ AttrNumber *attarray;
+ HeapTuple copytup;
+ int natts;
+ bool nulls[Natts_pg_publication_rel];
+ bool replaces[Natts_pg_publication_rel];
+ Datum values[Natts_pg_publication_rel];
+
+ memset(values, 0, sizeof(values));
+ memset(nulls, 0, sizeof(nulls));
+ memset(replaces, false, sizeof(replaces));

It seemed curious to use memset false for "replaces" but memset 0 for
"nulls", since they are both bool arrays (??)


Fixed.

~~~

13. src/backend/catalog/pg_publication.c - compare_int16

+/* qsort comparator for attnums */
+static int
+compare_int16(const void *a, const void *b)
+{
+ int av = *(const int16 *) a;
+ int bv = *(const int16 *) b;
+
+ /* this can't overflow if int is wider than int16 */
+ return (av - bv);
+}

This comparator seems common with another one already in the PG
source. Perhaps it would be better for generic comparators (like this
one) to be in some common code instead of scattered cut/paste copies
of the same thing.


I thought about it, but it doesn't really seem worth the effort.

~~~

14. src/backend/commands/publicationcmds.c - AlterPublicationTables

+ else if (stmt->action == AP_SetColumns)
+ {
+ Assert(schemaidlist == NIL);
+ Assert(list_length(tables) == 1);
+
+ PublicationSetColumns(stmt, pubform,
+   linitial_node(PublicationTable, tables));
+ }

(Same as my earlier review comment #4)

Suggest to call this PublicationSetColumns based on some smarter
detection logic of a changed column list. Please refer to the
Row-Filter patch [1] for this same function.


I don't understand. Comment #4 is about syntax, no?

~~~

15. src/backend/commands/publicationcmds.c - AlterPublicationTables

+ /* This is not needed to delete a table */
+ pubrel->columns = NIL;

Perhaps a more explanatory comment would be better there?


If I understand the comment, it says we don't actually need to set columns to NIL. In which case we can just get rid of the change.

~~~

16. src/backend/commands/tablecmds.c - relation_mark_replica_identity

@@ -15841,6 +15871,7 @@ relation_mark_replica_identity(Relation rel,
char ri_type, Oid indexOid,
   CatalogTupleUpdate(pg_index, &pg_index_tuple->t_self, pg_index_tuple);
   InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
   InvalidOid, is_internal);
+
   /*
   * Invalidate the relcache for the table, so that after we commit
   * all sessions will refresh the table's replica identity index

Spurious whitespace change seemed unrelated to the Col-Filter patch.


Fixed.

~~~

17. src/backend/parser/gram.y

   *
+ * ALTER PUBLICATION name SET COLUMNS table_name (column[, ...])
+ * ALTER PUBLICATION name SET COLUMNS table_name ALL
+ *

(Same as my earlier review comment #4)

IMO there was no need for the new syntax of SET COLUMNS.


Not modified yet, we'll see about the syntax.

~~~

18. src/backend/replication/logical/proto.c - logicalrep_write_attrs

- /* send number of live attributes */
- for (i = 0; i < desc->natts; i++)
- {
- if (TupleDescAttr(desc, i)->attisdropped || TupleDescAttr(desc,
i)->attgenerated)
- continue;
- nliveatts++;
- }
- pq_sendint16(out, nliveatts);
-
   /* fetch bitmap of REPLICATION IDENTITY attributes */
   replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
   if (!replidentfull)
   idattrs = RelationGetIdentityKeyBitmap(rel);

+ /* send number of live attributes */
+ for (i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ if (att->attisdropped || att->attgenerated)
+ continue;
+ if (columns != NULL && !bms_is_member(att->attnum, columns))
+ continue;
+ nliveatts++;
+ }
+ pq_sendint16(out, nliveatts);
+

This change seemed to have the effect of moving that 4 lines of
"replidentfull" code from below the loop to above the loop. But moving
that code seems unrelated to the Col-Filter patch. (??).


Right, restored the original code.

~~~

19. src/backend/replication/logical/tablesync.c - fetch_remote_table_info

@@ -793,12 +877,12 @@ fetch_remote_table_info(char *nspname, char *relname,

   ExecClearTuple(slot);
   }
+
   ExecDropSingleTupleTableSlot(slot);
-
- lrel->natts = natt;
-
   walrcv_clear_result(res);
   pfree(cmd.data);
+
+ lrel->natts = natt;
  }

The shuffling of those few lines seems unrelated to any requirement of
the Col-Filter patch (??)


Yep, undone. I'd bet this is simply due to older versions of the patch touching this place, and then undoing some of it.

~~~

20. src/backend/replication/logical/tablesync.c - copy_table

+ for (int i = 0; i < lrel.natts; i++)
+ {
+ appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
+ if (i < lrel.natts - 1)
+ appendStringInfoString(&cmd, ", ");
+ }

Perhaps that could be expressed more simply if the other way around like:

for (int i = 0; i < lrel.natts; i++)
{
if (i)
appendStringInfoString(&cmd, ", ");
appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
}


I used a slightly different version.

~~~

21. src/backend/replication/pgoutput/pgoutput.c

+
+ /*
+ * Set of columns included in the publication, or NULL if all columns are
+ * included implicitly.  Note that the attnums in this list are not
+ * shifted by FirstLowInvalidHeapAttributeNumber.
+ */
+ Bitmapset  *columns;

Typo: "in this list" --> "in this set" (??)


"bitmap" is what we call Bitmapset so I used that.

~~~

22. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

   * Don't publish changes for partitioned tables, because
- * publishing those of its partitions suffices, unless partition
- * changes won't be published due to pubviaroot being set.
+ * publishing those of its partitions suffices.  (However, ignore
+ * this if partition changes are not to published due to
+ * pubviaroot being set.)
   */

This change seems unrelated to the Col-Filter patch, so perhaps it
should not be here at all.

Also, typo: "are not to published"


Yeah, unrelated. Reverted.

~~~

23. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

+ /*
+ * Obtain columns published by this publication, and add them
+ * to the list for this rel.  Note that if at least one
+ * publication has a empty column list, that means to publish
+ * everything; so if we saw a publication that includes all
+ * columns, skip this.
+ */

Typo: "a empty" --> "an empty"


Fixed.

~~~

24. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

+ if (isnull)
+ {
+ /*
+ * If we see a publication with no columns, reset the
+ * list and ignore further ones.
+ */

Perhaps that comment is meant to say "with no column filter" instead
of "with no columns"?


Yep, fixed.

~~~

25. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

+ if (isnull)
+ {
...
+ }
+ else if (!isnull)
+ {
...
+ }

Is the "if (!isnull)" in the else just to be really REALLY sure it is not null?


Double-tap ;-) Removed the condition.

~~~

26. src/bin/pg_dump/pg_dump.c - getPublicationTables

+ pubrinfo[i].pubrattrs = attribs->data;
+ }
+ else
+ pubrinfo[j].pubrattrs = NULL;

I got confused reading this code. Are those different indices 'i' and
'j' correct?


Good catch! I think you're right and it should be "j" in both places. This'd only cause trouble in selective pg_dumps (when dumping selected tables). The patch clearly needs some pg_dump tests.

~~~

27. src/bin/psql/describe.c

The Row-Filter [1] displays filter information not only for the psql
\dRp+ command but also for the psql \d <tablename> command. Perhaps
the Col-Filter patch should do that too.


Not sure.

~~~

28. src/bin/psql/tab-complete.c

@@ -1657,6 +1657,8 @@ psql_completion(const char *text, int start, int end)
   /* ALTER PUBLICATION <name> ADD */
   else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD"))
   COMPLETE_WITH("ALL TABLES IN SCHEMA", "TABLE");
+ else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD", "TABLE"))
+ COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
   /* ALTER PUBLICATION <name> DROP */

I am not sure about this one- is that change even related to the
Col-Filter patch or is this some unrelated bugfix?


Yeah, seems unrelated - possibly from a rebase or something. Removed.

~~~

29. src/include/catalog/pg_publication.h

@@ -86,6 +86,7 @@ typedef struct Publication
  typedef struct PublicationRelInfo
  {
   Relation relation;
+ List    *columns;
  } PublicationRelInfo;

Perhaps that needs some comment. e.g. do you need to mention that a
NIL List means all columns?


I added a short comment.

~~~

30. src/include/nodes/parsenodes.h

@@ -3642,6 +3642,7 @@ typedef struct PublicationTable
  {
   NodeTag type;
   RangeVar   *relation; /* relation to be published */
+ List    *columns; /* List of columns in a publication table */
  } PublicationTable;


That comment "List of columns in a publication table" doesn't really
say anything helpful.

Perhaps it should mention that a NIL List means all table columns?


Not sure, seems fine.

~~~

31. src/test/regress/sql/publication.sql

The regression test file has an uncommon mixture of /* */ and -- style comments.

Perhaps change all the /* */ ones?


Yeah, that needs some cleanup. I haven't done anything about it yet.

~~~

32. src/test/regress/sql/publication.sql

+CREATE TABLE testpub_tbl5 (a int PRIMARY KEY, b text, c text,
+ d int generated always as (a + length(b)) stored);
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, x);  -- error
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (b, c);  -- error
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);  -- error
+ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c);  -- ok

For all these tests (and more) there seems not sufficient explanation
comments to say exactly what each test case is testing, e.g. *why* is
an "error" expected for some cases but "ok" for others.


Not sure. I think the error is generally obvious in the expected output.

~~~

33. src/test/regress/sql/publication.sql

"-- no dice"

(??) confusing comment.


Same as for the errors.

~~~

34. src/test/subscription/t/028_column_list.pl

I think a few more comments in this TAP file would help to make the
purpose of the tests more clear.


Yeah, the 0004 patch I shared a couple minutes ago does exactly that.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply via email to