Re: Pgoutput not capturing the generated columns
On Tue, Jul 2, 2024 at 10:59 AM Peter Smith wrote: > > Hi Shubham, > > As you can see, most of my recent review comments for patch 0001 are > only cosmetic nitpicks. But, there is still one long-unanswered design > question from a month ago [1, #G.2] > > A lot of the patch code of pgoutput.c and proto.c and logicalproto.h > is related to the introduction and passing everywhere of new > 'include_generated_columns' function parameters. These same functions > are also always passing "BitMapSet *columns" representing the > publication column list. > > My question was about whether we can't make use of the existing BMS > parameter instead of introducing all the new API parameters. > > The idea might go something like this: > > * If 'include_generated_columns' option is specified true and if no > column list was already specified then perhaps the relentry->columns > can be used for a "dummy" column list that has everything including > all the generated columns. > > * By doing this: > -- you may be able to avoid passing the extra > 'include_gernated_columns' everywhere > -- you may be able to avoid checking for generated columns deeper in > the code (since it is already checked up-front when building the > column list BMS) > > ~~ > > I'm not saying this design idea is guaranteed to work, but it might be > worth considering, because if it does work then there is potential to > make the current 0001 patch significantly shorter. > > == > [1] > https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com I have fixed this issue in the latest Patches. Please refer to the updated v15 Patches here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8Rj%2B%3Dhn--ALJQvzzu7meX3LuO3tJKppDS7eO1BGvNFYBAbg%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
Here are my review comments for v14-0002. == src/backend/replication/logical/tablesync.c make_copy_attnamelist: nitpick - remove excessive parentheses in palloc0 call. nitpick - Code is fine AFAICT except it's not immediately obvious localgenlist is indexed by the *remote* attribute number. So I renamed 'attrnum' variable in my nitpicks diff. OTOH, if you think no change is necessary, that is OK to (in that case maybe add a comment). ~~~ 1. fetch_remote_table_info + if ((server_version >= 12 && server_version <= 16) || + !MySubscription->includegencols) + appendStringInfo(, " AND a.attgenerated = ''"); Should this say < 18 instead of <= 16? ~~~ copy_table: nitpick - uppercase in comment nitpick - missing space after "if" ~~~ 2. copy_table + attnamelist = make_copy_attnamelist(relmapentry, remotegenlist); + /* Start copy on the publisher. */ initStringInfo(); - /* Regular table with no row filter */ - if (lrel.relkind == RELKIND_RELATION && qual == NIL) + /* check if remote column list has generated columns */ + if(MySubscription->includegencols) + { + for (int i = 0; i < relmapentry->remoterel.natts; i++) + { + if(remotegenlist[i]) + { + remote_has_gencol = true; + break; + } + } + } + There is some subtle logic going on here: For example, the comment here says "Check if the remote column list has generated columns", and it then proceeds to iterate the remote attributes checking the remotegenlist[i]. But the remotegenlist[] was returned from a prior call to make_copy_attnamelist() and according to the make_copy_attnamelist logic, it is NOT returning all remote generated-cols in that list. Specifically, it is stripping some of them -- "Do not include generated columns of the subscription table in the [remotegenlist] column list.". So, actually this loop seems to be only finding cases (setting remote_has_gen = true) where the remote column is generated but the match local column is *not* generated. Maybe this was the intended logic all along but then certainly the comment should be improved to describe it better. ~~~ 3. + /* + * Regular table with no row filter and 'include_generated_columns' + * specified as 'false' during creation of subscription. + */ + if (lrel.relkind == RELKIND_RELATION && qual == NIL && !remote_has_gencol) nitpick - This comment also needs improving. For example, just because remote_has_gencol is false, it does not follow that 'include_generated_columns' was specified as 'false' -- maybe the parameter was 'true' but the table just had no generated columns anyway... I've modified the comment already in my nitpicks diff, but probably you can improve on that. ~ nitpick - "else" comment is modified slightly too. Please see the nitpicks diff. ~ 4. In hindsight, I felt your variable 'remote_has_gencol' was not well-named because it is not for saying the remote table has a generated column -- it is saying the remote table has a generated column **that we have to copy**. So, rather it should be named something like 'gencol_copy_needed' (but I didn't change this name in the nitpick diffs...) == src/test/subscription/t/004_sync.pl nitpick - changes to comment style to make the test case separations much more obvious nitpick - minor comment wording tweaks 5. Here, you are confirming we get an ERROR when replicating from a non-generated column to a generated column. But I think your patch also added exactly that same test scenario in the 011_generated (as the sub5 test). So, maybe this one here should be removed? == src/test/subscription/t/011_generated.pl nitpick - comment wrapping at 80 chars nitpick - add/remove blank lines for readability nitpick - typo /subsriber/subscriber/ nitpick - prior to the ALTER test, tab6 is unsubscribed. So add another test to verify its initial data nitpick - sometimes the msg 'add a new table to existing publication' is misplaced nitpick - the tests for tab6 and tab5 were in opposite to the expected order, so swapped them. == 99. Please see also the attached diff which implements all the nitpicks described in this post. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 38f3621..c264353 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -704,30 +704,30 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *remotegenlist) TupleDesc desc; desc = RelationGetDescr(rel->localrel); - localgenlist = palloc0((rel->remoterel.natts * sizeof(bool))); + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool)); /* * This loop checks for generated columns on subscription table. */ for (int i = 0; i < desc->natts; i++) { - int attnum; + int remote_attnum; Form_pg_attribute
Re: Pgoutput not capturing the generated columns
On Wed, 26 Jun 2024 at 08:06, Peter Smith wrote: > > Hi Shlok. Here are my review comments for patch v10-0003 > > == > General. > > 1. > The patch has lots of conditions like: > if (att->attgenerated && (att->attgenerated != > ATTRIBUTE_GENERATED_STORED || !include_generated_columns)) > continue; > > IMO these are hard to read. Although more verbose, please consider if > all those (for the sake of readability) would be better re-written > like below : > > if (att->generated) > { > if (!include_generated_columns) > continue; > > if (att->attgenerated != ATTRIBUTE_GENERATED_STORED) > continue; > } Fixed > == > contrib/test_decoding/test_decoding.c > > tuple_to_stringinfo: > > NITPICK = refactored the code and comments a bit here to make it easier > NITPICK - there is no need to mention "virtual". Instead, say we only > support STORED Fixed > == > src/backend/catalog/pg_publication.c > > publication_translate_columns: > > NITPICK - introduced variable 'att' to simplify this code Fixed > ~ > > 2. > + ereport(ERROR, > + errcode(ERRCODE_INVALID_COLUMN_REFERENCE), > + errmsg("cannot use virtual generated column \"%s\" in publication > column list", > +colname)); > > Is it better to avoid referring to "virtual" at all? Instead, consider > rearranging the wording to say something like "generated column \"%s\" > is not STORED so cannot be used in a publication column list" Fixed > ~~~ > > pg_get_publication_tables: > > NITPICK - split the condition code for readability Fixed > == > src/backend/replication/logical/relation.c > > 3. logicalrep_rel_open > > + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED) > + continue; > + > > Isn't this missing some code to say "entry->attrmap->attnums[i] = > -1;", same as all the other nearby code is doing? Fixed > ~~~ > > 4. > I felt all the "generated column" logic should be kept together, so > this new condition should be combined with the other generated column > condition, like: > > if (attr->attgenerated) > { > /* comment... */ > if (!MySubscription->includegencols) > { > entry->attrmap->attnums[i] = -1; > continue; > } > > /* comment... */ > if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED) > { > entry->attrmap->attnums[i] = -1; > continue; > } > } Fixed > == > src/backend/replication/logical/tablesync.c > > 5. > + if (gencols_allowed) > + { > + /* Replication of generated cols is supported, but not VIRTUAL cols. */ > + appendStringInfo(, " AND a.attgenerated != 'v'"); > + } > > Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead > of the hardwired 'v'? (Maybe add another TODO comment to revisit > this). > > Alternatively, consider if it is more future-proof to rearrange so it > just says what *is* supported instead of what *isn't* supported: > e.g. "AND a.attgenerated IN ('', 's')" I feel we should use ATTRIBUTE_GENERATED_VIRTUAL macro. Added a TODO. > == > src/test/subscription/t/011_generated.pl > > NITPICK - some comments are missing the word "stored" > NITPICK - sometimes "col" should be plural "cols" > NITPICK = typo "GNERATED" Add the relevant changes. > == > > 6. > In a previous review [1, comment #3] I mentioned that there should be > some docs updates on the "Logical Replication Message Formats" section > 53.9. So, I expected patch 0001 would make some changes and then patch > 0003 would have to update it again to say something about "STORED". > But all that is missing from the v10* patches. > > == Will fix in upcoming version > > 99. > See also my nitpicks diff which is a top-up patch addressing all the > nitpick comments mentioned above. Please apply all of these that you > agree with. Applied Relevant changes Please refer v14 patch for the changes [1]. [1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Tue, 25 Jun 2024 at 18:49, Hayato Kuroda (Fujitsu) wrote: > > Dear Shlok, > > Thanks for updating patches! Below are my comments, maybe only for 0002. > > 01. General > > IIUC, we are not discussed why ALTER SUBSCRIPTION ... SET > include_generated_columns > is prohibit. Previously, it seems okay because there are exclusive options. > But now, > such restrictions are gone. Do you have a reason in your mind? It is just not > considered > yet? We donot support ALTER SUBSCRIPTION to alter 'include_generated_columns'. Suppose initially the user has a logical replication setup. Publisher has table t1 with columns (c1 int, c2 int generated always as (c1*2)) and subscriber has table t1 with columns (c1 int, c2 int). And initially 'incude_generated_column' is true. Now if we 'ALTER SUBSCRIPTION' to set 'include_generated_columns' as false. Initial rows will have data for c2 on the subscriber table, but will not have value after alter. This may be an inconsistent behaviour. > 02. General > > According to the doc, we allow to alter a column to non-generated one, by > ALTER > TABLE ... ALTER COLUMN ... DROP EXPRESSION command. Not sure, what should be > when the command is executed on the subscriber while copying the data? Should > we continue the copy or restart? How do you think? COPY of data will happen in a single transaction, so if we execute 'ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION' command, It will take place after the whole COPY command will finish. So I think it will not create any issue. > 03. Tes tcode > > IIUC, REFRESH PUBLICATION can also lead the table synchronization. Can you add > a test for that? Added > 04. Test code (maybe for 0001) > > Please test the combination with TABLE ... ALTER COLUMN ... DROP EXPRESSION > command. Added > 05. logicalrep_rel_open > > ``` > +/* > + * In case 'include_generated_columns' is 'false', we should > skip the > + * check of missing attrs for generated columns. > + * In case 'include_generated_columns' is 'true', we should > check if > + * corresponding column for the generated column in publication > column > + * list is present in the subscription table. > + */ > +if (!MySubscription->includegencols && attr->attgenerated) > +{ > +entry->attrmap->attnums[i] = -1; > +continue; > +} > ``` > > This comment is not very clear to me, because here we do not skip anything. > Can you clarify the reason why attnums[i] is set to -1 and how will it be > used? This part of the code is removed to address some comments. > 06. make_copy_attnamelist > > ``` > +gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool)); > ``` > > I think this array is too large. Can we reduce a size to (desc->natts * > sizeof(bool))? > Also, the free'ing should be done. I have changed the name 'gencollist' to 'localgenlist' to make the name more consistent. Also size should be (rel->remoterel.natts * sizeof(bool)) as I am storing if a column is generated like 'localgenlist[attnum] = true;' where 'attnum' is corresponding attribute number on publisher side. > 07. make_copy_attnamelist > > ``` > +/* Loop to handle subscription table generated columns. */ > +for (int i = 0; i < desc->natts; i++) > ``` > > IIUC, the loop is needed to find generated columns on the subscriber side, > right? > Can you clarify as comment? Fixed > 08. copy_table > > ``` > +/* > + * Regular table with no row filter and 'include_generated_columns' > + * specified as 'false' during creation of subscription. > + */ > ``` > > I think this comment is not correct. After patching, all tablesync command > becomes > like COPY (SELECT ...) if include_genereted_columns is set to true. Is it > right? > Can we restrict only when the table has generated ones? Fixed Please refer to v14 patch for the changes [1]. [1]: https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
Hi Shubham, As you can see, most of my recent review comments for patch 0001 are only cosmetic nitpicks. But, there is still one long-unanswered design question from a month ago [1, #G.2] A lot of the patch code of pgoutput.c and proto.c and logicalproto.h is related to the introduction and passing everywhere of new 'include_generated_columns' function parameters. These same functions are also always passing "BitMapSet *columns" representing the publication column list. My question was about whether we can't make use of the existing BMS parameter instead of introducing all the new API parameters. The idea might go something like this: * If 'include_generated_columns' option is specified true and if no column list was already specified then perhaps the relentry->columns can be used for a "dummy" column list that has everything including all the generated columns. * By doing this: -- you may be able to avoid passing the extra 'include_gernated_columns' everywhere -- you may be able to avoid checking for generated columns deeper in the code (since it is already checked up-front when building the column list BMS) ~~ I'm not saying this design idea is guaranteed to work, but it might be worth considering, because if it does work then there is potential to make the current 0001 patch significantly shorter. == [1] https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna wrote: > >... > > 8. > > + else if (strcmp(elem->defname, "include-generated-columns") == 0) > > + { > > + if (elem->arg == NULL) > > + data->include_generated_columns = true; > > > > Is there any way to test that "elem->arg == NULL" in the > > generated.sql? OTOH, if it is not possible to get here then is the > > code even needed? > > > > Currently I could not find a case where the > 'include_generated_columns' option is not specifying any value, but I > was hesitant to remove this from here as the other options mentioned > follow the same rules. Thoughts? > If you do manage to find a scenario for this then I think a test for it would be good. But, I agree that the code seems OK because now I see it is the same pattern as similar nearby code. ~~~ Thanks for the updated patch. Here are some review comments for patch v13-0001. == .../expected/generated_columns.out nitpicks (see generated_columns.sql) == .../test_decoding/sql/generated_columns.sql nitpick - use plural /column/columns/ nitpick - use consistent wording in the comments nitpick - IMO it is better to INSERT different values for each of the tests == doc/src/sgml/protocol.sgml nitpick - I noticed that none of the other boolean parameters on this page mention about a default, so maybe here we should do the same and omit that information. ~~~ 1. - - Next, the following message part appears for each column included in - the publication (except generated columns): - - In a previous review [1 comment #11] I wrote that you can't just remove this paragraph because AFAIK it is still meaningful. A minimal change might be to just remove the "(except generated columns)" part. Alternatively, you could give a more detailed explanation mentioning the include_generated_columns protocol parameter. I provided some updated text for this paragraph in my NITPICKS top-up patch, Please have a look at that for ideas. == src/backend/commands/subscriptioncmds.c It looks like pg_indent needs to be run on this file. == src/include/catalog/pg_subscription.h nitpick - comment /publish/Publish/ for consistency == src/include/replication/walreceiver.h nitpick - comment /publish/Publish/ for consistency == src/test/regress/expected/subscription.out nitpicks - (see subscription.sql) == src/test/regress/sql/subscription.sql nitpick - combine the invalid option combinations test with all the others (no special comment needed) nitpick - rename subscription as 'regress_testsub2' same as all its peers. == src/test/subscription/t/011_generated.pl nitpick - add/remove blank lines == src/test/subscription/t/031_column_list.pl nitpick - rewording for a comment. This issue was not strictly caused by this patch, but since you are modifying the same comment we can fix this in passing. == 99. Please also see the attached top-up patch for all those nitpicks identified above. == [1] v11-0001 review https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/contrib/test_decoding/expected/generated_columns.out b/contrib/test_decoding/expected/generated_columns.out index 4c3d6dd..f3b26aa 100644 --- a/contrib/test_decoding/expected/generated_columns.out +++ b/contrib/test_decoding/expected/generated_columns.out @@ -1,4 +1,4 @@ --- test decoding of generated column +-- test decoding of generated columns SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); ?column? -- @@ -7,7 +7,7 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d -- column b' is a generated column CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); --- when 'include-generated-columns' is not set the generated column 'b' will be replicated +-- when 'include-generated-columns' is not set the generated column 'b' values will be replicated INSERT INTO gencoltable (a) VALUES (1), (2), (3); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); data @@ -20,26 +20,26 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc (5 rows) -- when 'include-generated-columns' = '1' the generated column 'b' values will be replicated -INSERT INTO gencoltable (a) VALUES (1), (2), (3); +INSERT INTO gencoltable (a) VALUES (4), (5), (6); SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); -data -- + data
Re: Pgoutput not capturing the generated columns
Hi, here are some patch v11-0001 comments. (BTW, I had difficulty reviewing this because something seemed strange with the changes this patch made to the test_decoding tests). == General 1. Patch name Patch name does not need to quote 'logical replication' ~ 2. test_decoding tests Multiple test_decoding tests were failing for me. There is something very suspicious about the unexplained changes the patch made to the expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all those changes in my nitpicks top-up to get everything working. Please re-confirm that all the test_decoding tests are OK! == Commit Message 3. Since you are including the example usage for test_decoding, I think it's better to include the example usage of CREATE SUBSCRIPTION also. == contrib/test_decoding/expected/binary.out 4. SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding'); - ?column? --- - init -(1 row) - +ERROR: replication slot "regression_slot" already exists Huh? Why is this unrelated expected output changed by this patch? The test_decoding test fails for me unless I REVERT this change!! See my nitpicks diff. == .../expected/decoding_into_rel.out 5. -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot'); - ?column? --- - stop -(1 row) - Huh? Why is this unrelated expected output changed by this patch? The test_decoding test fails for me unless I REVERT this change!! See my nitpicks diff. == .../test_decoding/sql/decoding_into_rel.sql 6. -SELECT 'stop' FROM pg_drop_replication_slot('regression_slot'); +SELECT 'stop' FROM pg_drop_replication_slot('regression_slot'); Huh, Why does this patch change this code at all? I REVERTED this change. See my nitpicks diff. == .../test_decoding/sql/generated_columns.sql (see my nitpicks replacement file for this test) 7. +-- test that we can insert the result of a 'include_generated_columns' +-- into the tables created. That's really not a good idea in practical terms, +-- but provides a nice test. NITPICK - I didn't understand the point of this comment. I updated the comment according to my understanding. ~ NITPICK - The comment "when 'include-generated-columns' is not set then columns will not be replicated" is the opposite of what the result is. I changed this comment. NITPICK - modified and unified wording of some of the other comments NITPICK - changed some blank lines == contrib/test_decoding/test_decoding.c 8. + else if (strcmp(elem->defname, "include-generated-columns") == 0) + { + if (elem->arg == NULL) + data->include_generated_columns = true; Is there any way to test that "elem->arg == NULL" in the generated.sql? OTOH, if it is not possible to get here then is the code even needed? == doc/src/sgml/ddl.sgml 9. - Generated columns are skipped for logical replication and cannot be - specified in a CREATE PUBLICATION column list. + 'include_generated_columns' option controls whether generated columns + should be included in the string representation of tuples during + logical decoding in PostgreSQL. The default is true. NITPICK - Use proper markdown instead of single quotes for the parameter. NITPICK - I think this can be reworded slightly to provide a cross-reference to the CREATE SUBSCRIPTION parameter for more details (which means then we can avoid repeating details like the default value here). PSA my nitpicks diff for an example of how I thought docs should look. == doc/src/sgml/protocol.sgml 10. +The default is true. No, it isn't. AFAIK you made the default behaviour true only for 'test_decoding', but the default for CREATE SUBSCRIPTION remains *false* because that is the existing PG17 behaviour. And the default for the 'include_generated_columns' in the protocol is *also* false to match the CREATE SUBSCRIPTION default. e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'" when options->proto.logical.include_generated_columns e.g. worker.c says: options->proto.logical.include_generated_columns = MySubscription->includegencols; e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false; (This confirmed my previous review expectation that using different default behaviours for test_decoding and pgoutput would surely lead to confusion) ~~~ 11. - - Next, the following message part appears for each column included in - the publication (except generated columns): - - AFAIK you cannot just remove this entire paragraph because I thought it was still relevant to talking about "... the following message part". But, if you don't want to explain and cross-reference about 'include_generated_columns' then maybe it is OK just to remove the "(except generated columns)" part? == src/test/subscription/t/011_generated.pl NITPICK - comment typo /tab2/tab3/ NITPICK - remove some blank lines ~~~ 12. # the column was NOT replicated (the
Re: Pgoutput not capturing the generated columns
On Mon, Jun 24, 2024 at 8:21 AM Peter Smith wrote: > > Hi, here are some patch v9-0001 comments. > > I saw Kuroda-san has already posted comments for this patch so there > may be some duplication here. > > == > GENERAL > > 1. > The later patches 0002 etc are checking to support only STORED > gencols. But, doesn't that restriction belong in this patch 0001 so > VIRTUAL columns are not decoded etc in the first place... (??) > > ~~~ > > 2. > The "Generated Columns" docs mentioned in my previous review comment > [2] should be modified by this 0001 patch. > > ~~~ > > 3. > I think the "Message Format" page mentioned in my previous review > comment [3] should be modified by this 0001 patch. > > == > Commit message > > 4. > The patch name is still broken as previously mentioned [1, #1] > > == > doc/src/sgml/protocol.sgml > > 5. > Should this docs be referring to STORED generated columns, instead of > just generated columns? > > == > doc/src/sgml/ref/create_subscription.sgml > > 6. > Should this be docs referring to STORED generated columns, instead of > just generated columns? > > == > src/bin/pg_dump/pg_dump.c > > getSubscriptions: > NITPICK - tabs > NITPICK - patch removed a blank line it should not be touching > NITPICK = patch altered indents it should not be touching > NITPICK - a missing blank line that was previously present > > 7. > + else > + appendPQExpBufferStr(query, > + " false AS subincludegencols,\n"); > > There is an unwanted comma here. > > ~ > > dumpSubscription > NITPICK - patch altered indents it should not be touching > > == > src/bin/pg_dump/pg_dump.h > > NITPICK - unnecessary blank line > > == > src/bin/psql/describe.c > > describeSubscriptions > NITPICK - bad indentation > > 8. > In my previous review [1, #4b] I suggested this new column should be > in a different order (e.g. adjacent to the other ones ahead of > 'Conninfo'), but this is not yet addressed. > > == > src/test/subscription/t/011_generated.pl > > NITPICK - missing space in comment > NITPICK - misleading "because" wording in the comment > > == > > 99. > See also my attached nitpicks diff, for cosmetic issues. Please apply > whatever you agree with. > > == > [1] My v8-0001 review - > https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com > [3] > https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com All the comments are handled. I have attached the updated patch v11 here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjJpS_XDkR6OrsmMZtCBZNxeYoCdENhC0%3Dbe0rLmNvhiQw%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
Hi Shlok. Here are my review comments for patch v10-0003 == General. 1. The patch has lots of conditions like: if (att->attgenerated && (att->attgenerated != ATTRIBUTE_GENERATED_STORED || !include_generated_columns)) continue; IMO these are hard to read. Although more verbose, please consider if all those (for the sake of readability) would be better re-written like below : if (att->generated) { if (!include_generated_columns) continue; if (att->attgenerated != ATTRIBUTE_GENERATED_STORED) continue; } == contrib/test_decoding/test_decoding.c tuple_to_stringinfo: NITPICK = refactored the code and comments a bit here to make it easier NITPICK - there is no need to mention "virtual". Instead, say we only support STORED == src/backend/catalog/pg_publication.c publication_translate_columns: NITPICK - introduced variable 'att' to simplify this code ~ 2. + ereport(ERROR, + errcode(ERRCODE_INVALID_COLUMN_REFERENCE), + errmsg("cannot use virtual generated column \"%s\" in publication column list", +colname)); Is it better to avoid referring to "virtual" at all? Instead, consider rearranging the wording to say something like "generated column \"%s\" is not STORED so cannot be used in a publication column list" ~~~ pg_get_publication_tables: NITPICK - split the condition code for readability == src/backend/replication/logical/relation.c 3. logicalrep_rel_open + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED) + continue; + Isn't this missing some code to say "entry->attrmap->attnums[i] = -1;", same as all the other nearby code is doing? ~~~ 4. I felt all the "generated column" logic should be kept together, so this new condition should be combined with the other generated column condition, like: if (attr->attgenerated) { /* comment... */ if (!MySubscription->includegencols) { entry->attrmap->attnums[i] = -1; continue; } /* comment... */ if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED) { entry->attrmap->attnums[i] = -1; continue; } } == src/backend/replication/logical/tablesync.c 5. + if (gencols_allowed) + { + /* Replication of generated cols is supported, but not VIRTUAL cols. */ + appendStringInfo(, " AND a.attgenerated != 'v'"); + } Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead of the hardwired 'v'? (Maybe add another TODO comment to revisit this). Alternatively, consider if it is more future-proof to rearrange so it just says what *is* supported instead of what *isn't* supported: e.g. "AND a.attgenerated IN ('', 's')" == src/test/subscription/t/011_generated.pl NITPICK - some comments are missing the word "stored" NITPICK - sometimes "col" should be plural "cols" NITPICK = typo "GNERATED" == 6. In a previous review [1, comment #3] I mentioned that there should be some docs updates on the "Logical Replication Message Formats" section 53.9. So, I expected patch 0001 would make some changes and then patch 0003 would have to update it again to say something about "STORED". But all that is missing from the v10* patches. == 99. See also my nitpicks diff which is a top-up patch addressing all the nitpick comments mentioned above. Please apply all of these that you agree with. == [1] https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal wrote: > > On Fri, 21 Jun 2024 at 09:03, Peter Smith wrote: > > > > Hi, here are some review comments for patch v9-0002. > > > > == > > src/backend/replication/logical/relation.c > > > > 1. logicalrep_rel_open > > > > - if (attr->attisdropped) > > + if (attr->attisdropped || > > + (!MySubscription->includegencols && attr->attgenerated)) > > > > You replied to my question from the previous review [1, #2] as follows: > > In case 'include_generated_columns' is 'true'. column list in > > remoterel will have an entry for generated columns. So, in this case > > if we skip every attr->attgenerated, we will get a missing column > > error. > > > > ~ > > > > TBH, the reason seems very subtle to me. Perhaps that > > "(!MySubscription->includegencols && attr->attgenerated))" condition > > should be coded as a separate "if", so then you can include a comment > > similar to your answer, to explain it. > Fixed > > > == > > src/backend/replication/logical/tablesync.c > > > > make_copy_attnamelist: > > > > NITPICK - punctuation in function comment > > NITPICK - add/reword some more comments > > NITPICK - rearrange comments to be closer to the code they are commenting > Applied the changes > > > ~ > > > > 2. make_copy_attnamelist. > > > > + /* > > + * Construct column list for COPY. > > + */ > > + for (int i = 0; i < rel->remoterel.natts; i++) > > + { > > + if(!gencollist[i]) > > + attnamelist = lappend(attnamelist, > > + makeString(rel->remoterel.attnames[i])); > > + } > > >
RE: Pgoutput not capturing the generated columns
Dear Shlok, Thanks for updating patches! Below are my comments, maybe only for 0002. 01. General IIUC, we are not discussed why ALTER SUBSCRIPTION ... SET include_generated_columns is prohibit. Previously, it seems okay because there are exclusive options. But now, such restrictions are gone. Do you have a reason in your mind? It is just not considered yet? 02. General According to the doc, we allow to alter a column to non-generated one, by ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION command. Not sure, what should be when the command is executed on the subscriber while copying the data? Should we continue the copy or restart? How do you think? 03. Tes tcode IIUC, REFRESH PUBLICATION can also lead the table synchronization. Can you add a test for that? 04. Test code (maybe for 0001) Please test the combination with TABLE ... ALTER COLUMN ... DROP EXPRESSION command. 05. logicalrep_rel_open ``` +/* + * In case 'include_generated_columns' is 'false', we should skip the + * check of missing attrs for generated columns. + * In case 'include_generated_columns' is 'true', we should check if + * corresponding column for the generated column in publication column + * list is present in the subscription table. + */ +if (!MySubscription->includegencols && attr->attgenerated) +{ +entry->attrmap->attnums[i] = -1; +continue; +} ``` This comment is not very clear to me, because here we do not skip anything. Can you clarify the reason why attnums[i] is set to -1 and how will it be used? 06. make_copy_attnamelist ``` +gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool)); ``` I think this array is too large. Can we reduce a size to (desc->natts * sizeof(bool))? Also, the free'ing should be done. 07. make_copy_attnamelist ``` +/* Loop to handle subscription table generated columns. */ +for (int i = 0; i < desc->natts; i++) ``` IIUC, the loop is needed to find generated columns on the subscriber side, right? Can you clarify as comment? 08. copy_table ``` +/* + * Regular table with no row filter and 'include_generated_columns' + * specified as 'false' during creation of subscription. + */ ``` I think this comment is not correct. After patching, all tablesync command becomes like COPY (SELECT ...) if include_genereted_columns is set to true. Is it right? Can we restrict only when the table has generated ones? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Pgoutput not capturing the generated columns
Here are some review comments for the patch v10-0002. == Commit Message 1. Note that we don't copy columns when the subscriber-side column is also generated. Those will be filled as normal with the subscriber-side computed or default data. ~ Now this patch also introduced some errors etc, so I think that patch comment should be written differently to explicitly spell out behaviour of every combination, something like the below: Summary when (include_generated_column = true) * publisher not-generated column => subscriber not-generated column: This is just normal logical replication (not changed by this patch). * publisher not-generated column => subscriber generated column: This will give ERROR. * publisher generated column => subscriber not-generated column: The publisher generated column value is copied. * publisher generated column => subscriber generated column: The publisher generated column value is not copied. The subscriber generated column will be filled with the subscriber-side computed or default data. when (include_generated_columns = false) * publisher not-generated column => subscriber not-generated column: This is just normal logical replication (not changed by this patch). * publisher not-generated column => subscriber generated column: This will give ERROR. * publisher generated column => subscriber not-generated column: This will replicate nothing. Publisher generate-column is not replicated. The subscriber column will be filled with the subscriber-side default data. * publisher generated column => subscriber generated column: This will replicate nothing. Publisher generate-column is not replicated. The subscriber generated column will be filled with the subscriber-side computed or default data. == src/backend/replication/logical/relation.c 2. logicalrep_rel_open: I tested some of the "missing column" logic, and got the following results: Scenario A: PUB test_pub=# create table t2(a int, b int); test_pub=# create publication pub2 for table t2; SUB test_sub=# create table t2(a int, b int generated always as (a*2) stored); test_sub=# create subscription sub2 connection 'dbname=test_pub' publication pub2 with (include_generated_columns = false); Result: ERROR: logical replication target relation "public.t2" is missing replicated column: "b" ~ Scenario B: PUB/SUB identical to above, but subscription sub2 created "with (include_generated_columns = true);" Result: ERROR: logical replication target relation "public.t2" has a generated column "b" but corresponding column on source relation is not a generated column ~~~ 2a. Question Why should we get 2 different error messages for what is essentially the same problem according to whether the 'include_generated_columns' is false or true? Isn't the 2nd error message the more correct and useful one for scenarios like this involving generated columns? Thoughts? ~ 2b. Missing tests? I also noticed there seems no TAP test for the current "missing replicated column" message. IMO there should be a new test introduced for this because the loop involved too much bms logic to go untested... == src/backend/replication/logical/tablesync.c make_copy_attnamelist: NITPICK - minor comment tweak NITPICK - add some spaces after "if" code 3. Should you pfree the gencollist at the bottom of this function when you no longer need it, for tidiness? ~~~ 4. static void -fetch_remote_table_info(char *nspname, char *relname, +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist, LogicalRepRelation *lrel, List **qual) { WalRcvExecResult *res; StringInfoData cmd; TupleTableSlot *slot; Oid tableRow[] = {OIDOID, CHAROID, CHAROID}; - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID}; + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID}; Oid qualRow[] = {TEXTOID}; bool isnull; + bool*remotegenlist_res; IMO the names 'remotegenlist' and 'remotegenlist_res' should be swapped the other way around, because it is the function parameter that is the "result", whereas the 'remotegenlist_res' is just the local working var for it. ~~~ 5. fetch_remote_table_info Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple places, I think it will be better to assign this to a 'server_version' variable to be used everywhere instead of having multiple function calls. ~~~ 6. "SELECT a.attnum," " a.attname," " a.atttypid," - " a.attnum = ANY(i.indkey)" + " a.attnum = ANY(i.indkey)," + " a.attgenerated != ''" " FROM pg_catalog.pg_attribute a" " LEFT JOIN pg_catalog.pg_index i" " ON (i.indexrelid = pg_get_replica_identity_index(%u))" " WHERE a.attnum > 0::pg_catalog.int2" - " AND NOT a.attisdropped %s" + " AND NOT a.attisdropped", lrel->remoteid); + + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && + walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) || + !MySubscription->includegencols) + appendStringInfo(, "
Re: Pgoutput not capturing the generated columns
On Fri, 21 Jun 2024 at 12:51, Peter Smith wrote: > > Hi, Here are some review comments for patch v9-0003 > > == > Commit Message > > /fix/fixes/ Fixed > == > 1. > General. Is tablesync enough? > > I don't understand why is the patch only concerned about tablesync? > Does it make sense to prevent VIRTUAL column replication during > tablesync, if you aren't also going to prevent VIRTUAL columns from > normal logical replication (e.g. when copy_data = false)? Or is this > already handled somewhere? I checked the behaviour during incremental changes. I saw during decoding 'null' values are present for Virtual Generated Columns. I made the relevant changes to not support replication of Virtual generated columns. > ~~~ > > 2. > General. Missing test. > > Add some test cases to verify behaviour is different for STORED versus > VIRTUAL generated columns I have not added the tests as it would give an error in cfbot. I have added a TODO note for the same. This can be done once the VIRTUAL generated columns patch is committted. > == > src/sgml/ref/create_subscription.sgml > > NITPICK - consider rearranging as shown in my nitpicks diff > NITPICK - use sgml markup for STORED Fixed > == > src/backend/replication/logical/tablesync.c > > 3. > - if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && > - walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) || > - !MySubscription->includegencols) > + if (walrcv_server_version(LogRepWorkerWalRcvConn) < 17) > + { > + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12) > appendStringInfo(, " AND a.attgenerated = ''"); > + } > + else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 17) > + { > + if(MySubscription->includegencols) > + appendStringInfo(, " AND a.attgenerated != 'v'"); > + else > + appendStringInfo(, " AND a.attgenerated = ''"); > + } > > IMO this logic is too tricky to remain uncommented -- please add some > comments. > Also, it seems somewhat complex. I think you can achieve the same more simply: > > SUGGESTION > > if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12) > { > bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= > 17 > && MySubscription->includegencols; > if (gencols_allowed) > { > /* Replication of generated cols is supported, but not VIRTUAL cols. */ > appendStringInfo(, " AND a.attgenerated != 'v'"); > } > else > { > /* Replication of generated cols is not supported. */ > appendStringInfo(, " AND a.attgenerated = ''"); > } > } Fixed > == > > 99. > Please refer also to my attached nitpick diffs and apply those if you agree. Applied the changes. I have attached the updated patch v10 here in [1]. [1]: https://www.postgresql.org/message-id/CANhcyEUMCk6cCbw0vVZWo8FRd6ae9CmKG%3DgKP-9Q67jLn8HqtQ%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
Hi, here are some patch v9-0001 comments. I saw Kuroda-san has already posted comments for this patch so there may be some duplication here. == GENERAL 1. The later patches 0002 etc are checking to support only STORED gencols. But, doesn't that restriction belong in this patch 0001 so VIRTUAL columns are not decoded etc in the first place... (??) ~~~ 2. The "Generated Columns" docs mentioned in my previous review comment [2] should be modified by this 0001 patch. ~~~ 3. I think the "Message Format" page mentioned in my previous review comment [3] should be modified by this 0001 patch. == Commit message 4. The patch name is still broken as previously mentioned [1, #1] == doc/src/sgml/protocol.sgml 5. Should this docs be referring to STORED generated columns, instead of just generated columns? == doc/src/sgml/ref/create_subscription.sgml 6. Should this be docs referring to STORED generated columns, instead of just generated columns? == src/bin/pg_dump/pg_dump.c getSubscriptions: NITPICK - tabs NITPICK - patch removed a blank line it should not be touching NITPICK = patch altered indents it should not be touching NITPICK - a missing blank line that was previously present 7. + else + appendPQExpBufferStr(query, + " false AS subincludegencols,\n"); There is an unwanted comma here. ~ dumpSubscription NITPICK - patch altered indents it should not be touching == src/bin/pg_dump/pg_dump.h NITPICK - unnecessary blank line == src/bin/psql/describe.c describeSubscriptions NITPICK - bad indentation 8. In my previous review [1, #4b] I suggested this new column should be in a different order (e.g. adjacent to the other ones ahead of 'Conninfo'), but this is not yet addressed. == src/test/subscription/t/011_generated.pl NITPICK - missing space in comment NITPICK - misleading "because" wording in the comment == 99. See also my attached nitpicks diff, for cosmetic issues. Please apply whatever you agree with. == [1] My v8-0001 review - https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia. diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 1fb19f5..9f2cac9 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -4739,7 +4739,7 @@ getSubscriptions(Archive *fout) int i_suboriginremotelsn; int i_subenabled; int i_subfailover; - int i_subincludegencols; + int i_subincludegencols; int i, ntups; @@ -4770,6 +4770,7 @@ getSubscriptions(Archive *fout) " s.subowner,\n" " s.subconninfo, s.subslotname, s.subsynccommit,\n" " s.subpublications,\n"); + if (fout->remoteVersion >= 14) appendPQExpBufferStr(query, " s.subbinary,\n"); else @@ -4804,7 +4805,7 @@ getSubscriptions(Archive *fout) if (dopt->binary_upgrade && fout->remoteVersion >= 17) appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n" - " s.subenabled,\n"); +" s.subenabled,\n"); else appendPQExpBufferStr(query, " NULL AS suboriginremotelsn,\n" " false AS subenabled,\n"); @@ -4815,12 +4816,14 @@ getSubscriptions(Archive *fout) else appendPQExpBuffer(query, " false AS subfailover,\n"); + if (fout->remoteVersion >= 17) appendPQExpBufferStr(query, " s.subincludegencols\n"); else appendPQExpBufferStr(query, " false AS subincludegencols,\n"); + appendPQExpBufferStr(query, "FROM pg_subscription s\n"); diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h index a2c35fe..8c07933 100644 --- a/src/bin/pg_dump/pg_dump.h +++ b/src/bin/pg_dump/pg_dump.h @@ -672,7 +672,6 @@ typedef struct _SubscriptionInfo char *suboriginremotelsn; char *subfailover; char *subincludegencols; - } SubscriptionInfo; /* diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 491fcb9..00f3131 100644 --- a/src/bin/psql/describe.c
RE: Pgoutput not capturing the generated columns
Hi Shubham, Thanks for sharing new patch! You shared as v9, but it should be v10, right? Also, since there are no commitfest entry, I registered [1]. You can rename the title based on the needs. Currently CFbot said OK. Anyway, below are my comments. 01. General Your patch contains unnecessary changes. Please remove all of them. E.g., ``` " s.subpublications,\n"); - ``` And ``` appendPQExpBufferStr(query, " o.remote_lsn AS suboriginremotelsn,\n" -" s.subenabled,\n"); + " s.subenabled,\n"); ``` 02. General Again, please run the pgindent/pgperltidy. 03. test_decoding Previously I suggested to the default value of to be include_generated_columns should be true, so you modified like that. However, Peter suggested opposite opinion [3] and you just revised accordingly. I think either way might be okay, but at least you must clarify the reason why you preferred to set default to false and changed accordingly. 04. decoding_into_rel.sql According to the comment atop this file, this test should insert result to a table. But added case does not - we should put them at another place. I.e., create another file. 05. decoding_into_rel.sql ``` +-- when 'include-generated-columns' is not set ``` Can you clarify the expected behavior as a comment? 06. getSubscriptions ``` + else + appendPQExpBufferStr(query, + " false AS subincludegencols,\n"); ``` I think the comma is not needed. Also, this error meant that you did not test to use pg_dump for instances prior PG16. Please verify whether we can dump subscriptions and restore them accordingly. [1]: https://commitfest.postgresql.org/48/5068/ [2]: https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com [3]: https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Pgoutput not capturing the generated columns
On Fri, Jun 21, 2024 at 8:23 AM Peter Smith wrote: > > Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday. > > == > src/test/subscription/t/011_generated.pl > > 1. > Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I > don't think so. > > ~~~ > > 2. > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); > +is( $result, qq(4|8 > +5|10), 'confirm generated columns ARE replicated when the > subscriber-side column is not generated'); > + > +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); > + > +$node_publisher->wait_for_catchup('sub3'); > + > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); > +is( $result, qq(4|24 > +5|25), 'confirm generated columns are NOT replicated when the > subscriber-side column is also generated'); > + > > It would be prudent to do explicit "SELECT a,b" instead of "SELECT *", > for readability and to avoid any surprises. Both the comments are handled. Patch v9-0001 contains all the changes required. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8Rj%2B6kwOGmn5MsRaTmciJDxtvNsyszMoPXV62OGPGzkxrCg%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
On Thu, Jun 20, 2024 at 9:03 AM Peter Smith wrote: > > Hi, here are my review comments for v8-0001. > > == > Commit message. > > 1. > It seems like the patch name was accidentally omitted, so it became a > mess when it defaulted to the 1st paragraph of the commit message. > > == > contrib/test_decoding/test_decoding.c > > 2. > + data->include_generated_columns = true; > > I previously posted a comment [1, #4] that this should default to > false; IMO it is unintuitive for the test_decoding to have an > *opposite* default behaviour compared to CREATE SUBSCRIPTION. > > == > doc/src/sgml/ref/create_subscription.sgml > > NITPICK - remove the inconsistent blank line in SGML > > == > src/backend/commands/subscriptioncmds.c > > 3. > +#define SUBOPT_include_generated_columns 0x0001 > > I previously posted a comment [1, #6] that this should be UPPERCASE, > but it is not yet fixed. > > == > src/bin/psql/describe.c > > NITPICK - move and reword the bogus comment > > ~ > > 4. > + if (pset.sversion >= 17) > + appendPQExpBuffer(, > + ", subincludegencols AS \"%s\"\n", > + gettext_noop("include_generated_columns")); > > 4a. > For consistency with every other parameter, that column title should > be written in words "Include generated columns" (not > "include_generated_columns"). > > ~ > > 4b. > IMO this new column belongs with the other subscription parameter > columns (e.g. put it ahead of the "Conninfo" column). > > == > src/test/subscription/t/011_generated.pl > > NITPICK - fixed a comment > > 5. > IMO, it would be better for readability if all the matching CREATE > TABLE for publisher and subscriber are kept together, instead of the > current code which is creating all publisher tables and then creating > all subscriber tables. > > ~~~ > > 6. > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); > +is( $result, qq(4|8 > +5|10), 'confirm generated columns ARE replicated when the > subscriber-side column is not generated'); > + > ... > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); > +is( $result, qq(4|24 > +5|25), 'confirm generated columns are NOT replicated when the > subscriber-side column is also generated'); > + > > 6a. > These SELECT all need ORDER BY to protect against the SELECT * > returning rows in some unexpected order. > > ~ > > 6b. > IMO there should be more comments here to explain how you can tell the > column was NOT replicated. E.g. it is because the result value of 'b' > is the subscriber-side computed value (which you made deliberately > different to the publisher-side computed value). > > == > > 99. > Please also refer to the attached nitpicks top-up patch for minor > cosmetic stuff. All the comments are handled. The attached Patch contains all the suggested changes. Thanks and Regards, Shubham Khanna. v9-0001-Currently-generated-column-values-are-not-replica.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
Hi Shubham/Shlok. FYI, there is some other documentation page that mentions generated column replication messages. This page [1] says: "Next, the following message part appears for each column included in the publication (except generated columns):" But, with the introduction of your new feature, I think that the "except generated columns" wording is not strictly correct anymore. IOW that docs page needs updating but AFAICT your patches have not addressed this yet. == [1] https://www.postgresql.org/docs/17/protocol-logicalrep-message-formats.html Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, Here are some review comments for patch v9-0003 == Commit Message /fix/fixes/ == 1. General. Is tablesync enough? I don't understand why is the patch only concerned about tablesync? Does it make sense to prevent VIRTUAL column replication during tablesync, if you aren't also going to prevent VIRTUAL columns from normal logical replication (e.g. when copy_data = false)? Or is this already handled somewhere? ~~~ 2. General. Missing test. Add some test cases to verify behaviour is different for STORED versus VIRTUAL generated columns == src/sgml/ref/create_subscription.sgml NITPICK - consider rearranging as shown in my nitpicks diff NITPICK - use sgml markup for STORED == src/backend/replication/logical/tablesync.c 3. - if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && - walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) || - !MySubscription->includegencols) + if (walrcv_server_version(LogRepWorkerWalRcvConn) < 17) + { + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12) appendStringInfo(, " AND a.attgenerated = ''"); + } + else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 17) + { + if(MySubscription->includegencols) + appendStringInfo(, " AND a.attgenerated != 'v'"); + else + appendStringInfo(, " AND a.attgenerated = ''"); + } IMO this logic is too tricky to remain uncommented -- please add some comments. Also, it seems somewhat complex. I think you can achieve the same more simply: SUGGESTION if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12) { bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 17 && MySubscription->includegencols; if (gencols_allowed) { /* Replication of generated cols is supported, but not VIRTUAL cols. */ appendStringInfo(, " AND a.attgenerated != 'v'"); } else { /* Replication of generated cols is not supported. */ appendStringInfo(, " AND a.attgenerated = ''"); } } == 99. Please refer also to my attached nitpick diffs and apply those if you agree. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index 5666931..4ce89e9 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -433,16 +433,15 @@ CREATE SUBSCRIPTION subscription_nameinclude_generated_columns (boolean) - Specifies whether the generated columns present in the tables - associated with the subscription should be replicated. + Specifies whether the STORED generated columns present in + the tables associated with the subscription should be replicated. The default is false. If the subscriber-side column is also a generated column then this option has no effect; the subscriber column will be filled as normal with the - subscriber-side computed or default data. This option allows replication - of only STORED GENERATED COLUMNS. + subscriber-side computed or default data.
Re: Pgoutput not capturing the generated columns
Hi Shubham/Shlok. FYI, my patch describing the current PG17 behaviour of logical replication of generated columns was recently pushed [1]. Note that this will have some impact on your patch set. e.g. You are changing the current replication behaviour, so the "Generated Columns" section note will now need to be modified by your patches. == [1] https://github.com/postgres/postgres/commit/7a089f6e6a14ca3a5dc8822c393c6620279968b9 [2] Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, here are some review comments for patch v9-0002. == src/backend/replication/logical/relation.c 1. logicalrep_rel_open - if (attr->attisdropped) + if (attr->attisdropped || + (!MySubscription->includegencols && attr->attgenerated)) You replied to my question from the previous review [1, #2] as follows: In case 'include_generated_columns' is 'true'. column list in remoterel will have an entry for generated columns. So, in this case if we skip every attr->attgenerated, we will get a missing column error. ~ TBH, the reason seems very subtle to me. Perhaps that "(!MySubscription->includegencols && attr->attgenerated))" condition should be coded as a separate "if", so then you can include a comment similar to your answer, to explain it. == src/backend/replication/logical/tablesync.c make_copy_attnamelist: NITPICK - punctuation in function comment NITPICK - add/reword some more comments NITPICK - rearrange comments to be closer to the code they are commenting ~ 2. make_copy_attnamelist. + /* + * Construct column list for COPY. + */ + for (int i = 0; i < rel->remoterel.natts; i++) + { + if(!gencollist[i]) + attnamelist = lappend(attnamelist, + makeString(rel->remoterel.attnames[i])); + } IIUC isn't this assuming that the attribute number (aka column order) is the same on the subscriber side (e.g. gencollist idx) and on the publisher side (e.g. remoterel.attnames[i]). AFAIK logical replication does not require this ordering must be match like that, therefore I am suspicious this new logic is accidentally imposing new unwanted assumptions/restrictions. I had asked the same question before [1-#4] about this code, but there was no reply. Ideally, there would be more test cases for when the columns (including the generated ones) are all in different orders on the pub/sub tables. ~~~ 3. General - varnames. It would help with understanding if the 'attgenlist' variables in all these functions are re-named to make it very clear that this is referring to the *remote* (publisher-side) table genlist, not the subscriber table genlist. ~~~ 4. + int i = 0; + appendStringInfoString(, "COPY (SELECT "); - for (int i = 0; i < lrel.natts; i++) + foreach_ptr(ListCell, l, attnamelist) { - appendStringInfoString(, quote_identifier(lrel.attnames[i])); - if (i < lrel.natts - 1) + appendStringInfoString(, quote_identifier(strVal(l))); + if (i < attnamelist->length - 1) appendStringInfoString(, ", "); + i++; } 4a. I think the purpose of the new macros is to avoid using ListCell, and also 'l' is an unhelpful variable name. Shouldn't this code be more like: foreach_node(String, att_name, attnamelist) ~ 4b. The code can be far simpler if you just put the comma (", ") always up-front except the *first* iteration, so you can avoid checking the list length every time. For example: if (i++) appendStringInfoString(, ", "); == src/test/subscription/t/011_generated.pl 5. General. Hmm. This patch 0002 included many formatting changes to tables tab2 and tab3 and subscriptions sub2 and sub3 but they do not belong here. The proper formatting for those needs to be done back in patch 0001 where they were introduced. Patch 0002 should just concentrate only on the new stuff for patch 0002. ~ 6. CREATE TABLES would be better in pairs IMO it will be better if the matching CREATE TABLE for pub and sub are kept together, instead of separating them by doing all pub then all sub. I previously made the same comment for patch 0001, so maybe it will be addressed next time... ~ 7. SELECT * +$result = + $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a"); It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT *", for readability and so there are no surprises. == 99. Please also refer to my attached nitpicks diff for numerous cosmetic changes, and apply if you agree. == [1] https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 0fbf661..ddeb6a8 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -694,7 +694,7 @@ process_syncing_tables(XLogRecPtr current_lsn) /* * Create list of columns for COPY based on logical relation mapping. Do not - * include generated columns, of the subscription table, in the column list. + * include generated columns of the subscription table in the column list. */ static List * make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *attgenlist) @@ -706,6 +706,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *attgenlist) desc = RelationGetDescr(rel->localrel); gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool)); + /* Loop to handle subscription table generated columns. */ for (int i = 0; i < desc->natts; i++)
Re: Pgoutput not capturing the generated columns
Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday. == src/test/subscription/t/011_generated.pl 1. Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I don't think so. ~~~ 2. +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); +is( $result, qq(4|8 +5|10), 'confirm generated columns ARE replicated when the subscriber-side column is not generated'); + +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); + +$node_publisher->wait_for_catchup('sub3'); + +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); +is( $result, qq(4|24 +5|25), 'confirm generated columns are NOT replicated when the subscriber-side column is also generated'); + It would be prudent to do explicit "SELECT a,b" instead of "SELECT *", for readability and to avoid any surprises. == Kind Regards, Peter Smith. Fujitsu Australia.
Re: Pgoutput not capturing the generated columns
On Thu, 20 Jun 2024 at 12:52, vignesh C wrote: > > On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut wrote: > > > > On 19.06.24 13:22, Shubham Khanna wrote: > > > All the comments are handled. > > > > > > The attached Patch contains all the suggested changes. > > > > Please also take a look at the proposed patch for virtual generated > > columns [0] and consider how that would affect your patch. I think your > > feature can only replicate *stored* generated columns. So perhaps the > > documentation and terminology in your patch should reflect that. > > This patch is unable to manage virtual generated columns because it > stores NULL values for them. Along with documentation the initial sync > command being generated also should be changed to sync data > exclusively for stored generated columns, omitting virtual ones. I > suggest treating these changes as a separate patch(0003) for future > merging or a separate commit, depending on the order of patch > acceptance. > I have addressed the issue and updated the documentation accordingly. And created a new 0003 patch. Please refer to v9-0003 patch for the same in [1]. [1]: https://www.postgresql.org/message-id/CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S%2BjRzzzXo2RavNWw%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut wrote: > > On 19.06.24 13:22, Shubham Khanna wrote: > > All the comments are handled. > > > > The attached Patch contains all the suggested changes. > > Please also take a look at the proposed patch for virtual generated > columns [0] and consider how that would affect your patch. I think your > feature can only replicate *stored* generated columns. So perhaps the > documentation and terminology in your patch should reflect that. This patch is unable to manage virtual generated columns because it stores NULL values for them. Along with documentation the initial sync command being generated also should be changed to sync data exclusively for stored generated columns, omitting virtual ones. I suggest treating these changes as a separate patch(0003) for future merging or a separate commit, depending on the order of patch acceptance. Regards, Vignesh
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for v8-0001. == Commit message. 1. It seems like the patch name was accidentally omitted, so it became a mess when it defaulted to the 1st paragraph of the commit message. == contrib/test_decoding/test_decoding.c 2. + data->include_generated_columns = true; I previously posted a comment [1, #4] that this should default to false; IMO it is unintuitive for the test_decoding to have an *opposite* default behaviour compared to CREATE SUBSCRIPTION. == doc/src/sgml/ref/create_subscription.sgml NITPICK - remove the inconsistent blank line in SGML == src/backend/commands/subscriptioncmds.c 3. +#define SUBOPT_include_generated_columns 0x0001 I previously posted a comment [1, #6] that this should be UPPERCASE, but it is not yet fixed. == src/bin/psql/describe.c NITPICK - move and reword the bogus comment ~ 4. + if (pset.sversion >= 17) + appendPQExpBuffer(, + ", subincludegencols AS \"%s\"\n", + gettext_noop("include_generated_columns")); 4a. For consistency with every other parameter, that column title should be written in words "Include generated columns" (not "include_generated_columns"). ~ 4b. IMO this new column belongs with the other subscription parameter columns (e.g. put it ahead of the "Conninfo" column). == src/test/subscription/t/011_generated.pl NITPICK - fixed a comment 5. IMO, it would be better for readability if all the matching CREATE TABLE for publisher and subscriber are kept together, instead of the current code which is creating all publisher tables and then creating all subscriber tables. ~~~ 6. +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); +is( $result, qq(4|8 +5|10), 'confirm generated columns ARE replicated when the subscriber-side column is not generated'); + ... +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); +is( $result, qq(4|24 +5|25), 'confirm generated columns are NOT replicated when the subscriber-side column is also generated'); + 6a. These SELECT all need ORDER BY to protect against the SELECT * returning rows in some unexpected order. ~ 6b. IMO there should be more comments here to explain how you can tell the column was NOT replicated. E.g. it is because the result value of 'b' is the subscriber-side computed value (which you made deliberately different to the publisher-side computed value). == 99. Please also refer to the attached nitpicks top-up patch for minor cosmetic stuff. == [1] https://www.postgresql.org/message-id/CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb%2BV%3Dp2YHL8Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e8779dc..ee27a58 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -437,7 +437,6 @@ CREATE SUBSCRIPTION subscription_namefalse. - If the subscriber-side column is also a generated column then this option has no effect; the subscriber column will be filled as normal with the diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 75a52ce..663015d 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -6609,12 +6609,12 @@ describeSubscriptions(const char *pattern, bool verbose) appendPQExpBuffer(, ", subskiplsn AS \"%s\"\n", gettext_noop("Skip LSN")); + + /* include_generated_columns is only supported in v18 and higher */ if (pset.sversion >= 17) appendPQExpBuffer(, ", subincludegencols AS \"%s\"\n", gettext_noop("include_generated_columns")); - - // include_generated_columns } /* Only display subscriptions in current database. */ diff --git a/src/test/subscription/t/011_generated.pl b/src/test/subscription/t/011_generated.pl index 92b3dbf..cbd5015 100644 --- a/src/test/subscription/t/011_generated.pl +++ b/src/test/subscription/t/011_generated.pl @@ -24,7 +24,7 @@ $node_publisher->safe_psql('postgres', "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)" ); -# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has NON-gnerated col 'b'. +# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has NON-generated col 'b'. $node_publisher->safe_psql('postgres', "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)" );
Re: Pgoutput not capturing the generated columns
On 19.06.24 13:22, Shubham Khanna wrote: All the comments are handled. The attached Patch contains all the suggested changes. Please also take a look at the proposed patch for virtual generated columns [0] and consider how that would affect your patch. I think your feature can only replicate *stored* generated columns. So perhaps the documentation and terminology in your patch should reflect that. [0]: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org
Re: Pgoutput not capturing the generated columns
On Mon, Jun 17, 2024 at 1:57 PM Peter Smith wrote: > > Hi, here are my review comments for patch v7-0001. > > == > 1. GENERAL - \dRs+ > > Shouldn't the new SUBSCRIPTION parameter be exposed via "describe" > (e.g. \dRs+ mysub) the same as the other boolean parameters? > > == > Commit message > > 2. > When 'include_generated_columns' is false then the PUBLICATION > col-list will ignore any generated cols even when they are present in > a PUBLICATION col-list > > ~ > > Maybe you don't need to mention "PUBLICATION col-list" twice. > > SUGGESTION > When 'include_generated_columns' is false, generated columns are not > replicated, even when present in a PUBLICATION col-list. > > ~~~ > > 2. > CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= > 'publication pub1; > > ~ > > 2a. > (I've questioned this one in previous reviews) > > What exactly is the purpose of this statement in the commit message? > Was this supposed to demonstrate the usage of the > 'include_generated_columns' parameter? > > ~ > > 2b. > /publication/ PUBLICATION/ > > > ~~~ > > 3. > If the subscriber-side column is also a generated column then > thisoption has no effect; the replicated data will be ignored and the > subscriber column will be filled as normal with the subscriber-side > computed or default data. > > ~ > > Missing space: /thisoption/this option/ > > == > .../expected/decoding_into_rel.out > > 4. > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > > Why is the default value here equivalent to > 'include-generated-columns' = '1' here instead of '0'? The default for > the CREATE SUBSCRIPTION parameter 'include_generated_columns' is > false, and IMO it seems confusing for these 2 defaults to be > different. Here I think it should default to '0' *regardless* of what > the previous functionality might have done -- e.g. this is a "test > decoder" so the parameter should behave sensibly. > > == > .../test_decoding/sql/decoding_into_rel.sql > > NITPICK - wrong comments. > > == > doc/src/sgml/protocol.sgml > > 5. > + > + include_generated_columns > + > + > +Boolean option to enable generated columns. This option controls > +whether generated columns should be included in the string > +representation of tuples during logical decoding in PostgreSQL. > +The default is false. > + > + > + > + > > Does the protocol version need to be bumped to support this new option > and should that be mentioned on this page similar to how all other > version values are mentioned? I already did the Backward Compatibility test earlier and decided that protocol bump is not needed. > doc/src/sgml/ref/create_subscription.sgml > > NITPICK - some missing words/sentence. > NITPICK - some missing tags. > NITPICK - remove duplicated sentence. > NITPICK - add another . > > == > src/backend/commands/subscriptioncmds.c > > 6. > #define SUBOPT_ORIGIN 0x8000 > +#define SUBOPT_include_generated_columns 0x0001 > > Please use UPPERCASE for consistency with other macros. > > == > .../libpqwalreceiver/libpqwalreceiver.c > > 7. > + if (options->proto.logical.include_generated_columns && > + PQserverVersion(conn->streamConn) >= 17) > + appendStringInfoString(, ", include_generated_columns 'on'"); > + > > IMO it makes more sense to say 'true' here instead of 'on'. It seems > like this was just cut/paste from the above code (where 'on' was > sensible). > > == > src/include/catalog/pg_subscription.h > > 8. > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > * slots) in the upstream database are enabled > * to be synchronized to the standbys. */ > > + bool subincludegencol; /* True if generated columns must be published */ > + > > Not fixed as claimed. This field name ought to be plural. > > /subincludegencol/subincludegencols/ > > ~~~ > > 9. > char*origin; /* Only publish data originating from the > * specified origin */ > + bool includegencol; /* publish generated column data */ > } Subscription; > > Not fixed as claimed. This field name ought to be plural. > > /includegencol/includegencols/ > > == > src/test/subscription/t/031_column_list.pl > > 10. > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a > * 2) STORED)" > +); > + > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a > + 10) STORED)" > +); > + >
Re: Pgoutput not capturing the generated columns
> Few comments: > 1) Here tab1 and tab2 are exactly the same tables, just check if the > table tab1 itself can be used for your tests. > @@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres', > "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED)" > ); > +$node_publisher->safe_psql('postgres', > + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED)" > +); On the subscription side the tables have different descriptions, so we need to have different tables on the publisher side. > 2) We can document that the include_generate_columns option cannot be > altered. > > 3) You can mention that include-generated-columns is true by default > and generated column data will be selected > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > > 4) The comment seems to be wrong here, the comment says b will not be > replicated but b is being selected: > -- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > data > - > BEGIN > table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > COMMIT > (5 rows) > > 5) Similarly here too the comment seems to be wrong, the comment says > b will not replicated but b is not being selected: > INSERT INTO gencoltable (a) VALUES (4), (5), (6); > -- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > data > > BEGIN > table public.gencoltable: INSERT: a[integer]:4 > table public.gencoltable: INSERT: a[integer]:5 > table public.gencoltable: INSERT: a[integer]:6 > COMMIT > (5 rows) > > 6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to > keep the name consistent: > --- a/src/backend/commands/subscriptioncmds.c > +++ b/src/backend/commands/subscriptioncmds.c > @@ -72,6 +72,7 @@ > #define SUBOPT_FAILOVER0x2000 > #define SUBOPT_LSN 0x4000 > #define SUBOPT_ORIGIN 0x8000 > +#define SUBOPT_include_generated_columns 0x0001 > > 7) The comment style seems to be inconsistent, both of them can start > in lower case > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +-- When 'include-generated-columns' is not set > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > + > +-- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > > 8) This could be changed to remove the insert statements by using > pg_logical_slot_peek_changes: > -- When 'include-generated-columns' is not set > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); > -- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > INSERT INTO gencoltable (a) VALUES (4), (5), (6); > -- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', >
Re: Pgoutput not capturing the generated columns
> Hi Shubham, thanks for providing a patch. > I have some comments for v6-0001. > > 1. create_subscription.sgml > There is repetition of the same line. > > + > + Specifies whether the generated columns present in the tables > + associated with the subscription should be replicated. If the > + subscriber-side column is also a generated column then this option > + has no effect; the replicated data will be ignored and the > subscriber > + column will be filled as normal with the subscriber-side computed > or > + default data. > + false. > + > + > + > + This parameter can only be set true if > copy_data is > + set to false. If the subscriber-side > column is also a > + generated column then this option has no effect; the > replicated data will > + be ignored and the subscriber column will be filled as > normal with the > + subscriber-side computed or default data. > + > > == > 2. subscriptioncmds.c > > 2a. The macro name should be in uppercase. We can use a short name > like 'SUBOPT_INCLUDE_GEN_COL'. Thought? > +#define SUBOPT_include_generated_columns 0x0001 > > 2b.Update macro name accordingly > + if (IsSet(supported_opts, SUBOPT_include_generated_columns)) > + opts->include_generated_columns = false; > > 2c. Update macro name accordingly > + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) && > + strcmp(defel->defname, "include_generated_columns") == 0) > + { > + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= SUBOPT_include_generated_columns; > + opts->include_generated_columns = defGetBoolean(defel); > + } > > 2d. Update macro name accordingly > + SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN | > + SUBOPT_include_generated_columns); > > > == > > 3. decoding_into_rel.out > > 3a. In comment, I think it should be "When 'include-generated-columns' > = '1' the generated column 'b' values will be replicated" > +-- When 'include-generated-columns' = '1' the generated column 'b' > values will not be replicated > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > > 3b. In comment, I think it should be "When 'include-generated-columns' > = '1' the generated column 'b' values will not be replicated" > +-- When 'include-generated-columns' = '0' the generated column 'b' > values will be replicated > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > + data > + > + BEGIN > + table public.gencoltable: INSERT: a[integer]:4 > + table public.gencoltable: INSERT: a[integer]:5 > + table public.gencoltable: INSERT: a[integer]:6 > + COMMIT > +(5 rows) > > = > > 4. Here names for both the tests are the same. I think we should use > different names. > > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); > +is( $result, qq(4|8 > +5|10), 'generated columns replicated to non-generated column on subscriber'); > + > +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); > + > +$node_publisher->wait_for_catchup('sub3'); > + > +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); > +is( $result, qq(4|24 > +5|25), 'generated columns replicated to non-generated column on subscriber'); All the comments are handled. The attached Patch contains all the suggested changes. Thanks and Regards, Shubham Khanna. v8-0001-Currently-generated-column-values-are-not-replica.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for patch v7-0002 == Commit Message NITPICKS - rearrange paragraphs - typo "donot" - don't start a sentence with "And" - etc. Please see the attachment for my suggested commit message text updates and take from it whatever you agree with. == doc/src/sgml/ref/create_subscription.sgml 1. + If the subscriber-side column is also a generated column then this option + has no effect; the replicated data will be ignored and the subscriber + column will be filled as normal with the subscriber-side computed or + default data. And during table synchronization, the data corresponding to + the generated column on subscriber-side will not be sent from the + publisher to the subscriber. This text already mentions subscriber-side generated cols. IMO you don't need to say anything at all about table synchronization -- that's just an internal code optimization, which is not something the user needs to know about. IOW, the entire last sentence ("And during...") should be removed. == src/backend/replication/logical/relation.c 2. logicalrep_rel_open - if (attr->attisdropped) + if (attr->attisdropped || + (!MySubscription->includegencol && attr->attgenerated)) { entry->attrmap->attnums[i] = -1; continue; ~ Maybe I'm mistaken, but isn't this code for skipping checking for "missing" subscriber-side (aka local) columns? Can't it just unconditionally skip every attr->attgenerated -- i.e. why does it matter if the MySubscription->includegencol was set or not? == src/backend/replication/logical/tablesync.c 3. make_copy_attnamelist - for (i = 0; i < rel->remoterel.natts; i++) + desc = RelationGetDescr(rel->localrel); + + for (i = 0; i < desc->natts; i++) { - attnamelist = lappend(attnamelist, - makeString(rel->remoterel.attnames[i])); + int attnum; + Form_pg_attribute attr = TupleDescAttr(desc, i); + + if (!attr->attgenerated) + continue; + + attnum = logicalrep_rel_att_by_name(>remoterel, + NameStr(attr->attname)); + + /* + * Check if subscription table have a generated column with same + * column name as a non-generated column in the corresponding + * publication table. + */ + if (attnum >=0 && !attgenlist[attnum]) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical replication target relation \"%s.%s\" is missing replicated column: \"%s\"", + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname; + + if (attnum >= 0) + gencollist = lappend_int(gencollist, attnum); } ~ NITPICK - Use C99-style for loop variables NITPICK - Typo in comment NITPICK - spaces ~ 3a. I think above code should be refactored so there is only one check for "if (attnum >= 0)" -- e.g. other condition should be nested. ~ 3b. That ERROR message says "missing replicated column", but that doesn't seem much like what the code-comment was saying this code is about. ~~~ 4. + for (i = 0; i < rel->remoterel.natts; i++) + { + + if (gencollist != NIL && j < gencollist->length && + list_nth_int(gencollist, j) == i) + j++; + else + attnamelist = lappend(attnamelist, + makeString(rel->remoterel.attnames[i])); + } NITPICK - Use C99-style for loop variables NITPICK - Unnecessary blank lines ~ IIUC the subscriber-side table and the publisher-side table do NOT have to have all the columns in identical order for the logical replication to work correcly. AFAIK it works fine so long as the column names match for the replicated columns. Therefore, I am suspicious that this new patch code seems to be imposing some new ordering assumptions/restrictions (e.g. list_nth_int stuff) which are not current requirements. ~~~ copy_table: NITPICK - comment typo NITPICK - comment wording ~ 5. + int i = 0; + ListCell *l; + appendStringInfoString(, "COPY (SELECT "); - for (int i = 0; i < lrel.natts; i++) + foreach(l, attnamelist) { - appendStringInfoString(, quote_identifier(lrel.attnames[i])); - if (i < lrel.natts - 1) + appendStringInfoString(, quote_identifier(strVal(lfirst(l; + if (i < attnamelist->length - 1) appendStringInfoString(, ", "); + i++; } IIUC for new code like this, it is preferred to use the foreach* macros instead of ListCell. == src/test/regress/sql/subscription.sql 6. --- fail - copy_data and include_generated_columns are mutually exclusive options -CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (include_generated_columns = true); -ERROR: copy_data = true and include_generated_columns = true are mutually exclusive options It is OK to delete this test now but IMO still needs to be some "include_generated_columns must be boolean" test cases (e.g. same as there was two_phase). Actually, this should probably be done by the 0001 patch. == src/test/subscription/t/011_generated.pl 7. All the PRIMARY KEY stuff may be overkill. Are primary keys really needed for these tests? ~~~ 8.
Re: Pgoutput not capturing the generated columns
Hi, here are my review comments for patch v7-0001. == 1. GENERAL - \dRs+ Shouldn't the new SUBSCRIPTION parameter be exposed via "describe" (e.g. \dRs+ mysub) the same as the other boolean parameters? == Commit message 2. When 'include_generated_columns' is false then the PUBLICATION col-list will ignore any generated cols even when they are present in a PUBLICATION col-list ~ Maybe you don't need to mention "PUBLICATION col-list" twice. SUGGESTION When 'include_generated_columns' is false, generated columns are not replicated, even when present in a PUBLICATION col-list. ~~~ 2. CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= 'publication pub1; ~ 2a. (I've questioned this one in previous reviews) What exactly is the purpose of this statement in the commit message? Was this supposed to demonstrate the usage of the 'include_generated_columns' parameter? ~ 2b. /publication/ PUBLICATION/ ~~~ 3. If the subscriber-side column is also a generated column then thisoption has no effect; the replicated data will be ignored and the subscriber column will be filled as normal with the subscriber-side computed or default data. ~ Missing space: /thisoption/this option/ == .../expected/decoding_into_rel.out 4. +-- When 'include-generated-columns' is not set +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); +data +- + BEGIN + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 + COMMIT +(5 rows) Why is the default value here equivalent to 'include-generated-columns' = '1' here instead of '0'? The default for the CREATE SUBSCRIPTION parameter 'include_generated_columns' is false, and IMO it seems confusing for these 2 defaults to be different. Here I think it should default to '0' *regardless* of what the previous functionality might have done -- e.g. this is a "test decoder" so the parameter should behave sensibly. == .../test_decoding/sql/decoding_into_rel.sql NITPICK - wrong comments. == doc/src/sgml/protocol.sgml 5. + + include_generated_columns + + +Boolean option to enable generated columns. This option controls +whether generated columns should be included in the string +representation of tuples during logical decoding in PostgreSQL. +The default is false. + + + + Does the protocol version need to be bumped to support this new option and should that be mentioned on this page similar to how all other version values are mentioned? == doc/src/sgml/ref/create_subscription.sgml NITPICK - some missing words/sentence. NITPICK - some missing tags. NITPICK - remove duplicated sentence. NITPICK - add another . == src/backend/commands/subscriptioncmds.c 6. #define SUBOPT_ORIGIN 0x8000 +#define SUBOPT_include_generated_columns 0x0001 Please use UPPERCASE for consistency with other macros. == .../libpqwalreceiver/libpqwalreceiver.c 7. + if (options->proto.logical.include_generated_columns && + PQserverVersion(conn->streamConn) >= 17) + appendStringInfoString(, ", include_generated_columns 'on'"); + IMO it makes more sense to say 'true' here instead of 'on'. It seems like this was just cut/paste from the above code (where 'on' was sensible). == src/include/catalog/pg_subscription.h 8. @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW * slots) in the upstream database are enabled * to be synchronized to the standbys. */ + bool subincludegencol; /* True if generated columns must be published */ + Not fixed as claimed. This field name ought to be plural. /subincludegencol/subincludegencols/ ~~~ 9. char*origin; /* Only publish data originating from the * specified origin */ + bool includegencol; /* publish generated column data */ } Subscription; Not fixed as claimed. This field name ought to be plural. /includegencol/includegencols/ == src/test/subscription/t/031_column_list.pl 10. +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED)" +); + +$node_publisher->safe_psql('postgres', + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a + 10) STORED)" +); + $node_subscriber->safe_psql('postgres', "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 22) STORED, c int)" ); +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab2 (a int PRIMARY KEY, b int)" +); + +$node_subscriber->safe_psql('postgres', + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a + 20) STORED)" +); IMO the test needs lots more comments to describe what it is doing: For example, the setup
Re: Pgoutput not capturing the generated columns
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna wrote: > > > Thanks for the updated patch, few comments: > > 1) The option name seems wrong here: > > In one place include_generated_column is specified and other place > > include_generated_columns is specified: > > > > + else if (IsSet(supported_opts, > > SUBOPT_INCLUDE_GENERATED_COLUMN) && > > +strcmp(defel->defname, > > "include_generated_column") == 0) > > + { > > + if (IsSet(opts->specified_opts, > > SUBOPT_INCLUDE_GENERATED_COLUMN)) > > + errorConflictingDefElem(defel, pstate); > > + > > + opts->specified_opts |= > > SUBOPT_INCLUDE_GENERATED_COLUMN; > > + opts->include_generated_column = > > defGetBoolean(defel); > > + } > > Fixed. > > > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c > > index d453e224d9..e8ff752fd9 100644 > > --- a/src/bin/psql/tab-complete.c > > +++ b/src/bin/psql/tab-complete.c > > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end) > > COMPLETE_WITH("binary", "connect", "copy_data", > > "create_slot", > > "disable_on_error", > > "enabled", "failover", "origin", > > "password_required", > > "run_as_owner", "slot_name", > > - "streaming", > > "synchronous_commit", "two_phase"); > > + "streaming", > > "synchronous_commit", "two_phase","include_generated_columns"); > > > > 2) This small data table need not have a primary key column as it will > > create an index and insertion will happen in the index too. > > +-- check include-generated-columns option with generated column > > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > > AS (a * 2) STORED); > > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include-generated-columns', '1'); > > Fixed. > > > 3) Please add a test case for this: > > + set to false. If the subscriber-side > > column is also a > > + generated column then this option has no effect; the > > replicated data will > > + be ignored and the subscriber column will be filled as > > normal with the > > + subscriber-side computed or default data. > > Added the required test case. > > > 4) You can use a new style of ereport to remove the brackets around errcode > > 4.a) > > + else if (!parse_bool(strVal(elem->arg), > > >include_generated_columns)) > > + ereport(ERROR, > > + > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > +errmsg("could not > > parse value \"%s\" for parameter \"%s\"", > > + > > strVal(elem->arg), elem->defname))); > > > > 4.b) similarly here too: > > + ereport(ERROR, > > + (errcode(ERRCODE_SYNTAX_ERROR), > > + /*- translator: both %s are strings of the form > > "option = value" */ > > + errmsg("%s and %s are mutually > > exclusive options", > > + "copy_data = true", > > "include_generated_column = true"))); > > > > 4.c) similarly here too: > > + if (include_generated_columns_option_given) > > + ereport(ERROR, > > + > > (errcode(ERRCODE_SYNTAX_ERROR), > > +errmsg("conflicting > > or redundant options"))); > > Fixed. > > > 5) These variable names can be changed to keep it smaller, something > > like gencol or generatedcol or gencolumn, etc > > +++ b/src/include/catalog/pg_subscription.h > > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > > BKI_SHARED_RELATION BKI_ROW > > * slots) in the upstream database are enabled > > * to be synchronized to the standbys. */ > > > > + bool subincludegeneratedcolumn; /* True if generated columns must be > > published */ > > + > > #ifdef CATALOG_VARLEN /* variable-length fields start here */ > > /* Connection string to the publisher */ > > text subconninfo BKI_FORCE_NOT_NULL; > > @@ -157,6 +159,7 @@ typedef struct Subscription > > List*publications; /* List of publication names to subscribe to */ > > char*origin; /* Only publish data originating from the > > * specified origin */ > > + bool includegeneratedcolumn; /* publish generated column data */ > > } Subscription; > > Fixed. > > The attached Patch contains the suggested changes. Few comments: 1) Here tab1 and tab2 are exactly the same tables, just check if the table tab1 itself can be used for your tests.
Re: Pgoutput not capturing the generated columns
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna wrote: > > The attached Patch contains the suggested changes. > Hi Shubham, thanks for providing a patch. I have some comments for v6-0001. 1. create_subscription.sgml There is repetition of the same line. + + Specifies whether the generated columns present in the tables + associated with the subscription should be replicated. If the + subscriber-side column is also a generated column then this option + has no effect; the replicated data will be ignored and the subscriber + column will be filled as normal with the subscriber-side computed or + default data. + false. + + + + This parameter can only be set true if copy_data is + set to false. If the subscriber-side column is also a + generated column then this option has no effect; the replicated data will + be ignored and the subscriber column will be filled as normal with the + subscriber-side computed or default data. + == 2. subscriptioncmds.c 2a. The macro name should be in uppercase. We can use a short name like 'SUBOPT_INCLUDE_GEN_COL'. Thought? +#define SUBOPT_include_generated_columns 0x0001 2b.Update macro name accordingly + if (IsSet(supported_opts, SUBOPT_include_generated_columns)) + opts->include_generated_columns = false; 2c. Update macro name accordingly + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) && + strcmp(defel->defname, "include_generated_columns") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns)) + errorConflictingDefElem(defel, pstate); + + opts->specified_opts |= SUBOPT_include_generated_columns; + opts->include_generated_columns = defGetBoolean(defel); + } 2d. Update macro name accordingly + SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN | + SUBOPT_include_generated_columns); == 3. decoding_into_rel.out 3a. In comment, I think it should be "When 'include-generated-columns' = '1' the generated column 'b' values will be replicated" +-- When 'include-generated-columns' = '1' the generated column 'b' values will not be replicated +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); +data +- + BEGIN + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 + COMMIT 3b. In comment, I think it should be "When 'include-generated-columns' = '1' the generated column 'b' values will not be replicated" +-- When 'include-generated-columns' = '0' the generated column 'b' values will be replicated +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0'); + data + + BEGIN + table public.gencoltable: INSERT: a[integer]:4 + table public.gencoltable: INSERT: a[integer]:5 + table public.gencoltable: INSERT: a[integer]:6 + COMMIT +(5 rows) = 4. Here names for both the tests are the same. I think we should use different names. +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2"); +is( $result, qq(4|8 +5|10), 'generated columns replicated to non-generated column on subscriber'); + +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)"); + +$node_publisher->wait_for_catchup('sub3'); + +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3"); +is( $result, qq(4|24 +5|25), 'generated columns replicated to non-generated column on subscriber'); Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Thu, 6 Jun 2024 at 08:29, Hayato Kuroda (Fujitsu) wrote: > > Dear Shlok and Shubham, > > Thanks for updating the patch! > > I briefly checked the v5-0002. IIUC, your patch allows to copy generated > columns unconditionally. I think the behavior affects many people so that it > is > hard to get agreement. > > Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is > set > to off, we can keep the current specification. > > Thought? Hi Kuroda-san, I agree that we should not allow to copy generated columns unconditionally. With patch v7-0002, I have used a different approach which does not require any code changes in COPY. Please refer [1] for patch v7-0002. [1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Wed, 5 Jun 2024 at 05:49, Peter Smith wrote: > > On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal wrote: > > > > > > > > The attached Patch contains the suggested changes. > > > > > > > Hi, > > > > Currently, COPY command does not work for generated columns and > > therefore, COPY of generated column is not supported during tablesync > > process. So, in patch v4-0001 we added a check to allow replication of > > the generated column only if 'copy_data = false'. > > > > I am attaching patches to resolve the above issues. > > > > v5-0001: not changed > > v5-0002: Support COPY of generated column > > v5-0003: Support COPY of generated column during tablesync process > > > > Hi Shlok, I have a question about patch v5-0003. > > According to the patch 0001 docs "If the subscriber-side column is > also a generated column then this option has no effect; the replicated > data will be ignored and the subscriber column will be filled as > normal with the subscriber-side computed or default data". > > Doesn't this mean it will be a waste of effort/resources to COPY any > column value where the subscriber-side column is generated since we > know that any copied value will be ignored anyway? > > But I don't recall seeing any comment or logic for this kind of copy > optimisation in the patch 0003. Is this already accounted for > somewhere and I missed it, or is my understanding wrong? Your understanding is correct. With v7-0002, if a subscriber-side column is generated, then we do not include that column in the column list during COPY. This will address the above issue. Patch 7-0002 contains all the changes. Please refer [1] [1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Tue, 4 Jun 2024 at 15:01, Peter Smith wrote: > > Hi, > > Here are some review comments for patch v5-0003. > > == > 0. Whitespace warnings when the patch was applied. > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch > ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29: > trailing whitespace. > has no effect; the replicated data will be ignored and the > subscriber > ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30: > trailing whitespace. > column will be filled as normal with the subscriber-side computed or > ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189: > trailing whitespace. > (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && > warning: 3 lines add whitespace errors. > Fixed > == > src/backend/commands/subscriptioncmds.c > > 1. > - res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow); > + column_count = (!include_generated_column && check_gen_col) ? 4 : > (check_columnlist ? 3 : 2); > + res = walrcv_exec(wrconn, cmd.data, column_count, tableRow); > > The 'column_count' seems out of control. Won't it be far simpler to > assign/increment the value dynamically only as required instead of the > tricky calculation at the end which is unnecessarily difficult to > understand? > I have removed this piece of code. > ~~~ > > 2. > + /* > + * If include_generated_column option is false and all the column of > the table in the > + * publication are generated then we should throw an error. > + */ > + if (!isnull && !include_generated_column && check_gen_col) > + { > + attlist = DatumGetArrayTypeP(attlistdatum); > + gen_col_count = DatumGetInt32(slot_getattr(slot, 4, )); > + Assert(!isnull); > + > + attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist)); > + > + if (attcount != 0 && attcount == gen_col_count) > + ereport(ERROR, > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot use only generated column for table \"%s.%s\" in > publication when generated_column option is false", > +nspname, relname)); > + } > + > > Why do you think this new logic/error is necessary? > > IIUC the 'include_generated_columns' should be false to match the > existing HEAD behavior. So this scenario where your publisher-side > table *only* has generated columns is something that could already > happen, right? IOW, this introduced error could be a candidate for > another discussion/thread/patch, but is it really required for this > current patch? > Yes, this scenario can also happen in HEAD. For this patch I have removed this check. > == > src/backend/replication/logical/tablesync.c > > 3. > lrel->remoteid, > - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ? > - "AND a.attgenerated = ''" : ""), > + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && > + (walrcv_server_version(LogRepWorkerWalRcvConn) <= 16 || > + !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""), > > This ternary within one big appendStringInfo seems quite complicated. > Won't it be better to split the appendStringInfo into multiple parts > so the generated-cols calculation can be done more simply? > Fixed > == > src/test/subscription/t/011_generated.pl > > 4. > I think there should be a variety of different tablesync scenarios > (when 'include_generated_columns' is true) tested here instead of just > one, and all varieties with lots of comments to say what they are > doing, expectations etc. > > a. publisher-side gen-col "a" replicating to subscriber-side NOT > gen-col "a" (ok, value gets replicated) > b. publisher-side gen-col "a" replicating to subscriber-side gen-col > (ok, but ignored) > c. publisher-side NOT gen-col "b" replicating to subscriber-side > gen-col "b" (error?) > Added the tests > ~~ > > 5. > +$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3"); > +is( $result, qq(1|2 > +2|4 > +3|6), 'generated columns initial sync with include_generated_column = true'); > > Should this say "ORDER BY..." so it will not fail if the row order > happens to be something unanticipated? > Fixed > == > > 99. > Also, see the attached file with numerous other nitpicks: > - plural param- and var-names > - typos in comments > - missing spaces > - SQL keyword should be UPPERCASE > - etc. > > Please apply any/all of these if you agree with them. Fixed Patch 7-0002 contains all the changes. Please refer [1] [1]: https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Pgoutput not capturing the generated columns
On Tue, 4 Jun 2024 at 10:21, Peter Smith wrote: > > Hi, > > Here are some review comments for patch v5-0002. > > == > GENERAL > > G1. > IIUC now you are unconditionally allowing all generated columns to be copied. > > I think this is assuming that the table sync code (in the next patch > 0003?) is going to explicitly name all the columns it wants to copy > (so if it wants to get generated cols then it will name the generated > cols, and if is doesn't want generated cols then it won't name them). > > Maybe that is OK for the logical replication tablesync case, but I am > not sure if it will be desirable to *always* copy generated columns in > other user scenarios. > > e.g. I was wondering if there should be a new COPY command option > introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so > then the current HEAD behaviour is unaffected unless that option is > enabled. > > ~~~ > > G2. > The current COPY command documentation [1] says "If no column list is > specified, all columns of the table except generated columns will be > copied." > > But this 0002 patch has changed that documented behaviour, and so the > documentation needs to be changed as well, right? > > == > Commit Message > > 1. > Currently COPY command do not copy generated column. With this commit > added support for COPY for generated column. > > ~ > > The grammar/cardinality is not good here. Try some tool (Grammarly or > chatGPT, etc) to help correct it. > > == > src/backend/commands/copy.c > > == > src/test/regress/expected/generated.out > > == > src/test/regress/sql/generated.sql > > 2. > I think these COPY test cases require some explicit comments to > describe what they are doing, and what are the expected results. > > Currently, I have doubts about some of this test input/output > > e.g.1. Why is the 'b' column sometimes specified as 1? It needs some > explanation. Are you expecting this generated col value to be > ignored/overwritten or what? > > COPY gtest1 (a, b) FROM stdin DELIMITER ' '; > 5 1 > 6 1 > \. > > e.g.2. what is the reason for this new 'missing data for column "b"' > error? Or is it some introduced quirk because "b" now cannot be > generated since there is no value for "a"? I don't know if the > expected *.out here is OK or not, so some test comments may help to > clarify it. > > == > [1] https://www.postgresql.org/docs/devel/sql-copy.html > Hi Peter, I have removed the changes in the COPY command. I came up with an approach which requires changes only in tablesync code. We can COPY generated columns during tablesync using syntax 'COPY (SELECT column_name from table) TO STDOUT.' I have attached the patch for the same. v7-0001 : Not Modified v7-0002: Support replication of generated columns during initial sync. Thanks and Regards, Shlok Kyal v7-0002-Support-replication-of-generated-column-during-in.patch Description: Binary data v7-0001-Enable-support-for-include_generated_columns-opti.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith wrote: > > Hi, > > Here are some review comments for patch v5-0001. > > == > GENERAL G.1 > > The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g. > maybe before they were always ignored, but now they are not? > > OTOH, when 'include_generated_columns' is false then the PUBLICATION > col-list will ignore any generated cols even when they are present in > a PUBLICATION col-list, right? > > These kinds of points should be noted in the commit message and in the > (col-list?) documentation. Fixed. > == > Commit message > > General 1a. > IMO the commit message needs some background to say something like: > "Currently generated column values are not replicated because it is > assumed that the corresponding subscriber-side table will generate its > own values for those columns." > > ~ > > General 1b. > Somewhere in this commit message, you need to give all the other > special rules --- e.g. the docs says "If the subscriber-side column is > also a generated column then this option has no effect" > > ~~~ Fixed. > 2. > This commit enables support for the 'include_generated_columns' option > in logical replication, allowing the transmission of generated column > information and data alongside regular table changes. This option is > particularly useful for scenarios where applications require access to > generated column values for downstream processing or synchronization. > > ~ > > I don't think the sentence "This option is particularly useful..." is > helpful. It seems like just saying "This commit supports option XXX. > This is particularly useful if you want XXX". > Fixed. > > 3. > CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= > 'publication pub1; > > ~ > > What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of > the new parameter being used in it? > Added the description for this in the Patch. > > 4. > Currently copy_data option with include_generated_columns option is > not supported. A future patch will remove this limitation. > > ~ > > Suggest to single-quote those parameter names for better readability. > Fixed. > > 5. > This commit aims to enhance the flexibility and utility of logical > replication by allowing users to include generated column information > in replication streams, paving the way for more robust data > synchronization and processing workflows. > > ~ > > IMO this paragraph can be omitted. Fixed. > == > .../test_decoding/sql/decoding_into_rel.sql > > 6. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); > +INSERT INTO gencoltable (a) VALUES (4), (5), (6); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '0'); > +DROP TABLE gencoltable; > + > > 6a. > I felt some additional explicit comments might help the readabilty of > the output file. > > e.g.1 > -- When 'include-generated=columns' = '1' the generated column 'b' > values will be replicated > SELECT data FROM pg_logical_slot_get_changes... > > e.g.2 > -- When 'include-generated=columns' = '0' the generated column 'b' > values will not be replicated > SELECT data FROM pg_logical_slot_get_changes... Added the required description for this. > 6b. > Suggest adding one more test case (where 'include-generated=columns' > is not set) to confirm/demonstrate the default behaviour for > replicated generated cols. Added the required Test case. > == > doc/src/sgml/protocol.sgml > > 7. > + > + class="parameter">include-generated-columns > + > + > +Boolean option to enable generated columns. > +The include-generated-columns option controls whether generated > +columns should be included in the string representation of tuples > +during logical decoding in PostgreSQL. This allows users to > +customize the output format based on whether they want to include > +these columns or not. The default is false. > + > + > + > > 7a. > It doesn't render properly. e.g. Should not be bold italic (probably > the class is wrong?), because none of the nearby parameters look this > way. > > ~ > > 7b. > The name here should NOT have hyphens. It needs underscores same as > all other nearby protocol parameters. > > ~ > > 7c. > The description seems overly verbose. > > SUGGESTION > Boolean option to enable generated columns. This option controls > whether generated columns should be included in the string > representation of tuples during logical decoding in PostgreSQL. The > default is false. Fixed. > == >
Re: Pgoutput not capturing the generated columns
> Thanks for the updated patch, few comments: > 1) The option name seems wrong here: > In one place include_generated_column is specified and other place > include_generated_columns is specified: > > + else if (IsSet(supported_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN) && > +strcmp(defel->defname, > "include_generated_column") == 0) > + { > + if (IsSet(opts->specified_opts, > SUBOPT_INCLUDE_GENERATED_COLUMN)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= > SUBOPT_INCLUDE_GENERATED_COLUMN; > + opts->include_generated_column = defGetBoolean(defel); > + } Fixed. > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c > index d453e224d9..e8ff752fd9 100644 > --- a/src/bin/psql/tab-complete.c > +++ b/src/bin/psql/tab-complete.c > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end) > COMPLETE_WITH("binary", "connect", "copy_data", "create_slot", > "disable_on_error", > "enabled", "failover", "origin", > "password_required", > "run_as_owner", "slot_name", > - "streaming", > "synchronous_commit", "two_phase"); > + "streaming", > "synchronous_commit", "two_phase","include_generated_columns"); > > 2) This small data table need not have a primary key column as it will > create an index and insertion will happen in the index too. > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include-generated-columns', '1'); Fixed. > 3) Please add a test case for this: > + set to false. If the subscriber-side > column is also a > + generated column then this option has no effect; the > replicated data will > + be ignored and the subscriber column will be filled as > normal with the > + subscriber-side computed or default data. Added the required test case. > 4) You can use a new style of ereport to remove the brackets around errcode > 4.a) > + else if (!parse_bool(strVal(elem->arg), > >include_generated_columns)) > + ereport(ERROR, > + > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > +errmsg("could not > parse value \"%s\" for parameter \"%s\"", > + > strVal(elem->arg), elem->defname))); > > 4.b) similarly here too: > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + /*- translator: both %s are strings of the form > "option = value" */ > + errmsg("%s and %s are mutually > exclusive options", > + "copy_data = true", > "include_generated_column = true"))); > > 4.c) similarly here too: > + if (include_generated_columns_option_given) > + ereport(ERROR, > + > (errcode(ERRCODE_SYNTAX_ERROR), > +errmsg("conflicting > or redundant options"))); Fixed. > 5) These variable names can be changed to keep it smaller, something > like gencol or generatedcol or gencolumn, etc > +++ b/src/include/catalog/pg_subscription.h > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) > BKI_SHARED_RELATION BKI_ROW > * slots) in the upstream database are enabled > * to be synchronized to the standbys. */ > > + bool subincludegeneratedcolumn; /* True if generated columns must be > published */ > + > #ifdef CATALOG_VARLEN /* variable-length fields start here */ > /* Connection string to the publisher */ > text subconninfo BKI_FORCE_NOT_NULL; > @@ -157,6 +159,7 @@ typedef struct Subscription > List*publications; /* List of publication names to subscribe to */ > char*origin; /* Only publish data originating from the > * specified origin */ > + bool includegeneratedcolumn; /* publish generated column data */ > } Subscription; Fixed. The attached Patch contains the suggested changes. Thanks and Regards, Shubham Khanna. v6-0001-Enable-support-for-include_generated_columns-opti.patch Description: Binary data
RE: Pgoutput not capturing the generated columns
Dear Shlok and Shubham, Thanks for updating the patch! I briefly checked the v5-0002. IIUC, your patch allows to copy generated columns unconditionally. I think the behavior affects many people so that it is hard to get agreement. Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is set to off, we can keep the current specification. Thought? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Pgoutput not capturing the generated columns
On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal wrote: > > > > > The attached Patch contains the suggested changes. > > > > Hi, > > Currently, COPY command does not work for generated columns and > therefore, COPY of generated column is not supported during tablesync > process. So, in patch v4-0001 we added a check to allow replication of > the generated column only if 'copy_data = false'. > > I am attaching patches to resolve the above issues. > > v5-0001: not changed > v5-0002: Support COPY of generated column > v5-0003: Support COPY of generated column during tablesync process > Hi Shlok, I have a question about patch v5-0003. According to the patch 0001 docs "If the subscriber-side column is also a generated column then this option has no effect; the replicated data will be ignored and the subscriber column will be filled as normal with the subscriber-side computed or default data". Doesn't this mean it will be a waste of effort/resources to COPY any column value where the subscriber-side column is generated since we know that any copied value will be ignored anyway? But I don't recall seeing any comment or logic for this kind of copy optimisation in the patch 0003. Is this already accounted for somewhere and I missed it, or is my understanding wrong? == Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, Here are some review comments for patch v5-0003. == 0. Whitespace warnings when the patch was applied. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29: trailing whitespace. has no effect; the replicated data will be ignored and the subscriber ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30: trailing whitespace. column will be filled as normal with the subscriber-side computed or ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189: trailing whitespace. (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && warning: 3 lines add whitespace errors. == src/backend/commands/subscriptioncmds.c 1. - res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow); + column_count = (!include_generated_column && check_gen_col) ? 4 : (check_columnlist ? 3 : 2); + res = walrcv_exec(wrconn, cmd.data, column_count, tableRow); The 'column_count' seems out of control. Won't it be far simpler to assign/increment the value dynamically only as required instead of the tricky calculation at the end which is unnecessarily difficult to understand? ~~~ 2. + /* + * If include_generated_column option is false and all the column of the table in the + * publication are generated then we should throw an error. + */ + if (!isnull && !include_generated_column && check_gen_col) + { + attlist = DatumGetArrayTypeP(attlistdatum); + gen_col_count = DatumGetInt32(slot_getattr(slot, 4, )); + Assert(!isnull); + + attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist)); + + if (attcount != 0 && attcount == gen_col_count) + ereport(ERROR, + errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("cannot use only generated column for table \"%s.%s\" in publication when generated_column option is false", +nspname, relname)); + } + Why do you think this new logic/error is necessary? IIUC the 'include_generated_columns' should be false to match the existing HEAD behavior. So this scenario where your publisher-side table *only* has generated columns is something that could already happen, right? IOW, this introduced error could be a candidate for another discussion/thread/patch, but is it really required for this current patch? == src/backend/replication/logical/tablesync.c 3. lrel->remoteid, - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ? - "AND a.attgenerated = ''" : ""), + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 && + (walrcv_server_version(LogRepWorkerWalRcvConn) <= 16 || + !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""), This ternary within one big appendStringInfo seems quite complicated. Won't it be better to split the appendStringInfo into multiple parts so the generated-cols calculation can be done more simply? == src/test/subscription/t/011_generated.pl 4. I think there should be a variety of different tablesync scenarios (when 'include_generated_columns' is true) tested here instead of just one, and all varieties with lots of comments to say what they are doing, expectations etc. a. publisher-side gen-col "a" replicating to subscriber-side NOT gen-col "a" (ok, value gets replicated) b. publisher-side gen-col "a" replicating to subscriber-side gen-col (ok, but ignored) c. publisher-side NOT gen-col "b" replicating to subscriber-side gen-col "b" (error?) ~~ 5. +$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3"); +is( $result, qq(1|2 +2|4 +3|6), 'generated columns initial sync with include_generated_column = true'); Should this say "ORDER BY..." so it will not fail if the row order happens to be something unanticipated? == 99. Also, see the attached file with numerous other nitpicks: - plural param- and var-names - typos in comments - missing spaces - SQL keyword should be UPPERCASE - etc. Please apply any/all of these if you agree with them. == Kind Regards, Peter Smith. Fujitsu Australia diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index 3e78a75..afb24c3 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -103,7 +103,7 @@ typedef struct SubOpts boolinclude_generated_column; } SubOpts; -static List *fetch_table_list(WalReceiverConn *wrconn, List *publications, bool include_generated_column); +static List *fetch_table_list(WalReceiverConn *wrconn, List *publications, bool include_generated_columns); static void check_publications_origin(WalReceiverConn *wrconn, List *publications, bool copydata, char *origin, Oid *subrel_local_oids, @@ -2147,7 +2147,7 @@
Re: Pgoutput not capturing the generated columns
Hi, Here are some review comments for patch v5-0002. == GENERAL G1. IIUC now you are unconditionally allowing all generated columns to be copied. I think this is assuming that the table sync code (in the next patch 0003?) is going to explicitly name all the columns it wants to copy (so if it wants to get generated cols then it will name the generated cols, and if is doesn't want generated cols then it won't name them). Maybe that is OK for the logical replication tablesync case, but I am not sure if it will be desirable to *always* copy generated columns in other user scenarios. e.g. I was wondering if there should be a new COPY command option introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so then the current HEAD behaviour is unaffected unless that option is enabled. ~~~ G2. The current COPY command documentation [1] says "If no column list is specified, all columns of the table except generated columns will be copied." But this 0002 patch has changed that documented behaviour, and so the documentation needs to be changed as well, right? == Commit Message 1. Currently COPY command do not copy generated column. With this commit added support for COPY for generated column. ~ The grammar/cardinality is not good here. Try some tool (Grammarly or chatGPT, etc) to help correct it. == src/backend/commands/copy.c == src/test/regress/expected/generated.out == src/test/regress/sql/generated.sql 2. I think these COPY test cases require some explicit comments to describe what they are doing, and what are the expected results. Currently, I have doubts about some of this test input/output e.g.1. Why is the 'b' column sometimes specified as 1? It needs some explanation. Are you expecting this generated col value to be ignored/overwritten or what? COPY gtest1 (a, b) FROM stdin DELIMITER ' '; 5 1 6 1 \. e.g.2. what is the reason for this new 'missing data for column "b"' error? Or is it some introduced quirk because "b" now cannot be generated since there is no value for "a"? I don't know if the expected *.out here is OK or not, so some test comments may help to clarify it. == [1] https://www.postgresql.org/docs/devel/sql-copy.html Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
Hi, Here are some review comments for patch v5-0001. == GENERAL G.1 The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g. maybe before they were always ignored, but now they are not? OTOH, when 'include_generated_columns' is false then the PUBLICATION col-list will ignore any generated cols even when they are present in a PUBLICATION col-list, right? These kinds of points should be noted in the commit message and in the (col-list?) documentation. == Commit message General 1a. IMO the commit message needs some background to say something like: "Currently generated column values are not replicated because it is assumed that the corresponding subscriber-side table will generate its own values for those columns." ~ General 1b. Somewhere in this commit message, you need to give all the other special rules --- e.g. the docs says "If the subscriber-side column is also a generated column then this option has no effect" ~~~ 2. This commit enables support for the 'include_generated_columns' option in logical replication, allowing the transmission of generated column information and data alongside regular table changes. This option is particularly useful for scenarios where applications require access to generated column values for downstream processing or synchronization. ~ I don't think the sentence "This option is particularly useful..." is helpful. It seems like just saying "This commit supports option XXX. This is particularly useful if you want XXX". ~~~ 3. CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port= 'publication pub1; ~ What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of the new parameter being used in it? ~~~ 4. Currently copy_data option with include_generated_columns option is not supported. A future patch will remove this limitation. ~ Suggest to single-quote those parameter names for better readability. ~~~ 5. This commit aims to enhance the flexibility and utility of logical replication by allowing users to include generated column information in replication streams, paving the way for more robust data synchronization and processing workflows. ~ IMO this paragraph can be omitted. == .../test_decoding/sql/decoding_into_rel.sql 6. +-- check include-generated-columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); +INSERT INTO gencoltable (a) VALUES (4), (5), (6); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '0'); +DROP TABLE gencoltable; + 6a. I felt some additional explicit comments might help the readabilty of the output file. e.g.1 -- When 'include-generated=columns' = '1' the generated column 'b' values will be replicated SELECT data FROM pg_logical_slot_get_changes... e.g.2 -- When 'include-generated=columns' = '0' the generated column 'b' values will not be replicated SELECT data FROM pg_logical_slot_get_changes... ~~ 6b. Suggest adding one more test case (where 'include-generated=columns' is not set) to confirm/demonstrate the default behaviour for replicated generated cols. == doc/src/sgml/protocol.sgml 7. + + include-generated-columns + + +Boolean option to enable generated columns. +The include-generated-columns option controls whether generated +columns should be included in the string representation of tuples +during logical decoding in PostgreSQL. This allows users to +customize the output format based on whether they want to include +these columns or not. The default is false. + + + 7a. It doesn't render properly. e.g. Should not be bold italic (probably the class is wrong?), because none of the nearby parameters look this way. ~ 7b. The name here should NOT have hyphens. It needs underscores same as all other nearby protocol parameters. ~ 7c. The description seems overly verbose. SUGGESTION Boolean option to enable generated columns. This option controls whether generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. The default is false. == doc/src/sgml/ref/create_subscription.sgml 8. + + +include_generated_column (boolean) + + + Specifies whether the generated columns present in the tables + associated with the subscription should be replicated. The default is + false. + The parameter name should be plural (include_generated_columns). == src/backend/commands/subscriptioncmds.c 9. #define SUBOPT_ORIGIN 0x8000 +#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x0001
Re: Pgoutput not capturing the generated columns
> > The attached Patch contains the suggested changes. > Hi, Currently, COPY command does not work for generated columns and therefore, COPY of generated column is not supported during tablesync process. So, in patch v4-0001 we added a check to allow replication of the generated column only if 'copy_data = false'. I am attaching patches to resolve the above issues. v5-0001: not changed v5-0002: Support COPY of generated column v5-0003: Support COPY of generated column during tablesync process Thought? Thanks and Regards, Shlok Kyal v5-0001-Enable-support-for-include_generated_columns-opti.patch Description: Binary data v5-0002-Support-COPY-command-for-generated-columns.patch Description: Binary data v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
On Mon, 3 Jun 2024 at 13:03, Shubham Khanna wrote: > > On Thu, May 16, 2024 at 11:35 AM Peter Smith wrote: > > > > Here are some review comments for the patch v1-0001. > > > > == > > GENERAL > > > > G.1. Use consistent names > > > > It seems to add unnecessary complications by having different names > > for all the new options, fields and API parameters. > > > > e.g. sometimes 'include_generated_columns' > > e.g. sometimes 'publish_generated_columns' > > > > Won't it be better to just use identical names everywhere for > > everything? I don't mind which one you choose; I just felt you only > > need one name, not two. This comment overrides everything else in this > > post so whatever name you choose, make adjustments for all my other > > review comments as necessary. > > I have updated the name to 'include_generated_columns' everywhere in the > Patch. > > > == > > > > G.2. Is it possible to just use the existing bms? > > > > A very large part of this patch is adding more API parameters to > > delegate the 'publish_generated_columns' flag value down to when it is > > finally checked and used. e.g. > > > > The functions: > > - logicalrep_write_insert(), logicalrep_write_update(), > > logicalrep_write_delete() > > ... are delegating the new parameter 'publish_generated_column' down to: > > - logicalrep_write_tuple > > > > The functions: > > - logicalrep_write_rel() > > ... are delegating the new parameter 'publish_generated_column' down to: > > - logicalrep_write_attrs > > > > AFAICT in all these places the API is already passing a "Bitmapset > > *columns". I was wondering if it might be possible to modify the > > "Bitmapset *columns" BEFORE any of those functions get called so that > > the "columns" BMS either does or doesn't include generated cols (as > > appropriate according to the option). > > > > Well, it might not be so simple because there are some NULL BMS > > considerations also, but I think it would be worth investigating at > > least, because if indeed you can find some common place (somewhere > > like pgoutput_change()?) where the columns BMS can be filtered to > > remove bits for generated cols then it could mean none of those other > > patch API changes are needed at all -- then the patch would only be > > 1/2 the size. > > I will analyse and reply to this in the next version. > > > == > > Commit message > > > > 1. > > Now if include_generated_columns option is specified, the generated > > column information and generated column data also will be sent. > > > > Usage from pgoutput plugin: > > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > > 'proto_version', '1', 'publication_names', 'pub1', > > 'include_generated_columns', 'true'); > > > > Usage from test_decoding plugin: > > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > > 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > > > ~ > > > > I think there needs to be more background information given here. This > > commit message doesn't seem to describe anything about what is the > > problem and how this patch fixes it. It just jumps straight into > > giving usages of a 'include_generated_columns' option. > > > > It also doesn't say that this is an option that was newly *introduced* > > by the patch -- it refers to it as though the reader should already > > know about it. > > > > Furthermore, your hacker's post says "Currently it is not supported as > > a subscription option because table sync for the generated column is > > not possible as copy command does not support getting data for the > > generated column. If this feature is required we can remove this > > limitation from the copy command and then add it as a subscription > > option later." IMO that all seems like the kind of information that > > ought to also be mentioned in this commit message. > > I have updated the Commit message mentioning the suggested changes. > > > == > > contrib/test_decoding/sql/ddl.sql > > > > 2. > > +-- check include_generated_columns option with generated column > > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > > AS (a * 2) STORED); > > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > +DROP TABLE gencoltable; > > + > > > > 2a. > > Perhaps you should include both option values to demonstrate the > > difference in behaviour: > > > > 'include_generated_columns', '0' > > 'include_generated_columns', '1' > > Added the other option values to demonstrate the difference in behaviour: > > > 2b. > > I think you maybe need to include more some test combinations where > > there is and isn't a COLUMN LIST, because I am not 100% sure I > > understand the current logic/expectations for all combinations. > > > > e.g. When the generated column is in a column list but > >
Re: Pgoutput not capturing the generated columns
On Fri, May 24, 2024 at 8:26 AM Peter Smith wrote: > > Hi, > > Here are some review comments for the patch v3-0001. > > I don't think v3 addressed any of my previous review comments for > patches v1 and v2. [1][2] > > So the comments below are limited only to the new code (i.e. the v3 > versus v2 differences). Meanwhile, all my previous review comments may > still apply. Patch v4-0001 addresses the previous review comments for patches v1 and v2. [1][2] > == > GENERAL > > The patch applied gives whitespace warnings: > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch > ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150: > trailing whitespace. > > ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202: > trailing whitespace. > > ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730: > trailing whitespace. > warning: 3 lines add whitespace errors. Fixed. > == > contrib/test_decoding/test_decoding.c > > 1. pg_decode_change > > MemoryContext old; > + bool include_generated_columns; > + > > I'm not really convinced this variable saves any code. Fixed. > == > doc/src/sgml/protocol.sgml > > 2. > + > + class="parameter">include-generated-columns > + > + > +The include-generated-columns option controls whether > generated columns should be included in the string representation of > tuples during logical decoding in PostgreSQL. This allows users to > customize the output format based on whether they want to include > these columns or not. > + > + > + > > 2a. > Something is not correct when this name has hyphens and all the nearby > parameter names do not. Shouldn't it be all uppercase like the other > boolean parameter? > > ~ > > 2b. > Text in the SGML file should be wrapped properly. > > ~ > > 2c. > IMO the comment can be more terse and it also needs to specify that it > is a boolean type, and what is the default value if not passed. > > SUGGESTION > > INCLUDE_GENERATED_COLUMNS [ boolean ] > > If true, then generated columns should be included in the string > representation of tuples during logical decoding in PostgreSQL. The > default is false. Fixed. > == > src/backend/replication/logical/proto.c > > 3. logicalrep_write_tuple > > - if (!column_in_column_list(att->attnum, columns)) > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) > + continue; > + > + if (att->attgenerated && !publish_generated_column) > continue; > > 3a. > This code seems overcomplicated checking the same flag multiple times. > > SUGGESTION > if (att->attgenerated) > { > if (!publish_generated_column) > continue; > } > else > { > if (!column_in_column_list(att->attnum, columns)) > continue; > } > > ~ > > 3b. > The same logic occurs several times in logicalrep_write_tuple Fixed. > 4. logicalrep_write_attrs > > if (!column_in_column_list(att->attnum, columns)) > continue; > > + if (att->attgenerated && !publish_generated_column) > + continue; > + > > Shouldn't these code fragments (2x in this function) look the same as > in logicalrep_write_tuple? See the above review comments. Fixed. > == > src/backend/replication/pgoutput/pgoutput.c > > 5. maybe_send_schema > > TransactionId topxid = InvalidTransactionId; > + bool publish_generated_column = data->publish_generated_column; > > I'm not convinced this saves any code, and anyway, it is not > consistent with other fields in this function that are not extracted > to another variable (e.g. data->streaming). Fixed. > 6. pgoutput_change > - > + bool publish_generated_column = data->publish_generated_column; > + > > I'm not convinced this saves any code, and anyway, it is not > consistent with other fields in this function that are not extracted > to another variable (e.g. data->binary). Fixed. > == > [1] My v1 review - > https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6pe5uyjv_uxvrhu3ck7w2ahb1uqy...@mail.gmail.com > [2] My v2 review - > https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com Patch v4-0001 contains all the changes required. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
On Thu, May 23, 2024 at 5:56 PM vignesh C wrote: > > On Thu, 23 May 2024 at 09:19, Shubham Khanna > wrote: > > > > > Dear Shubham, > > > > > > Thanks for creating a patch! Here are high-level comments. > > > > > 1. > > > Please document the feature. If it is hard to describe, we should change > > > the API. > > > > I have added the feature in the document. > > > > > 4. > > > Regarding the test_decoding plugin, it has already been able to decode the > > > generated columns. So... as the first place, is the proposed option > > > really needed > > > for the plugin? Why do you include it? > > > If you anyway want to add the option, the default value should be on - > > > which keeps > > > current behavior. > > > > I have made the generated column options as true for test_decoding > > plugin so by default we will send generated column data. > > > > > 5. > > > Assuming that the feature become usable used for logical replicaiton. Not > > > sure, > > > should we change the protocol version at that time? Nodes prior than PG17 > > > may > > > not want receive values for generated columns. Can we control only by the > > > option? > > > > I verified the backward compatibility test by using the generated > > column option and it worked fine. I think there is no need to make any > > further changes. > > > > > 7. > > > > > > Some functions refer data->publish_generated_column many times. Can we > > > store > > > the value to a variable? > > > > > > Below comments are for test_decoding part, but they may be not needed. > > > > > > = > > > > > > a. pg_decode_startup() > > > > > > ``` > > > +else if (strcmp(elem->defname, "include_generated_columns") == 0) > > > ``` > > > > > > Other options for test_decoding do not have underscore. It should be > > > "include-generated-columns". > > > > > > b. pg_decode_change() > > > > > > data->include_generated_columns is referred four times in the function. > > > Can you store the value to a varibable? > > > > > > > > > c. pg_decode_change() > > > > > > ``` > > > -true); > > > +true, > > > data->include_generated_columns ); > > > ``` > > > > > > Please remove the blank. > > > > Fixed. > > The attached v3 Patch has the changes for the same. > > Few comments: > 1) Since this is removed, tupdesc variable is not required anymore: > +++ b/src/backend/catalog/pg_publication.c > @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, > List *columns, > errmsg("cannot use system > column \"%s\" in publication column list", >colname)); > > - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) > - ereport(ERROR, > - > errcode(ERRCODE_INVALID_COLUMN_REFERENCE), > - errmsg("cannot use generated > column \"%s\" in publication column list", > - colname)); Fixed. > 2) In test_decoding include_generated_columns option is used: > + else if (strcmp(elem->defname, > "include_generated_columns") == 0) > + { > + if (elem->arg == NULL) > + continue; > + else if (!parse_bool(strVal(elem->arg), > >include_generated_columns)) > + ereport(ERROR, > + > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > +errmsg("could not > parse value \"%s\" for parameter \"%s\"", > + > strVal(elem->arg), elem->defname))); > + } > > In subscription we have used generated_column, we can try to use the > same option in both places: > + else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) && > +strcmp(defel->defname, > "generated_column") == 0) > + { > + if (IsSet(opts->specified_opts, > SUBOPT_GENERATED_COLUMN)) > + errorConflictingDefElem(defel, pstate); > + > + opts->specified_opts |= SUBOPT_GENERATED_COLUMN; > + opts->generated_column = defGetBoolean(defel); > + } Will update the name to 'include_generated_columns' in the next version of the Patch. > 3) Tab completion can be added for create subscription to include > generated_column option Fixed. > 4) There are few whitespace issues while applying the patch, check for > git diff --check Fixed. > 5) Add few tests for the new option added Added new test cases. Patch v4-0001 contains all the changes required. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
On Thu, May 23, 2024 at 10:56 AM Hayato Kuroda (Fujitsu) wrote: > > Dear Shubham, > > Thanks for updating the patch! I checked your patches briefly. Here are my > comments. > > 01. API > > Since the option for test_decoding is enabled by default, I think it should > be renamed. > E.g., "skip-generated-columns" or something. Let's keep the same name 'include_generated_columns' for both the cases. > 02. ddl.sql > > ``` > +-- check include-generated-columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * > 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', > '1'); > +data > +- > + BEGIN > + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 > + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 > + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 > + COMMIT > +(5 rows) > ``` > > We should test non-default case, which the generated columns are not > generated. Added the non-default case, which the generated columns are not generated. > 03. ddl.sql > > Not sure new tests are in the correct place. Do we have to add new file and > move tests to it? > Thought? Added the new tests in the 'decoding_into_rel.out' file. > 04. protocol.sgml > > Please keep the format of the sgml file. Fixed. > 05. protocol.sgml > > The option is implemented as the streaming option of pgoutput plugin, so they > should be > located under "Logical Streaming Replication Parameters" section. Fixed. > 05. AlterSubscription > > ``` > + if (IsSet(opts.specified_opts, > SUBOPT_GENERATED_COLUMN)) > + { > + ereport(ERROR, > + > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > +errmsg("toggling > generated_column option is not allowed."))); > + } > ``` > > If you don't want to support the option, you can remove > SUBOPT_GENERATED_COLUMN > macro from the function. But can you clarify the reason why you do not want? Fixed. > 07. logicalrep_write_tuple > > ``` > - if (!column_in_column_list(att->attnum, columns)) > + if (!column_in_column_list(att->attnum, columns) && > !att->attgenerated) > + continue; > + > + if (att->attgenerated && !publish_generated_column) > continue; > ``` > > I think changes in v2 was reverted or wrongly merged. Fixed. > 08. test code > > Can you add tests that generated columns are replicated by the logical > replication? Added the test cases. Patch v4-0001 contains all the changes required. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
On Tue, May 21, 2024 at 12:23 PM Peter Smith wrote: > > Hi, > > AFAICT this v2-0001 patch differences from v1 is mostly about adding > the new CREATE SUBSCRIPTION option. Specifically, I don't think it is > addressing any of my previous review comments for patch v1. [1]. So > these comments below are limited only to the new option code; All my > previous review comments probably still apply. > > == > Commit message > > 1. (General) > The commit message is seriously lacking background explanation to describe: > - What is the current behaviour w.r.t. generated columns > - What is the problem with the current behaviour? > - What exactly is this patch doing to address that problem? Added the information related to this inside the Patch. > 2. > New option generated_option is added in create subscription. Now if this > option is specified as 'true' during create subscription, generated > columns in the tables, present in publisher (to which this subscription is > subscribed) can also be replicated. > > - > > 2A. > "generated_option" is not the name of the new option. > > ~ > > 2B. > "create subscription" stmt should be UPPERCASE; will also be more > readable if the option name is quoted. > > ~ > > 2C. > Needs more information like under what condition is this option ignored etc. Fixed. > == > doc/src/sgml/ref/create_subscription.sgml > > 3. > +id="sql-createsubscription-params-with-generated-column"> > +generated-column > (boolean) > + > + > + Specifies whether the generated columns present in the tables > + associated with the subscription should be replicated. The default > is > + false. > + > + > + > + This parameter can only be set true if copy_data is set to false. > + This option works fine when a generated column (in > publisher) is replicated to a > + non-generated column (in subscriber). Else if it is > replicated to a generated > + column, it will ignore the replicated data and fill the > column with computed or > + default data. > + > + > + > > 3A. > There is a typo in the name "generated-column" because we should use > underscores (not hyphens) for the option names. > > ~ > > 3B. > This it is not a good option name because there is no verb so it > doesn't mean anything to set it true/false -- actually there IS a verb > "generate" but we are not saying generate = true/false, so this name > is also quite confusing. > > I think "include_generated_columns" would be much better, but if > others think that name is too long then maybe "include_generated_cols" > or "include_gen_cols" or similar. Of course, whatever if the final > decision should be propagated same thru all the code comments, params, > fields, etc. > > ~ > > 3C. > copy_data and false should be marked up as fonts in the sgml > > ~ > > 3D. > > Suggest re-word this part. Don't need to explain when it "works fine". > > BEFORE > This option works fine when a generated column (in publisher) is > replicated to a non-generated column (in subscriber). Else if it is > replicated to a generated column, it will ignore the replicated data > and fill the column with computed or default data. > > SUGGESTION > If the subscriber-side column is also a generated column then this > option has no effect; the replicated data will be ignored and the > subscriber column will be filled as normal with the subscriber-side > computed or default data. Fixed. > == > src/backend/commands/subscriptioncmds.c > > 4. AlterSubscription > SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR | > SUBOPT_PASSWORD_REQUIRED | > SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | > - SUBOPT_ORIGIN); > + SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN); > > Hmm. Is this correct? If ALTER is not allowed (later in this patch > there is a message "toggling generated_column option is not allowed." > then why are we even saying that SUBOPT_GENERATED_COLUMN is a > support_opt for ALTER? Fixed. > 5. > + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN)) > + { > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("toggling generated_column option is not allowed."))); > + } > > 5A. > I suspect this is not even needed if the 'supported_opt' is fixed per > the previous comment. > > ~ > > 5B. > But if this message is still needed then I think it should say "ALTER > is not allowed" (not "toggling is not allowed") and also the option > name should be quoted as per the new guidelines for error messages. > > == > src/backend/replication/logical/proto.c Fixed. > 6. logicalrep_write_tuple > > - if (att->attisdropped || att->attgenerated) > + if (att->attisdropped) > continue; > > if (!column_in_column_list(att->attnum, columns)) > continue; > > + if (att->attgenerated && !publish_generated_column) > + > > Calling column_in_column_list() might be a more expensive operation > than checking just
Re: Pgoutput not capturing the generated columns
On Thu, May 16, 2024 at 11:35 AM Peter Smith wrote: > > Here are some review comments for the patch v1-0001. > > == > GENERAL > > G.1. Use consistent names > > It seems to add unnecessary complications by having different names > for all the new options, fields and API parameters. > > e.g. sometimes 'include_generated_columns' > e.g. sometimes 'publish_generated_columns' > > Won't it be better to just use identical names everywhere for > everything? I don't mind which one you choose; I just felt you only > need one name, not two. This comment overrides everything else in this > post so whatever name you choose, make adjustments for all my other > review comments as necessary. I have updated the name to 'include_generated_columns' everywhere in the Patch. > == > > G.2. Is it possible to just use the existing bms? > > A very large part of this patch is adding more API parameters to > delegate the 'publish_generated_columns' flag value down to when it is > finally checked and used. e.g. > > The functions: > - logicalrep_write_insert(), logicalrep_write_update(), > logicalrep_write_delete() > ... are delegating the new parameter 'publish_generated_column' down to: > - logicalrep_write_tuple > > The functions: > - logicalrep_write_rel() > ... are delegating the new parameter 'publish_generated_column' down to: > - logicalrep_write_attrs > > AFAICT in all these places the API is already passing a "Bitmapset > *columns". I was wondering if it might be possible to modify the > "Bitmapset *columns" BEFORE any of those functions get called so that > the "columns" BMS either does or doesn't include generated cols (as > appropriate according to the option). > > Well, it might not be so simple because there are some NULL BMS > considerations also, but I think it would be worth investigating at > least, because if indeed you can find some common place (somewhere > like pgoutput_change()?) where the columns BMS can be filtered to > remove bits for generated cols then it could mean none of those other > patch API changes are needed at all -- then the patch would only be > 1/2 the size. I will analyse and reply to this in the next version. > == > Commit message > > 1. > Now if include_generated_columns option is specified, the generated > column information and generated column data also will be sent. > > Usage from pgoutput plugin: > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > 'proto_version', '1', 'publication_names', 'pub1', > 'include_generated_columns', 'true'); > > Usage from test_decoding plugin: > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > > ~ > > I think there needs to be more background information given here. This > commit message doesn't seem to describe anything about what is the > problem and how this patch fixes it. It just jumps straight into > giving usages of a 'include_generated_columns' option. > > It also doesn't say that this is an option that was newly *introduced* > by the patch -- it refers to it as though the reader should already > know about it. > > Furthermore, your hacker's post says "Currently it is not supported as > a subscription option because table sync for the generated column is > not possible as copy command does not support getting data for the > generated column. If this feature is required we can remove this > limitation from the copy command and then add it as a subscription > option later." IMO that all seems like the kind of information that > ought to also be mentioned in this commit message. I have updated the Commit message mentioning the suggested changes. > == > contrib/test_decoding/sql/ddl.sql > > 2. > +-- check include_generated_columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > +DROP TABLE gencoltable; > + > > 2a. > Perhaps you should include both option values to demonstrate the > difference in behaviour: > > 'include_generated_columns', '0' > 'include_generated_columns', '1' Added the other option values to demonstrate the difference in behaviour: > 2b. > I think you maybe need to include more some test combinations where > there is and isn't a COLUMN LIST, because I am not 100% sure I > understand the current logic/expectations for all combinations. > > e.g. When the generated column is in a column list but > 'publish_generated_columns' is false then what should happen? etc. > Also if there are any special rules then those should be mentioned in > the commit message. Test case is added and the same is mentioned in the documentation. > == > src/backend/replication/logical/proto.c > > 3. > For all the API
Re: Pgoutput not capturing the generated columns
Hi, Here are some review comments for the patch v3-0001. I don't think v3 addressed any of my previous review comments for patches v1 and v2. [1][2] So the comments below are limited only to the new code (i.e. the v3 versus v2 differences). Meanwhile, all my previous review comments may still apply. == GENERAL The patch applied gives whitespace warnings: [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150: trailing whitespace. ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202: trailing whitespace. ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730: trailing whitespace. warning: 3 lines add whitespace errors. == contrib/test_decoding/test_decoding.c 1. pg_decode_change MemoryContext old; + bool include_generated_columns; + I'm not really convinced this variable saves any code. == doc/src/sgml/protocol.sgml 2. + + include-generated-columns + + +The include-generated-columns option controls whether generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. This allows users to customize the output format based on whether they want to include these columns or not. + + + 2a. Something is not correct when this name has hyphens and all the nearby parameter names do not. Shouldn't it be all uppercase like the other boolean parameter? ~ 2b. Text in the SGML file should be wrapped properly. ~ 2c. IMO the comment can be more terse and it also needs to specify that it is a boolean type, and what is the default value if not passed. SUGGESTION INCLUDE_GENERATED_COLUMNS [ boolean ] If true, then generated columns should be included in the string representation of tuples during logical decoding in PostgreSQL. The default is false. == src/backend/replication/logical/proto.c 3. logicalrep_write_tuple - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; 3a. This code seems overcomplicated checking the same flag multiple times. SUGGESTION if (att->attgenerated) { if (!publish_generated_column) continue; } else { if (!column_in_column_list(att->attnum, columns)) continue; } ~ 3b. The same logic occurs several times in logicalrep_write_tuple ~~~ 4. logicalrep_write_attrs if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; + Shouldn't these code fragments (2x in this function) look the same as in logicalrep_write_tuple? See the above review comments. == src/backend/replication/pgoutput/pgoutput.c 5. maybe_send_schema TransactionId topxid = InvalidTransactionId; + bool publish_generated_column = data->publish_generated_column; I'm not convinced this saves any code, and anyway, it is not consistent with other fields in this function that are not extracted to another variable (e.g. data->streaming). ~~~ 6. pgoutput_change - + bool publish_generated_column = data->publish_generated_column; + I'm not convinced this saves any code, and anyway, it is not consistent with other fields in this function that are not extracted to another variable (e.g. data->binary). == [1] My v1 review - https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6pe5uyjv_uxvrhu3ck7w2ahb1uqy...@mail.gmail.com [2] My v2 review - https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Pgoutput not capturing the generated columns
On Thu, 23 May 2024 at 09:19, Shubham Khanna wrote: > > > Dear Shubham, > > > > Thanks for creating a patch! Here are high-level comments. > > > 1. > > Please document the feature. If it is hard to describe, we should change > > the API. > > I have added the feature in the document. > > > 4. > > Regarding the test_decoding plugin, it has already been able to decode the > > generated columns. So... as the first place, is the proposed option really > > needed > > for the plugin? Why do you include it? > > If you anyway want to add the option, the default value should be on - > > which keeps > > current behavior. > > I have made the generated column options as true for test_decoding > plugin so by default we will send generated column data. > > > 5. > > Assuming that the feature become usable used for logical replicaiton. Not > > sure, > > should we change the protocol version at that time? Nodes prior than PG17 > > may > > not want receive values for generated columns. Can we control only by the > > option? > > I verified the backward compatibility test by using the generated > column option and it worked fine. I think there is no need to make any > further changes. > > > 7. > > > > Some functions refer data->publish_generated_column many times. Can we store > > the value to a variable? > > > > Below comments are for test_decoding part, but they may be not needed. > > > > = > > > > a. pg_decode_startup() > > > > ``` > > +else if (strcmp(elem->defname, "include_generated_columns") == 0) > > ``` > > > > Other options for test_decoding do not have underscore. It should be > > "include-generated-columns". > > > > b. pg_decode_change() > > > > data->include_generated_columns is referred four times in the function. > > Can you store the value to a varibable? > > > > > > c. pg_decode_change() > > > > ``` > > -true); > > +true, data->include_generated_columns > > ); > > ``` > > > > Please remove the blank. > > Fixed. > The attached v3 Patch has the changes for the same. Few comments: 1) Since this is removed, tupdesc variable is not required anymore: +++ b/src/backend/catalog/pg_publication.c @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel, List *columns, errmsg("cannot use system column \"%s\" in publication column list", colname)); - if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated) - ereport(ERROR, - errcode(ERRCODE_INVALID_COLUMN_REFERENCE), - errmsg("cannot use generated column \"%s\" in publication column list", - colname)); 2) In test_decoding include_generated_columns option is used: + else if (strcmp(elem->defname, "include_generated_columns") == 0) + { + if (elem->arg == NULL) + continue; + else if (!parse_bool(strVal(elem->arg), >include_generated_columns)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("could not parse value \"%s\" for parameter \"%s\"", + strVal(elem->arg), elem->defname))); + } In subscription we have used generated_column, we can try to use the same option in both places: + else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) && +strcmp(defel->defname, "generated_column") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_GENERATED_COLUMN)) + errorConflictingDefElem(defel, pstate); + + opts->specified_opts |= SUBOPT_GENERATED_COLUMN; + opts->generated_column = defGetBoolean(defel); + } 3) Tab completion can be added for create subscription to include generated_column option 4) There are few whitespace issues while applying the patch, check for git diff --check 5) Add few tests for the new option added Regards, Vignesh
RE: Pgoutput not capturing the generated columns
Dear Shubham, Thanks for updating the patch! I checked your patches briefly. Here are my comments. 01. API Since the option for test_decoding is enabled by default, I think it should be renamed. E.g., "skip-generated-columns" or something. 02. ddl.sql ``` +-- check include-generated-columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1'); +data +- + BEGIN + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2 + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4 + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6 + COMMIT +(5 rows) ``` We should test non-default case, which the generated columns are not generated. 03. ddl.sql Not sure new tests are in the correct place. Do we have to add new file and move tests to it? Thought? 04. protocol.sgml Please keep the format of the sgml file. 05. protocol.sgml The option is implemented as the streaming option of pgoutput plugin, so they should be located under "Logical Streaming Replication Parameters" section. 05. AlterSubscription ``` + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN)) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("toggling generated_column option is not allowed."))); + } ``` If you don't want to support the option, you can remove SUBOPT_GENERATED_COLUMN macro from the function. But can you clarify the reason why you do not want? 07. logicalrep_write_tuple ``` - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; ``` I think changes in v2 was reverted or wrongly merged. 08. test code Can you add tests that generated columns are replicated by the logical replication? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Pgoutput not capturing the generated columns
> Dear Shubham, > > Thanks for creating a patch! Here are high-level comments. > 1. > Please document the feature. If it is hard to describe, we should change the > API. I have added the feature in the document. > 4. > Regarding the test_decoding plugin, it has already been able to decode the > generated columns. So... as the first place, is the proposed option really > needed > for the plugin? Why do you include it? > If you anyway want to add the option, the default value should be on - which > keeps > current behavior. I have made the generated column options as true for test_decoding plugin so by default we will send generated column data. > 5. > Assuming that the feature become usable used for logical replicaiton. Not > sure, > should we change the protocol version at that time? Nodes prior than PG17 may > not want receive values for generated columns. Can we control only by the > option? I verified the backward compatibility test by using the generated column option and it worked fine. I think there is no need to make any further changes. > 7. > > Some functions refer data->publish_generated_column many times. Can we store > the value to a variable? > > Below comments are for test_decoding part, but they may be not needed. > > = > > a. pg_decode_startup() > > ``` > +else if (strcmp(elem->defname, "include_generated_columns") == 0) > ``` > > Other options for test_decoding do not have underscore. It should be > "include-generated-columns". > > b. pg_decode_change() > > data->include_generated_columns is referred four times in the function. > Can you store the value to a varibable? > > > c. pg_decode_change() > > ``` > -true); > +true, data->include_generated_columns ); > ``` > > Please remove the blank. Fixed. The attached v3 Patch has the changes for the same. Thanks and Regards, Shubham Khanna. v3-0001-Support-generated-column-capturing-generated-colu.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
On 08.05.24 09:13, Shubham Khanna wrote: The attached patch has the changes to support capturing generated column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the ‘include_generated_columns’ option is specified, the generated column information and generated column data also will be sent. It might be worth keeping half an eye on the development of virtual generated columns [0]. I think it won't be possible to include those into the replication output stream. I think having an option for including stored generated columns is in general ok. [0]: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org
Re: Pgoutput not capturing the generated columns
Hi, AFAICT this v2-0001 patch differences from v1 is mostly about adding the new CREATE SUBSCRIPTION option. Specifically, I don't think it is addressing any of my previous review comments for patch v1. [1]. So these comments below are limited only to the new option code; All my previous review comments probably still apply. == Commit message 1. (General) The commit message is seriously lacking background explanation to describe: - What is the current behaviour w.r.t. generated columns - What is the problem with the current behaviour? - What exactly is this patch doing to address that problem? ~ 2. New option generated_option is added in create subscription. Now if this option is specified as 'true' during create subscription, generated columns in the tables, present in publisher (to which this subscription is subscribed) can also be replicated. - 2A. "generated_option" is not the name of the new option. ~ 2B. "create subscription" stmt should be UPPERCASE; will also be more readable if the option name is quoted. ~ 2C. Needs more information like under what condition is this option ignored etc. == doc/src/sgml/ref/create_subscription.sgml 3. + +generated-column (boolean) + + + Specifies whether the generated columns present in the tables + associated with the subscription should be replicated. The default is + false. + + + + This parameter can only be set true if copy_data is set to false. + This option works fine when a generated column (in publisher) is replicated to a + non-generated column (in subscriber). Else if it is replicated to a generated + column, it will ignore the replicated data and fill the column with computed or + default data. + + + 3A. There is a typo in the name "generated-column" because we should use underscores (not hyphens) for the option names. ~ 3B. This it is not a good option name because there is no verb so it doesn't mean anything to set it true/false -- actually there IS a verb "generate" but we are not saying generate = true/false, so this name is also quite confusing. I think "include_generated_columns" would be much better, but if others think that name is too long then maybe "include_generated_cols" or "include_gen_cols" or similar. Of course, whatever if the final decision should be propagated same thru all the code comments, params, fields, etc. ~ 3C. copy_data and false should be marked up as fonts in the sgml ~ 3D. Suggest re-word this part. Don't need to explain when it "works fine". BEFORE This option works fine when a generated column (in publisher) is replicated to a non-generated column (in subscriber). Else if it is replicated to a generated column, it will ignore the replicated data and fill the column with computed or default data. SUGGESTION If the subscriber-side column is also a generated column then this option has no effect; the replicated data will be ignored and the subscriber column will be filled as normal with the subscriber-side computed or default data. == src/backend/commands/subscriptioncmds.c 4. AlterSubscription SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED | SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | - SUBOPT_ORIGIN); + SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN); Hmm. Is this correct? If ALTER is not allowed (later in this patch there is a message "toggling generated_column option is not allowed." then why are we even saying that SUBOPT_GENERATED_COLUMN is a support_opt for ALTER? ~~~ 5. + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN)) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("toggling generated_column option is not allowed."))); + } 5A. I suspect this is not even needed if the 'supported_opt' is fixed per the previous comment. ~ 5B. But if this message is still needed then I think it should say "ALTER is not allowed" (not "toggling is not allowed") and also the option name should be quoted as per the new guidelines for error messages. == src/backend/replication/logical/proto.c 6. logicalrep_write_tuple - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + Calling column_in_column_list() might be a more expensive operation than checking just generated columns flag so maybe reverse the order and check the generated columns first for a tiny performance gain. ~~ 7. - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if (!column_in_column_list(att->attnum, columns)) continue; + if (att->attgenerated && !publish_generated_column) + continue; ditto #6 ~~ 8. logicalrep_write_attrs - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; if
Re: Pgoutput not capturing the generated columns
On Mon, 20 May 2024 at 13:49, Masahiko Sawada wrote: > > Hi, > > On Wed, May 8, 2024 at 4:14 PM Shubham Khanna > wrote: > > > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal > > wrote: > > > > > > Hi PG Hackers. > > > > > > We are interested in enhancing the functionality of the pgoutput plugin > > > by adding support for generated columns. > > > Could you please guide us on the necessary steps to achieve this? > > > Additionally, do you have a platform for tracking such feature requests? > > > Any insights or assistance you can provide on this matter would be > > > greatly appreciated. > > > > The attached patch has the changes to support capturing generated > > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the > > ‘include_generated_columns’ option is specified, the generated column > > information and generated column data also will be sent. > > As Euler mentioned earlier, I think it's a decision not to replicate > generated columns because we don't know the target table on the > subscriber has the same expression and there could be locale issues > even if it looks the same. I can see that a benefit of this proposal > would be to save cost to compute generated column values if the user > wants the target table on the subscriber to have exactly the same data > as the publisher's one. Are there other benefits or use cases? I think this will be useful mainly for the use cases where the publisher has generated columns and the subscriber does not have generated columns. In the case where both the publisher and subscriber have generated columns, the current patch will overwrite the generated column values based on the expression for the generated column in the subscriber. > > > > Usage from pgoutput plugin: > > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > > (a * 2) STORED); > > CREATE publication pub1 for all tables; > > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); > > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > > 'proto_version', '1', 'publication_names', 'pub1', > > 'include_generated_columns', 'true'); > > > > Usage from test_decoding plugin: > > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', > > 'test_decoding'); > > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > > (a * 2) STORED); > > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > > 'include-xids', '0', 'skip-empty-xacts', '1', > > 'include_generated_columns', '1'); > > > > Currently it is not supported as a subscription option because table > > sync for the generated column is not possible as copy command does not > > support getting data for the generated column. If this feature is > > required we can remove this limitation from the copy command and then > > add it as a subscription option later. > > Thoughts? > > I think that if we want to support an option to replicate generated > columns, the initial tablesync should support it too. Otherwise, we > end up filling the target columns data with NULL during the initial > tablesync but with replicated data during the streaming changes. +1 for supporting initial sync. Currently copy_data = true and generate_column = true are not supported, this limitation will be removed in one of the upcoming patches. Regards, Vignesh
Re: Pgoutput not capturing the generated columns
Hi, On Wed, May 8, 2024 at 4:14 PM Shubham Khanna wrote: > > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal > wrote: > > > > Hi PG Hackers. > > > > We are interested in enhancing the functionality of the pgoutput plugin by > > adding support for generated columns. > > Could you please guide us on the necessary steps to achieve this? > > Additionally, do you have a platform for tracking such feature requests? > > Any insights or assistance you can provide on this matter would be greatly > > appreciated. > > The attached patch has the changes to support capturing generated > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the > ‘include_generated_columns’ option is specified, the generated column > information and generated column data also will be sent. As Euler mentioned earlier, I think it's a decision not to replicate generated columns because we don't know the target table on the subscriber has the same expression and there could be locale issues even if it looks the same. I can see that a benefit of this proposal would be to save cost to compute generated column values if the user wants the target table on the subscriber to have exactly the same data as the publisher's one. Are there other benefits or use cases? > > Usage from pgoutput plugin: > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > (a * 2) STORED); > CREATE publication pub1 for all tables; > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > 'proto_version', '1', 'publication_names', 'pub1', > 'include_generated_columns', 'true'); > > Usage from test_decoding plugin: > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', > 'test_decoding'); > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS > (a * 2) STORED); > INSERT INTO gencoltable (a) VALUES (1), (2), (3); > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > > Currently it is not supported as a subscription option because table > sync for the generated column is not possible as copy command does not > support getting data for the generated column. If this feature is > required we can remove this limitation from the copy command and then > add it as a subscription option later. > Thoughts? I think that if we want to support an option to replicate generated columns, the initial tablesync should support it too. Otherwise, we end up filling the target columns data with NULL during the initial tablesync but with replicated data during the streaming changes. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Pgoutput not capturing the generated columns
Hi Kuroda-san, Thanks for reviewing the patch. I have fixed some of the comments > 2. > Currently, the option is implemented as streaming option. Are there any > reasons > to choose the way? Another approach is to implement as slot option, like > failover > and temporary. I think the current approach is appropriate. The options such as failover and temporary seem like properties of a slot and I think decoding of generated column should not be slot specific. Also adding a new option for slot may create an overhead. > 3. > You said that subscription option is not supported for now. Not sure, is it > mean > that logical replication feature cannot be used for generated columns? If so, > the restriction won't be acceptable. If the combination between this and > initial > sync is problematic, can't we exclude them in CreateSubscrition and > AlterSubscription? > E.g., create_slot option cannot be set if slot_name is NONE. Added an option 'generated_column' for create subscription. Currently it allow to set 'generated_column' option as true only if 'copy_data' is set to false. Also we don't allow user to alter the 'generated_column' option. > 6. logicalrep_write_tuple() > > ``` > -if (!column_in_column_list(att->attnum, columns)) > +if (!column_in_column_list(att->attnum, columns) && > !att->attgenerated) > +continue; > ``` > > Hmm, does above mean that generated columns are decoded even if they are not > in > the column list? If so, why? I think such columns should not be sent. Fixed Thanks and Regards, Shlok Kyal v2-0001-Support-generated-column-capturing-generated-colu.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
Here are some review comments for the patch v1-0001. == GENERAL G.1. Use consistent names It seems to add unnecessary complications by having different names for all the new options, fields and API parameters. e.g. sometimes 'include_generated_columns' e.g. sometimes 'publish_generated_columns' Won't it be better to just use identical names everywhere for everything? I don't mind which one you choose; I just felt you only need one name, not two. This comment overrides everything else in this post so whatever name you choose, make adjustments for all my other review comments as necessary. == G.2. Is it possible to just use the existing bms? A very large part of this patch is adding more API parameters to delegate the 'publish_generated_columns' flag value down to when it is finally checked and used. e.g. The functions: - logicalrep_write_insert(), logicalrep_write_update(), logicalrep_write_delete() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_tuple The functions: - logicalrep_write_rel() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_attrs AFAICT in all these places the API is already passing a "Bitmapset *columns". I was wondering if it might be possible to modify the "Bitmapset *columns" BEFORE any of those functions get called so that the "columns" BMS either does or doesn't include generated cols (as appropriate according to the option). Well, it might not be so simple because there are some NULL BMS considerations also, but I think it would be worth investigating at least, because if indeed you can find some common place (somewhere like pgoutput_change()?) where the columns BMS can be filtered to remove bits for generated cols then it could mean none of those other patch API changes are needed at all -- then the patch would only be 1/2 the size. == Commit message 1. Now if include_generated_columns option is specified, the generated column information and generated column data also will be sent. Usage from pgoutput plugin: SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub1', 'include_generated_columns', 'true'); Usage from test_decoding plugin: SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); ~ I think there needs to be more background information given here. This commit message doesn't seem to describe anything about what is the problem and how this patch fixes it. It just jumps straight into giving usages of a 'include_generated_columns' option. It also doesn't say that this is an option that was newly *introduced* by the patch -- it refers to it as though the reader should already know about it. Furthermore, your hacker's post says "Currently it is not supported as a subscription option because table sync for the generated column is not possible as copy command does not support getting data for the generated column. If this feature is required we can remove this limitation from the copy command and then add it as a subscription option later." IMO that all seems like the kind of information that ought to also be mentioned in this commit message. == contrib/test_decoding/sql/ddl.sql 2. +-- check include_generated_columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); +DROP TABLE gencoltable; + 2a. Perhaps you should include both option values to demonstrate the difference in behaviour: 'include_generated_columns', '0' 'include_generated_columns', '1' ~ 2b. I think you maybe need to include more some test combinations where there is and isn't a COLUMN LIST, because I am not 100% sure I understand the current logic/expectations for all combinations. e.g. When the generated column is in a column list but 'publish_generated_columns' is false then what should happen? etc. Also if there are any special rules then those should be mentioned in the commit message. == src/backend/replication/logical/proto.c 3. For all the API changes the new parameter name should be plural. /publish_generated_column/publish_generated_columns/ ~~~ 4. logical_rep_write_tuple: - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; That code seems confusing. Shouldn't the logic be exactly as also in logicalrep_write_attrs()? e.g. Shouldn't they both look like this: if (att->attisdropped) continue; if
RE: Pgoutput not capturing the generated columns
Dear Shubham, Thanks for creating a patch! Here are high-level comments. 1. Please document the feature. If it is hard to describe, we should change the API. 2. Currently, the option is implemented as streaming option. Are there any reasons to choose the way? Another approach is to implement as slot option, like failover and temporary. 3. You said that subscription option is not supported for now. Not sure, is it mean that logical replication feature cannot be used for generated columns? If so, the restriction won't be acceptable. If the combination between this and initial sync is problematic, can't we exclude them in CreateSubscrition and AlterSubscription? E.g., create_slot option cannot be set if slot_name is NONE. 4. Regarding the test_decoding plugin, it has already been able to decode the generated columns. So... as the first place, is the proposed option really needed for the plugin? Why do you include it? If you anyway want to add the option, the default value should be on - which keeps current behavior. 5. Assuming that the feature become usable used for logical replicaiton. Not sure, should we change the protocol version at that time? Nodes prior than PG17 may not want receive values for generated columns. Can we control only by the option? 6. logicalrep_write_tuple() ``` -if (!column_in_column_list(att->attnum, columns)) +if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) +continue; ``` Hmm, does above mean that generated columns are decoded even if they are not in the column list? If so, why? I think such columns should not be sent. 7. Some functions refer data->publish_generated_column many times. Can we store the value to a variable? Below comments are for test_decoding part, but they may be not needed. = a. pg_decode_startup() ``` +else if (strcmp(elem->defname, "include_generated_columns") == 0) ``` Other options for test_decoding do not have underscore. It should be "include-generated-columns". b. pg_decode_change() data->include_generated_columns is referred four times in the function. Can you store the value to a varibable? c. pg_decode_change() ``` -true); +true, data->include_generated_columns ); ``` Please remove the blank. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Pgoutput not capturing the generated columns
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal wrote: > > Hi PG Hackers. > > We are interested in enhancing the functionality of the pgoutput plugin by > adding support for generated columns. > Could you please guide us on the necessary steps to achieve this? > Additionally, do you have a platform for tracking such feature requests? Any > insights or assistance you can provide on this matter would be greatly > appreciated. The attached patch has the changes to support capturing generated column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the ‘include_generated_columns’ option is specified, the generated column information and generated column data also will be sent. Usage from pgoutput plugin: CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); CREATE publication pub1 for all tables; SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput'); SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub1', 'include_generated_columns', 'true'); Usage from test_decoding plugin: SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding'); CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); INSERT INTO gencoltable (a) VALUES (1), (2), (3); SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); Currently it is not supported as a subscription option because table sync for the generated column is not possible as copy command does not support getting data for the generated column. If this feature is required we can remove this limitation from the copy command and then add it as a subscription option later. Thoughts? Thanks and Regards, Shubham Khanna. v1-0001-Support-capturing-generated-column-data-using-pgo.patch Description: Binary data
Re: Pgoutput not capturing the generated columns
Hi PG Hackers. We are interested in enhancing the functionality of the pgoutput plugin by adding support for generated columns. Could you please guide us on the necessary steps to achieve this? Additionally, do you have a platform for tracking such feature requests? Any insights or assistance you can provide on this matter would be greatly appreciated. Many thanks. Rajendra.
Re: Pgoutput not capturing the generated columns
Thanks Euler, Greatly appreciate your inputs. > Should pgoutput provide a complete row? Probably. If it is an option that > defaults to false and doesn't impact performance. Yes, it would be great if this feature can be implemented. > The logical replication design decides to compute the generated columns at > subscriber side. If I understand correctly, this approach involves establishing a function on the subscriber's side that emulates the operation executed to derive the generated column values. If yes, I see one potential issue where disparities might surface between the values of generated columns on the subscriber's side and those computed within Postgres. This could happen if the generated column's value relies on the current_time function. Please let me know how can we track the feature requests and the discussions around that. Thanks, Rajendra.
Re: Pgoutput not capturing the generated columns
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote: > With decoderbufs and wal2json the connector is able to capture the generated > column `full_name` in above example. But with pgoutput the generated column > was not captured. wal2json materializes the generated columns before delivering the output. I decided to materialized the generated columns in the output plugin because the target consumers expects a complete row. > Is this a known limitation of pgoutput plugin? If yes, where can we request > to add support for this feature? I wouldn't say limitation but a design decision. The logical replication design decides to compute the generated columns at subscriber side. It was a wise decision aiming optimization (it doesn't overload the publisher that is *already* in charge of logical decoding). Should pgoutput provide a complete row? Probably. If it is an option that defaults to false and doesn't impact performance. The request for features should be done in this mailing list. -- Euler Taveira EDB https://www.enterprisedb.com/