Re: Pgoutput not capturing the generated columns

2024-07-05 Thread Shubham Khanna
On Tue, Jul 2, 2024 at 10:59 AM Peter Smith  wrote:
>
> Hi Shubham,
>
> As you can see, most of my recent review comments for patch 0001 are
> only cosmetic nitpicks. But, there is still one long-unanswered design
> question from a month ago [1, #G.2]
>
> A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
> is related to the introduction and passing everywhere of new
> 'include_generated_columns' function parameters. These same functions
> are also always passing "BitMapSet *columns" representing the
> publication column list.
>
> My question was about whether we can't make use of the existing BMS
> parameter instead of introducing all the new API parameters.
>
> The idea might go something like this:
>
> * If 'include_generated_columns' option is specified true and if no
> column list was already specified then perhaps the relentry->columns
> can be used for a "dummy" column list that has everything including
> all the generated columns.
>
> * By doing this:
>  -- you may be able to avoid passing the extra
> 'include_gernated_columns' everywhere
>  -- you may be able to avoid checking for generated columns deeper in
> the code (since it is already checked up-front when building the
> column list BMS)
>
> ~~
>
> I'm not saying this design idea is guaranteed to work, but it might be
> worth considering, because if it does work then there is potential to
> make the current 0001 patch significantly shorter.
>
> ==
> [1] 
> https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

I have fixed this issue in the latest Patches.

Please refer to the updated v15 Patches here in [1]. See [1] for the
changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2B%3Dhn--ALJQvzzu7meX3LuO3tJKppDS7eO1BGvNFYBAbg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-07-05 Thread Peter Smith
Here are my review comments for v14-0002.

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

nitpick - remove excessive parentheses in palloc0 call.

nitpick - Code is fine AFAICT except it's not immediately obvious
localgenlist is indexed by the *remote* attribute number. So I renamed
'attrnum' variable in my nitpicks diff. OTOH, if you think no change
is necessary, that is OK to (in that case maybe add a comment).

~~~

1. fetch_remote_table_info

+ if ((server_version >= 12 && server_version <= 16) ||
+ !MySubscription->includegencols)
+ appendStringInfo(, " AND a.attgenerated = ''");

Should this say < 18 instead of <= 16?

~~~

copy_table:

nitpick - uppercase in comment

nitpick - missing space after "if"

~~~

2. copy_table

+ attnamelist = make_copy_attnamelist(relmapentry, remotegenlist);
+
  /* Start copy on the publisher. */
  initStringInfo();

- /* Regular table with no row filter */
- if (lrel.relkind == RELKIND_RELATION && qual == NIL)
+ /* check if remote column list has generated columns */
+ if(MySubscription->includegencols)
+ {
+ for (int i = 0; i < relmapentry->remoterel.natts; i++)
+ {
+ if(remotegenlist[i])
+ {
+ remote_has_gencol = true;
+ break;
+ }
+ }
+ }
+

There is some subtle logic going on here:

For example, the comment here says "Check if the remote column list
has generated columns", and it then proceeds to iterate the remote
attributes checking the remotegenlist[i]. But the remotegenlist[] was
returned from a prior call to make_copy_attnamelist() and according to
the make_copy_attnamelist logic, it is NOT returning all remote
generated-cols in that list. Specifically, it is stripping some of
them -- "Do not include generated columns of the subscription table in
the [remotegenlist] column list.".

So, actually this loop seems to be only finding cases (setting
remote_has_gen = true) where the remote column is generated but the
match local column is *not* generated. Maybe this was the intended
logic all along but then certainly the comment should be improved to
describe it better.

~~~

3.
+ /*
+ * Regular table with no row filter and 'include_generated_columns'
+ * specified as 'false' during creation of subscription.
+ */
+ if (lrel.relkind == RELKIND_RELATION && qual == NIL && !remote_has_gencol)

nitpick - This comment also needs improving. For example, just because
remote_has_gencol is false, it does not follow that
'include_generated_columns' was specified as 'false' -- maybe the
parameter was 'true' but the table just had no generated columns
anyway... I've modified the comment already in my nitpicks diff, but
probably you can improve on that.

~

nitpick - "else" comment is modified slightly too. Please see the nitpicks diff.

~

4.
In hindsight, I felt your variable 'remote_has_gencol' was not
well-named because it is not for saying the remote table has a
generated column -- it is saying the remote table has a generated
column **that we have to copy**. So, rather it should be named
something like 'gencol_copy_needed' (but I didn't change this name in
the nitpick diffs...)

==
src/test/subscription/t/004_sync.pl

nitpick - changes to comment style to make the test case separations
much more obvious
nitpick - minor comment wording tweaks

5.
Here, you are confirming we get an ERROR when replicating from a
non-generated column to a generated column. But I think your patch
also added exactly that same test scenario in the 011_generated (as
the sub5 test). So, maybe this one here should be removed?

==
src/test/subscription/t/011_generated.pl

nitpick - comment wrapping at 80 chars
nitpick - add/remove blank lines for readability
nitpick - typo /subsriber/subscriber/
nitpick - prior to the ALTER test, tab6 is unsubscribed. So add
another test to verify its initial data
nitpick - sometimes the msg 'add a new table to existing publication'
is misplaced
nitpick - the tests for tab6 and tab5 were in opposite to the expected
order, so swapped them.

==
99.
Please see also the attached diff which implements all the nitpicks
described in this post.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 38f3621..c264353 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -704,30 +704,30 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool 
*remotegenlist)
TupleDesc   desc;
 
desc = RelationGetDescr(rel->localrel);
-   localgenlist = palloc0((rel->remoterel.natts * sizeof(bool)));
+   localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
 
/*
 * This loop checks for generated columns on subscription table.
 */
for (int i = 0; i < desc->natts; i++)
{
-   int attnum;
+   int remote_attnum;
Form_pg_attribute 

Re: Pgoutput not capturing the generated columns

2024-07-04 Thread Shlok Kyal
On Wed, 26 Jun 2024 at 08:06, Peter Smith  wrote:
>
> Hi Shlok. Here are my review comments for patch v10-0003
>
> ==
> General.
>
> 1.
> The patch has lots of conditions like:
> if (att->attgenerated && (att->attgenerated !=
> ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
>  continue;
>
> IMO these are hard to read. Although more verbose, please consider if
> all those (for the sake of readability) would be better re-written
> like below :
>
> if (att->generated)
> {
>   if (!include_generated_columns)
> continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
> continue;
> }
Fixed

> ==
> contrib/test_decoding/test_decoding.c
>
> tuple_to_stringinfo:
>
> NITPICK = refactored the code and comments a bit here to make it easier
> NITPICK - there is no need to mention "virtual". Instead, say we only
> support STORED
Fixed

> ==
> src/backend/catalog/pg_publication.c
>
> publication_translate_columns:
>
> NITPICK - introduced variable 'att' to simplify this code
Fixed

> ~
>
> 2.
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> + errmsg("cannot use virtual generated column \"%s\" in publication
> column list",
> +colname));
>
> Is it better to avoid referring to "virtual" at all? Instead, consider
> rearranging the wording to say something like "generated column \"%s\"
> is not STORED so cannot be used in a publication column list"
Fixed

> ~~~
>
> pg_get_publication_tables:
>
> NITPICK - split the condition code for readability
Fixed

> ==
> src/backend/replication/logical/relation.c
>
> 3. logicalrep_rel_open
>
> + if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
> + continue;
> +
>
> Isn't this missing some code to say "entry->attrmap->attnums[i] =
> -1;", same as all the other nearby code is doing?
Fixed

> ~~~
>
> 4.
> I felt all the "generated column" logic should be kept together, so
> this new condition should be combined with the other generated column
> condition, like:
>
> if (attr->attgenerated)
> {
>   /* comment... */
>   if (!MySubscription->includegencols)
>   {
> entry->attrmap->attnums[i] = -1;
> continue;
>   }
>
>   /* comment... */
>   if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   {
> entry->attrmap->attnums[i] = -1;
> continue;
>   }
> }
Fixed

> ==
> src/backend/replication/logical/tablesync.c
>
> 5.
> + if (gencols_allowed)
> + {
> + /* Replication of generated cols is supported, but not VIRTUAL cols. */
> + appendStringInfo(, " AND a.attgenerated != 'v'");
> + }
>
> Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
> of the hardwired 'v'? (Maybe add another TODO comment to revisit
> this).
>
> Alternatively, consider if it is more future-proof to rearrange so it
> just says what *is* supported instead of what *isn't* supported:
> e.g. "AND a.attgenerated IN ('', 's')"
I feel we should use ATTRIBUTE_GENERATED_VIRTUAL macro. Added a TODO.

> ==
> src/test/subscription/t/011_generated.pl
>
> NITPICK - some comments are missing the word "stored"
> NITPICK - sometimes "col" should be plural "cols"
> NITPICK = typo "GNERATED"
Add the relevant changes.

> ==
>
> 6.
> In a previous review [1, comment #3] I mentioned that there should be
> some docs updates on the "Logical Replication Message Formats" section
> 53.9. So, I expected patch 0001 would make some changes and then patch
> 0003 would have to update it again to say something about "STORED".
> But all that is missing from the v10* patches.
>
> ==
Will fix in upcoming version

>
> 99.
> See also my nitpicks diff which is a top-up patch addressing all the
> nitpick comments mentioned above. Please apply all of these that you
> agree with.
Applied Relevant changes

Please refer v14 patch for the changes [1].


[1]: 
https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-07-04 Thread Shlok Kyal
On Tue, 25 Jun 2024 at 18:49, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shlok,
>
> Thanks for updating patches! Below are my comments, maybe only for 0002.
>
> 01. General
>
> IIUC, we are not discussed why ALTER SUBSCRIPTION ... SET 
> include_generated_columns
> is prohibit. Previously, it seems okay because there are exclusive options. 
> But now,
> such restrictions are gone. Do you have a reason in your mind? It is just not 
> considered
> yet?
We donot support ALTER SUBSCRIPTION to alter
'include_generated_columns'. Suppose initially the user has a logical
replication setup. Publisher has
table t1 with columns (c1 int, c2 int generated always as (c1*2)) and
subscriber has table t1 with columns (c1 int, c2 int). And initially
'incude_generated_column' is true.
Now if we 'ALTER SUBSCRIPTION' to set 'include_generated_columns' as
false. Initial rows will have data for c2 on the subscriber table, but
will not have value after alter. This may be an inconsistent
behaviour.


> 02. General
>
> According to the doc, we allow to alter a column to non-generated one, by 
> ALTER
> TABLE ... ALTER COLUMN ... DROP EXPRESSION command. Not sure, what should be
> when the command is executed on the subscriber while copying the data? Should
> we continue the copy or restart? How do you think?
COPY of data will happen in a single transaction, so if we execute
'ALTER TABLE ... ALTER COLUMN ... DROP EXPRESSION' command, It will
take place after the whole COPY command will finish. So I think it
will not create any issue.

> 03. Tes tcode
>
> IIUC, REFRESH PUBLICATION can also lead the table synchronization. Can you add
> a test for that?
Added

> 04. Test code (maybe for 0001)
>
> Please test the combination with TABLE ... ALTER COLUMN ... DROP EXPRESSION 
> command.
Added

> 05. logicalrep_rel_open
>
> ```
> +/*
> + * In case 'include_generated_columns' is 'false', we should 
> skip the
> + * check of missing attrs for generated columns.
> + * In case 'include_generated_columns' is 'true', we should 
> check if
> + * corresponding column for the generated column in publication 
> column
> + * list is present in the subscription table.
> + */
> +if (!MySubscription->includegencols && attr->attgenerated)
> +{
> +entry->attrmap->attnums[i] = -1;
> +continue;
> +}
> ```
>
> This comment is not very clear to me, because here we do not skip anything.
> Can you clarify the reason why attnums[i] is set to -1 and how will it be 
> used?
This part of the code is removed to address some comments.

> 06. make_copy_attnamelist
>
> ```
> +gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
> ```
>
> I think this array is too large. Can we reduce a size to (desc->natts * 
> sizeof(bool))?
> Also, the free'ing should be done.
I have changed the name 'gencollist' to 'localgenlist' to make the
name more consistent. Also
size should be (rel->remoterel.natts * sizeof(bool)) as I am storing
if a column is generated like 'localgenlist[attnum] = true;'
where 'attnum' is corresponding attribute number on publisher side.

> 07. make_copy_attnamelist
>
> ```
> +/* Loop to handle subscription table generated columns. */
> +for (int i = 0; i < desc->natts; i++)
> ```
>
> IIUC, the loop is needed to find generated columns on the subscriber side, 
> right?
> Can you clarify as comment?
Fixed

> 08. copy_table
>
> ```
> +/*
> + * Regular table with no row filter and 'include_generated_columns'
> + * specified as 'false' during creation of subscription.
> + */
> ```
>
> I think this comment is not correct. After patching, all tablesync command 
> becomes
> like COPY (SELECT ...) if include_genereted_columns is set to true. Is it 
> right?
> Can we restrict only when the table has generated ones?
Fixed

Please refer to v14 patch for the changes [1].


[1]: 
https://www.postgresql.org/message-id/CANhcyEW95M_usF1OJDudeejs0L0%2BYOEb%3DdXmt_4Hs-70%3DCXa-g%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-07-01 Thread Peter Smith
Hi Shubham,

As you can see, most of my recent review comments for patch 0001 are
only cosmetic nitpicks. But, there is still one long-unanswered design
question from a month ago [1, #G.2]

A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
is related to the introduction and passing everywhere of new
'include_generated_columns' function parameters. These same functions
are also always passing "BitMapSet *columns" representing the
publication column list.

My question was about whether we can't make use of the existing BMS
parameter instead of introducing all the new API parameters.

The idea might go something like this:

* If 'include_generated_columns' option is specified true and if no
column list was already specified then perhaps the relentry->columns
can be used for a "dummy" column list that has everything including
all the generated columns.

* By doing this:
 -- you may be able to avoid passing the extra
'include_gernated_columns' everywhere
 -- you may be able to avoid checking for generated columns deeper in
the code (since it is already checked up-front when building the
column list BMS)

~~

I'm not saying this design idea is guaranteed to work, but it might be
worth considering, because if it does work then there is potential to
make the current 0001 patch significantly shorter.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-07-01 Thread Peter Smith
On Mon, Jul 1, 2024 at 8:38 PM Shubham Khanna
 wrote:
>
>...
> > 8.
> > + else if (strcmp(elem->defname, "include-generated-columns") == 0)
> > + {
> > + if (elem->arg == NULL)
> > + data->include_generated_columns = true;
> >
> > Is there any way to test that "elem->arg == NULL" in the
> > generated.sql? OTOH, if it is not possible to get here then is the
> > code even needed?
> >
>
> Currently I could not find a case where the
> 'include_generated_columns' option is not specifying any value, but  I
> was hesitant to remove this from here as the other options mentioned
> follow the same rules. Thoughts?
>

If you do manage to find a scenario for this then I think a test for
it would be good. But, I agree that the code seems OK because now I
see it is the same pattern as similar nearby code.

~~~

Thanks for the updated patch. Here are some review comments for patch v13-0001.

==
.../expected/generated_columns.out

nitpicks (see generated_columns.sql)

==
.../test_decoding/sql/generated_columns.sql

nitpick - use plural /column/columns/
nitpick - use consistent wording in the comments
nitpick - IMO it is better to INSERT different values for each of the tests

==
doc/src/sgml/protocol.sgml

nitpick - I noticed that none of the other boolean parameters on this
page mention about a default, so maybe here we should do the same and
omit that information.

~~~

1.
- 
-  Next, the following message part appears for each column included in
-  the publication (except generated columns):
- 
-

In a previous review [1 comment #11] I wrote that you can't just
remove this paragraph because AFAIK it is still meaningful. A minimal
change might be to just remove the "(except generated columns)" part.
Alternatively, you could give a more detailed explanation mentioning
the include_generated_columns protocol parameter.

I provided some updated text for this paragraph in my NITPICKS top-up
patch, Please have a look at that for ideas.

==
src/backend/commands/subscriptioncmds.c

It looks like pg_indent needs to be run on this file.

==
src/include/catalog/pg_subscription.h

nitpick - comment /publish/Publish/ for consistency

==
src/include/replication/walreceiver.h

nitpick - comment /publish/Publish/ for consistency

==
src/test/regress/expected/subscription.out

nitpicks - (see subscription.sql)

==
src/test/regress/sql/subscription.sql

nitpick - combine the invalid option combinations test with all the
others (no special comment needed)
nitpick - rename subscription as 'regress_testsub2' same as all its peers.

==
src/test/subscription/t/011_generated.pl

nitpick - add/remove blank lines

==
src/test/subscription/t/031_column_list.pl

nitpick - rewording for a comment. This issue was not strictly caused
by this patch, but since you are modifying the same comment we can fix
this in passing.

==
99.
Please also see the attached top-up patch for all those nitpicks
identified above.

==
[1] v11-0001 review
https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/contrib/test_decoding/expected/generated_columns.out 
b/contrib/test_decoding/expected/generated_columns.out
index 4c3d6dd..f3b26aa 100644
--- a/contrib/test_decoding/expected/generated_columns.out
+++ b/contrib/test_decoding/expected/generated_columns.out
@@ -1,4 +1,4 @@
--- test decoding of generated column
+-- test decoding of generated columns
 SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 
'test_decoding');
  ?column? 
 --
@@ -7,7 +7,7 @@ SELECT 'init' FROM 
pg_create_logical_replication_slot('regression_slot', 'test_d
 
 -- column b' is a generated column
 CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
--- when 'include-generated-columns' is not set the generated column 'b' will 
be replicated
+-- when 'include-generated-columns' is not set the generated column 'b' values 
will be replicated
 INSERT INTO gencoltable (a) VALUES (1), (2), (3);
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1');
 data 
@@ -20,26 +20,26 @@ SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
 (5 rows)
 
 -- when 'include-generated-columns' = '1' the generated column 'b' values will 
be replicated
-INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+INSERT INTO gencoltable (a) VALUES (4), (5), (6);
 SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
-data 
--
+ data 

Re: Pgoutput not capturing the generated columns

2024-06-27 Thread Peter Smith
Hi, here are some patch v11-0001 comments.

(BTW, I had difficulty reviewing this because something seemed strange
with the changes this patch made to the test_decoding tests).

==
General

1. Patch name

Patch name does not need to quote 'logical replication'

~

2. test_decoding tests

Multiple test_decoding tests were failing for me. There is something
very suspicious about the unexplained changes the patch made to the
expected "binary.out" and "decoding_into_rel.out" etc. I REVERTED all
those changes in my nitpicks top-up to get everything working. Please
re-confirm that all the test_decoding tests are OK!

==
Commit Message

3.
Since you are including the example usage for test_decoding, I think
it's better to include the example usage of CREATE SUBSCRIPTION also.

==
contrib/test_decoding/expected/binary.out

4.
 SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
- ?column?
---
- init
-(1 row)
-
+ERROR:  replication slot "regression_slot" already exists

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

==
.../expected/decoding_into_rel.out

5.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
- ?column?
---
- stop
-(1 row)
-

Huh? Why is this unrelated expected output changed by this patch?

The test_decoding test fails for me unless I REVERT this change!! See
my nitpicks diff.

==
.../test_decoding/sql/decoding_into_rel.sql

6.
-SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');
+SELECT 'stop' FROM pg_drop_replication_slot('regression_slot');

Huh, Why does this patch change this code at all? I REVERTED this
change. See my nitpicks diff.

==
.../test_decoding/sql/generated_columns.sql

(see my nitpicks replacement file for this test)

7.
+-- test that we can insert the result of a 'include_generated_columns'
+-- into the tables created. That's really not a good idea in practical terms,
+-- but provides a nice test.

NITPICK - I didn't understand the point of this comment.  I updated
the comment according to my understanding.

~

NITPICK - The comment "when 'include-generated-columns' is not set
then columns will not be replicated" is the opposite of what the
result is. I changed this comment.

NITPICK - modified and unified wording of some of the other comments

NITPICK - changed some blank lines

==
contrib/test_decoding/test_decoding.c

8.
+ else if (strcmp(elem->defname, "include-generated-columns") == 0)
+ {
+ if (elem->arg == NULL)
+ data->include_generated_columns = true;

Is there any way to test that "elem->arg == NULL" in the
generated.sql? OTOH, if it is not possible to get here then is the
code even needed?

==
doc/src/sgml/ddl.sgml

9.
  
-  Generated columns are skipped for logical replication and cannot be
-  specified in a CREATE PUBLICATION column list.
+  'include_generated_columns' option controls whether generated columns
+  should be included in the string representation of tuples during
+  logical decoding in PostgreSQL. The default is true.
  

NITPICK - Use proper markdown instead of single quotes for the parameter.

NITPICK - I think this can be reworded slightly to provide a
cross-reference to the CREATE SUBSCRIPTION parameter for more details
(which means then we can avoid repeating details like the default
value here). PSA my nitpicks diff for an example of how I thought docs
should look.

==
doc/src/sgml/protocol.sgml

10.
+The default is true.

No, it isn't. AFAIK you made the default behaviour true only for
'test_decoding', but the default for CREATE SUBSCRIPTION remains
*false* because that is the existing PG17 behaviour. And the default
for the 'include_generated_columns' in the protocol is *also* false to
match the CREATE SUBSCRIPTION default.

e.g. libpqwalreceiver.c only sets ", include_generated_columns 'true'"
when options->proto.logical.include_generated_columns
e.g. worker.c says: options->proto.logical.include_generated_columns =
MySubscription->includegencols;
e.g. subscriptioncmds.c sets default: opts->include_generated_columns = false;

(This confirmed my previous review expectation that using different
default behaviours for test_decoding and pgoutput would surely lead to
confusion)

~~~

11.
- 
-  Next, the following message part appears for each column included in
-  the publication (except generated columns):
- 
-

AFAIK you cannot just remove this entire paragraph because I thought
it was still relevant to talking about "... the following message
part". But, if you don't want to explain and cross-reference about
'include_generated_columns' then maybe it is OK just to remove the
"(except generated columns)" part?

==
src/test/subscription/t/011_generated.pl

NITPICK - comment typo /tab2/tab3/
NITPICK - remove some blank lines

~~~

12.
# the column was NOT replicated (the 

Re: Pgoutput not capturing the generated columns

2024-06-26 Thread Shubham Khanna
On Mon, Jun 24, 2024 at 8:21 AM Peter Smith  wrote:
>
> Hi, here are some patch v9-0001 comments.
>
> I saw Kuroda-san has already posted comments for this patch so there
> may be some duplication here.
>
> ==
> GENERAL
>
> 1.
> The later patches 0002 etc are checking to support only STORED
> gencols. But, doesn't that restriction belong in this patch 0001 so
> VIRTUAL columns are not decoded etc in the first place... (??)
>
> ~~~
>
> 2.
> The "Generated Columns" docs mentioned in my previous review comment
> [2] should be modified by this 0001 patch.
>
> ~~~
>
> 3.
> I think the "Message Format" page mentioned in my previous review
> comment [3] should be modified by this 0001 patch.
>
> ==
> Commit message
>
> 4.
> The patch name is still broken as previously mentioned [1, #1]
>
> ==
> doc/src/sgml/protocol.sgml
>
> 5.
> Should this docs be referring to STORED generated columns, instead of
> just generated columns?
>
> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> 6.
> Should this be docs referring to STORED generated columns, instead of
> just generated columns?
>
> ==
> src/bin/pg_dump/pg_dump.c
>
> getSubscriptions:
> NITPICK - tabs
> NITPICK - patch removed a blank line it should not be touching
> NITPICK = patch altered indents it should not be touching
> NITPICK - a missing blank line that was previously present
>
> 7.
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols,\n");
>
> There is an unwanted comma here.
>
> ~
>
> dumpSubscription
> NITPICK - patch altered indents it should not be touching
>
> ==
> src/bin/pg_dump/pg_dump.h
>
> NITPICK - unnecessary blank line
>
> ==
> src/bin/psql/describe.c
>
> describeSubscriptions
> NITPICK - bad indentation
>
> 8.
> In my previous review [1, #4b] I suggested this new column should be
> in a different order (e.g. adjacent to the other ones ahead of
> 'Conninfo'), but this is not yet addressed.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> NITPICK - missing space in comment
> NITPICK - misleading "because" wording in the comment
>
> ==
>
> 99.
> See also my attached nitpicks diff, for cosmetic issues. Please apply
> whatever you agree with.
>
> ==
> [1] My v8-0001 review -
> https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
> [2] 
> https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
> [3] 
> https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

All the comments are handled.

I have attached the updated patch v11 here in [1]. See [1] for the
changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJpS_XDkR6OrsmMZtCBZNxeYoCdENhC0%3Dbe0rLmNvhiQw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-25 Thread Peter Smith
Hi Shlok. Here are my review comments for patch v10-0003

==
General.

1.
The patch has lots of conditions like:
if (att->attgenerated && (att->attgenerated !=
ATTRIBUTE_GENERATED_STORED || !include_generated_columns))
 continue;

IMO these are hard to read. Although more verbose, please consider if
all those (for the sake of readability) would be better re-written
like below :

if (att->generated)
{
  if (!include_generated_columns)
continue;

  if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
continue;
}

==
contrib/test_decoding/test_decoding.c

tuple_to_stringinfo:

NITPICK = refactored the code and comments a bit here to make it easier
NITPICK - there is no need to mention "virtual". Instead, say we only
support STORED

==
src/backend/catalog/pg_publication.c

publication_translate_columns:

NITPICK - introduced variable 'att' to simplify this code

~

2.
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot use virtual generated column \"%s\" in publication
column list",
+colname));

Is it better to avoid referring to "virtual" at all? Instead, consider
rearranging the wording to say something like "generated column \"%s\"
is not STORED so cannot be used in a publication column list"

~~~

pg_get_publication_tables:

NITPICK - split the condition code for readability

==
src/backend/replication/logical/relation.c

3. logicalrep_rel_open

+ if (attr->attgenerated && attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
+ continue;
+

Isn't this missing some code to say "entry->attrmap->attnums[i] =
-1;", same as all the other nearby code is doing?

~~~

4.
I felt all the "generated column" logic should be kept together, so
this new condition should be combined with the other generated column
condition, like:

if (attr->attgenerated)
{
  /* comment... */
  if (!MySubscription->includegencols)
  {
entry->attrmap->attnums[i] = -1;
continue;
  }

  /* comment... */
  if (attr->attgenerated != ATTRIBUTE_GENERATED_STORED)
  {
entry->attrmap->attnums[i] = -1;
continue;
  }
}

==
src/backend/replication/logical/tablesync.c

5.
+ if (gencols_allowed)
+ {
+ /* Replication of generated cols is supported, but not VIRTUAL cols. */
+ appendStringInfo(, " AND a.attgenerated != 'v'");
+ }

Is it better here to use the ATTRIBUTE_GENERATED_VIRTUAL macro instead
of the hardwired 'v'? (Maybe add another TODO comment to revisit
this).

Alternatively, consider if it is more future-proof to rearrange so it
just says what *is* supported instead of what *isn't* supported:
e.g. "AND a.attgenerated IN ('', 's')"

==
src/test/subscription/t/011_generated.pl

NITPICK - some comments are missing the word "stored"
NITPICK - sometimes "col" should be plural "cols"
NITPICK = typo "GNERATED"

==

6.
In a previous review [1, comment #3] I mentioned that there should be
some docs updates on the "Logical Replication Message Formats" section
53.9. So, I expected patch 0001 would make some changes and then patch
0003 would have to update it again to say something about "STORED".
But all that is missing from the v10* patches.

==

99.
See also my nitpicks diff which is a top-up patch addressing all the
nitpick comments mentioned above. Please apply all of these that you
agree with.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvQ8CLq-JysTTeRj4u5SC9vTVcx3AgwTHcPUEOh-UnKcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Mon, Jun 24, 2024 at 10:56 PM Shlok Kyal  wrote:
>
> On Fri, 21 Jun 2024 at 09:03, Peter Smith  wrote:
> >
> > Hi, here are some review comments for patch v9-0002.
> >
> > ==
> > src/backend/replication/logical/relation.c
> >
> > 1. logicalrep_rel_open
> >
> > - if (attr->attisdropped)
> > + if (attr->attisdropped ||
> > + (!MySubscription->includegencols && attr->attgenerated))
> >
> > You replied to my question from the previous review [1, #2] as follows:
> > In case 'include_generated_columns' is 'true'. column list in
> > remoterel will have an entry for generated columns. So, in this case
> > if we skip every attr->attgenerated, we will get a missing column
> > error.
> >
> > ~
> >
> > TBH, the reason seems very subtle to me. Perhaps that
> > "(!MySubscription->includegencols && attr->attgenerated))" condition
> > should be coded as a separate "if", so then you can include a comment
> > similar to your answer, to explain it.
> Fixed
>
> > ==
> > src/backend/replication/logical/tablesync.c
> >
> > make_copy_attnamelist:
> >
> > NITPICK - punctuation in function comment
> > NITPICK - add/reword some more comments
> > NITPICK - rearrange comments to be closer to the code they are commenting
> Applied the changes
>
> > ~
> >
> > 2. make_copy_attnamelist.
> >
> > + /*
> > + * Construct column list for COPY.
> > + */
> > + for (int i = 0; i < rel->remoterel.natts; i++)
> > + {
> > + if(!gencollist[i])
> > + attnamelist = lappend(attnamelist,
> > +   makeString(rel->remoterel.attnames[i]));
> > + }
> >
> 

RE: Pgoutput not capturing the generated columns

2024-06-25 Thread Hayato Kuroda (Fujitsu)
Dear Shlok,

Thanks for updating patches! Below are my comments, maybe only for 0002.

01. General

IIUC, we are not discussed why ALTER SUBSCRIPTION ... SET 
include_generated_columns
is prohibit. Previously, it seems okay because there are exclusive options. But 
now,
such restrictions are gone. Do you have a reason in your mind? It is just not 
considered
yet?

02. General

According to the doc, we allow to alter a column to non-generated one, by ALTER
TABLE ... ALTER COLUMN ... DROP EXPRESSION command. Not sure, what should be
when the command is executed on the subscriber while copying the data? Should
we continue the copy or restart? How do you think?

03. Tes tcode

IIUC, REFRESH PUBLICATION can also lead the table synchronization. Can you add
a test for that?

04. Test code (maybe for 0001)

Please test the combination with TABLE ... ALTER COLUMN ... DROP EXPRESSION 
command.

05. logicalrep_rel_open

```
+/*
+ * In case 'include_generated_columns' is 'false', we should skip 
the
+ * check of missing attrs for generated columns.
+ * In case 'include_generated_columns' is 'true', we should check 
if
+ * corresponding column for the generated column in publication 
column
+ * list is present in the subscription table.
+ */
+if (!MySubscription->includegencols && attr->attgenerated)
+{
+entry->attrmap->attnums[i] = -1;
+continue;
+}
```

This comment is not very clear to me, because here we do not skip anything.
Can you clarify the reason why attnums[i] is set to -1 and how will it be used?

06. make_copy_attnamelist

```
+gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
```

I think this array is too large. Can we reduce a size to (desc->natts * 
sizeof(bool))?
Also, the free'ing should be done.

07. make_copy_attnamelist

```
+/* Loop to handle subscription table generated columns. */
+for (int i = 0; i < desc->natts; i++)
```

IIUC, the loop is needed to find generated columns on the subscriber side, 
right?
Can you clarify as comment?

08. copy_table

```
+/*
+ * Regular table with no row filter and 'include_generated_columns'
+ * specified as 'false' during creation of subscription.
+ */
```

I think this comment is not correct. After patching, all tablesync command 
becomes
like COPY (SELECT ...) if include_genereted_columns is set to true. Is it right?
Can we restrict only when the table has generated ones?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Pgoutput not capturing the generated columns

2024-06-25 Thread Peter Smith
Here are some review comments for the patch v10-0002.

==
Commit Message

1.
Note that we don't copy columns when the subscriber-side column is also
generated. Those will be filled as normal with the subscriber-side computed or
default data.

~

Now this patch also introduced some errors etc, so I think that patch
comment should be written differently to explicitly spell out
behaviour of every combination, something like the below:

Summary

when (include_generated_column = true)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: The
publisher generated column value is copied.

* publisher generated column => subscriber generated column: The
publisher generated column value is not copied. The subscriber
generated column will be filled with the subscriber-side computed or
default data.

when (include_generated_columns = false)

* publisher not-generated column => subscriber not-generated column:
This is just normal logical replication (not changed by this patch).

* publisher not-generated column => subscriber generated column: This
will give ERROR.

* publisher generated column => subscriber not-generated column: This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber column will be filled with the subscriber-side default
data.

* publisher generated column => subscriber generated column:  This
will replicate nothing. Publisher generate-column is not replicated.
The subscriber generated column will be filled with the
subscriber-side computed or default data.

==
src/backend/replication/logical/relation.c

2.
logicalrep_rel_open:

I tested some of the "missing column" logic, and got the following results:

Scenario A:
PUB
test_pub=# create table t2(a int, b int);
test_pub=# create publication pub2 for table t2;
SUB
test_sub=# create table t2(a int, b int generated always as (a*2) stored);
test_sub=# create subscription sub2 connection 'dbname=test_pub'
publication pub2 with (include_generated_columns = false);
Result:
ERROR:  logical replication target relation "public.t2" is missing
replicated column: "b"

~

Scenario B:
PUB/SUB identical to above, but subscription sub2 created "with
(include_generated_columns = true);"
Result:
ERROR:  logical replication target relation "public.t2" has a
generated column "b" but corresponding column on source relation is
not a generated column

~~~

2a. Question

Why should we get 2 different error messages for what is essentially
the same problem according to whether the 'include_generated_columns'
is false or true? Isn't the 2nd error message the more correct and
useful one for scenarios like this involving generated columns?

Thoughts?

~

2b. Missing tests?

I also noticed there seems no TAP test for the current "missing
replicated column" message. IMO there should be a new test introduced
for this because the loop involved too much bms logic to go
untested...

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:
NITPICK - minor comment tweak
NITPICK - add some spaces after "if" code

3.
Should you pfree the gencollist at the bottom of this function when
you no longer need it, for tidiness?

~~~

4.
 static void
-fetch_remote_table_info(char *nspname, char *relname,
+fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist,
  LogicalRepRelation *lrel, List **qual)
 {
  WalRcvExecResult *res;
  StringInfoData cmd;
  TupleTableSlot *slot;
  Oid tableRow[] = {OIDOID, CHAROID, CHAROID};
- Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID};
+ Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID};
  Oid qualRow[] = {TEXTOID};
  bool isnull;
+ bool*remotegenlist_res;

IMO the names 'remotegenlist' and 'remotegenlist_res' should be
swapped the other way around, because it is the function parameter
that is the "result", whereas the 'remotegenlist_res' is just the
local working var for it.

~~~

5. fetch_remote_table_info

Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple
places, I think it will be better to assign this to a 'server_version'
variable to be used everywhere instead of having multiple function
calls.

~~~

6.
  "SELECT a.attnum,"
  "   a.attname,"
  "   a.atttypid,"
- "   a.attnum = ANY(i.indkey)"
+ "   a.attnum = ANY(i.indkey),"
+ " a.attgenerated != ''"
  "  FROM pg_catalog.pg_attribute a"
  "  LEFT JOIN pg_catalog.pg_index i"
  "   ON (i.indexrelid = pg_get_replica_identity_index(%u))"
  " WHERE a.attnum > 0::pg_catalog.int2"
- "   AND NOT a.attisdropped %s"
+ "   AND NOT a.attisdropped", lrel->remoteid);
+
+ if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
+ walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) ||
+ !MySubscription->includegencols)
+ appendStringInfo(, " 

Re: Pgoutput not capturing the generated columns

2024-06-24 Thread Shlok Kyal
On Fri, 21 Jun 2024 at 12:51, Peter Smith  wrote:
>
> Hi, Here are some review comments for patch v9-0003
>
> ==
> Commit Message
>
> /fix/fixes/
Fixed

> ==
> 1.
> General. Is tablesync enough?
>
> I don't understand why is the patch only concerned about tablesync?
> Does it make sense to prevent VIRTUAL column replication during
> tablesync, if you aren't also going to prevent VIRTUAL columns from
> normal logical replication (e.g. when copy_data = false)? Or is this
> already handled somewhere?
I checked the behaviour during incremental changes. I saw during
decoding 'null' values are present for Virtual Generated Columns. I
made the relevant changes to not support replication of Virtual
generated columns.

> ~~~
>
> 2.
> General. Missing test.
>
> Add some test cases to verify behaviour is different for STORED versus
> VIRTUAL generated columns
I have not added the tests as it would give an error in cfbot.
I have added a TODO note for the same. This can be done once the
VIRTUAL generated columns patch is committted.

> ==
> src/sgml/ref/create_subscription.sgml
>
> NITPICK - consider rearranging as shown in my nitpicks diff
> NITPICK - use  sgml markup for STORED
Fixed

> ==
> src/backend/replication/logical/tablesync.c
>
> 3.
> - if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
> - walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) ||
> - !MySubscription->includegencols)
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) < 17)
> + {
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12)
>   appendStringInfo(, " AND a.attgenerated = ''");
> + }
> + else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 17)
> + {
> + if(MySubscription->includegencols)
> + appendStringInfo(, " AND a.attgenerated != 'v'");
> + else
> + appendStringInfo(, " AND a.attgenerated = ''");
> + }
>
> IMO this logic is too tricky to remain uncommented -- please add some 
> comments.
> Also, it seems somewhat complex. I think you can achieve the same more simply:
>
> SUGGESTION
>
> if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12)
> {
>   bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 
> 17
> && MySubscription->includegencols;
>   if (gencols_allowed)
>   {
> /* Replication of generated cols is supported, but not VIRTUAL cols. */
> appendStringInfo(, " AND a.attgenerated != 'v'");
>   }
>   else
>   {
> /* Replication of generated cols is not supported. */
> appendStringInfo(, " AND a.attgenerated = ''");
>   }
> }
Fixed

> ==
>
> 99.
> Please refer also to my attached nitpick diffs and apply those if you agree.
Applied the changes.

I have attached the updated patch v10 here in [1].
[1]: 
https://www.postgresql.org/message-id/CANhcyEUMCk6cCbw0vVZWo8FRd6ae9CmKG%3DgKP-9Q67jLn8HqtQ%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-23 Thread Peter Smith
Hi, here are some patch v9-0001 comments.

I saw Kuroda-san has already posted comments for this patch so there
may be some duplication here.

==
GENERAL

1.
The later patches 0002 etc are checking to support only STORED
gencols. But, doesn't that restriction belong in this patch 0001 so
VIRTUAL columns are not decoded etc in the first place... (??)

~~~

2.
The "Generated Columns" docs mentioned in my previous review comment
[2] should be modified by this 0001 patch.

~~~

3.
I think the "Message Format" page mentioned in my previous review
comment [3] should be modified by this 0001 patch.

==
Commit message

4.
The patch name is still broken as previously mentioned [1, #1]

==
doc/src/sgml/protocol.sgml

5.
Should this docs be referring to STORED generated columns, instead of
just generated columns?

==
doc/src/sgml/ref/create_subscription.sgml

6.
Should this be docs referring to STORED generated columns, instead of
just generated columns?

==
src/bin/pg_dump/pg_dump.c

getSubscriptions:
NITPICK - tabs
NITPICK - patch removed a blank line it should not be touching
NITPICK = patch altered indents it should not be touching
NITPICK - a missing blank line that was previously present

7.
+ else
+ appendPQExpBufferStr(query,
+ " false AS subincludegencols,\n");

There is an unwanted comma here.

~

dumpSubscription
NITPICK - patch altered indents it should not be touching

==
src/bin/pg_dump/pg_dump.h

NITPICK - unnecessary blank line

==
src/bin/psql/describe.c

describeSubscriptions
NITPICK - bad indentation

8.
In my previous review [1, #4b] I suggested this new column should be
in a different order (e.g. adjacent to the other ones ahead of
'Conninfo'), but this is not yet addressed.

==
src/test/subscription/t/011_generated.pl

NITPICK - missing space in comment
NITPICK - misleading "because" wording in the comment

==

99.
See also my attached nitpicks diff, for cosmetic issues. Please apply
whatever you agree with.

==
[1] My v8-0001 review -
https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1fb19f5..9f2cac9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4739,7 +4739,7 @@ getSubscriptions(Archive *fout)
int i_suboriginremotelsn;
int i_subenabled;
int i_subfailover;
-   int i_subincludegencols;
+   int i_subincludegencols;
int i,
ntups;
 
@@ -4770,6 +4770,7 @@ getSubscriptions(Archive *fout)
 " s.subowner,\n"
 " s.subconninfo, 
s.subslotname, s.subsynccommit,\n"
 " s.subpublications,\n");
+
if (fout->remoteVersion >= 14)
appendPQExpBufferStr(query, " s.subbinary,\n");
else
@@ -4804,7 +4805,7 @@ getSubscriptions(Archive *fout)
 
if (dopt->binary_upgrade && fout->remoteVersion >= 17)
appendPQExpBufferStr(query, " o.remote_lsn AS 
suboriginremotelsn,\n"
-   " s.subenabled,\n");
+" s.subenabled,\n");
else
appendPQExpBufferStr(query, " NULL AS suboriginremotelsn,\n"
 " false AS 
subenabled,\n");
@@ -4815,12 +4816,14 @@ getSubscriptions(Archive *fout)
else
appendPQExpBuffer(query,
" false AS subfailover,\n");
+
if (fout->remoteVersion >= 17)
appendPQExpBufferStr(query,
 " s.subincludegencols\n");
else
appendPQExpBufferStr(query,
" false AS 
subincludegencols,\n");
+
appendPQExpBufferStr(query,
 "FROM pg_subscription s\n");
 
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index a2c35fe..8c07933 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -672,7 +672,6 @@ typedef struct _SubscriptionInfo
char   *suboriginremotelsn;
char   *subfailover;
char   *subincludegencols;
-
 } SubscriptionInfo;
 
 /*
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 491fcb9..00f3131 100644
--- a/src/bin/psql/describe.c

RE: Pgoutput not capturing the generated columns

2024-06-22 Thread Hayato Kuroda (Fujitsu)
Hi Shubham,

Thanks for sharing new patch! You shared as v9, but it should be v10, right?
Also, since there are no commitfest entry, I registered [1]. You can rename the
title based on the needs. Currently CFbot said OK.

Anyway, below are my comments.

01. General
Your patch contains unnecessary changes. Please remove all of them. E.g., 

```
 " s.subpublications,\n");
-
```
And
```
appendPQExpBufferStr(query, " o.remote_lsn AS 
suboriginremotelsn,\n"
-" s.subenabled,\n");
+   " s.subenabled,\n");
```

02. General
Again, please run the pgindent/pgperltidy.

03. test_decoding
Previously I suggested to the default value of to be include_generated_columns
should be true, so you modified like that. However, Peter suggested opposite
opinion [3] and you just revised accordingly. I think either way might be okay, 
but
at least you must clarify the reason why you preferred to set default to false 
and
changed accordingly.

04. decoding_into_rel.sql
According to the comment atop this file, this test should insert result to a 
table.
But added case does not - we should put them at another place. I.e., create 
another
file.

05. decoding_into_rel.sql
```
+-- when 'include-generated-columns' is not set
```
Can you clarify the expected behavior as a comment?

06. getSubscriptions
```
+   else
+   appendPQExpBufferStr(query,
+   " false AS 
subincludegencols,\n");
```
I think the comma is not needed.
Also, this error meant that you did not test to use pg_dump for instances prior 
PG16.
Please verify whether we can dump subscriptions and restore them accordingly.

[1]: https://commitfest.postgresql.org/48/5068/
[2]: 
https://www.postgresql.org/message-id/OSBPR01MB25529997E012DEABA8E15A02F5E52%40OSBPR01MB2552.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Pgoutput not capturing the generated columns

2024-06-22 Thread Shubham Khanna
On Fri, Jun 21, 2024 at 8:23 AM Peter Smith  wrote:
>
> Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> 1.
> Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
> don't think so.
>
> ~~~
>
> 2.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
> for readability and to avoid any surprises.

Both the comments are handled.

Patch v9-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2B6kwOGmn5MsRaTmciJDxtvNsyszMoPXV62OGPGzkxrCg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-22 Thread Shubham Khanna
On Thu, Jun 20, 2024 at 9:03 AM Peter Smith  wrote:
>
> Hi, here are my review comments for v8-0001.
>
> ==
> Commit message.
>
> 1.
> It seems like the patch name was accidentally omitted, so it became a
> mess when it defaulted to the 1st paragraph of the commit message.
>
> ==
> contrib/test_decoding/test_decoding.c
>
> 2.
> + data->include_generated_columns = true;
>
> I previously posted a comment [1, #4] that this should default to
> false; IMO it is unintuitive for the test_decoding to have an
> *opposite* default behaviour compared to CREATE SUBSCRIPTION.
>
> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - remove the inconsistent blank line in SGML
>
> ==
> src/backend/commands/subscriptioncmds.c
>
> 3.
> +#define SUBOPT_include_generated_columns 0x0001
>
> I previously posted a comment [1, #6] that this should be UPPERCASE,
> but it is not yet fixed.
>
> ==
> src/bin/psql/describe.c
>
> NITPICK - move and reword the bogus comment
>
> ~
>
> 4.
> + if (pset.sversion >= 17)
> + appendPQExpBuffer(,
> + ", subincludegencols AS \"%s\"\n",
> + gettext_noop("include_generated_columns"));
>
> 4a.
> For consistency with every other parameter, that column title should
> be written in words "Include generated columns" (not
> "include_generated_columns").
>
> ~
>
> 4b.
> IMO this new column belongs with the other subscription parameter
> columns (e.g. put it ahead of the "Conninfo" column).
>
> ==
> src/test/subscription/t/011_generated.pl
>
> NITPICK - fixed a comment
>
> 5.
> IMO, it would be better for readability if all the matching CREATE
> TABLE for publisher and subscriber are kept together, instead of the
> current code which is creating all publisher tables and then creating
> all subscriber tables.
>
> ~~~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> ...
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> 6a.
> These SELECT all need ORDER BY to protect against the SELECT *
> returning rows in some unexpected order.
>
> ~
>
> 6b.
> IMO there should be more comments here to explain how you can tell the
> column was NOT replicated. E.g. it is because the result value of 'b'
> is the subscriber-side computed value (which you made deliberately
> different to the publisher-side computed value).
>
> ==
>
> 99.
> Please also refer to the attached nitpicks top-up patch for minor
> cosmetic stuff.

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.


v9-0001-Currently-generated-column-values-are-not-replica.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-21 Thread Peter Smith
Hi Shubham/Shlok.

FYI, there is some other documentation page that mentions generated
column replication messages.

This page [1] says:
"Next, the following message part appears for each column included in
the publication (except generated columns):"

But, with the introduction of your new feature, I think that the
"except generated columns" wording is not strictly correct anymore.
IOW that docs page needs updating but AFAICT your patches have not
addressed this yet.

==
[1] https://www.postgresql.org/docs/17/protocol-logicalrep-message-formats.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-06-21 Thread Peter Smith
Hi, Here are some review comments for patch v9-0003

==
Commit Message

/fix/fixes/

==
1.
General. Is tablesync enough?

I don't understand why is the patch only concerned about tablesync?
Does it make sense to prevent VIRTUAL column replication during
tablesync, if you aren't also going to prevent VIRTUAL columns from
normal logical replication (e.g. when copy_data = false)? Or is this
already handled somewhere?

~~~

2.
General. Missing test.

Add some test cases to verify behaviour is different for STORED versus
VIRTUAL generated columns

==
src/sgml/ref/create_subscription.sgml

NITPICK - consider rearranging as shown in my nitpicks diff
NITPICK - use  sgml markup for STORED

==
src/backend/replication/logical/tablesync.c

3.
- if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
- walrcv_server_version(LogRepWorkerWalRcvConn) <= 16) ||
- !MySubscription->includegencols)
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) < 17)
+ {
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12)
  appendStringInfo(, " AND a.attgenerated = ''");
+ }
+ else if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 17)
+ {
+ if(MySubscription->includegencols)
+ appendStringInfo(, " AND a.attgenerated != 'v'");
+ else
+ appendStringInfo(, " AND a.attgenerated = ''");
+ }

IMO this logic is too tricky to remain uncommented -- please add some comments.
Also, it seems somewhat complex. I think you can achieve the same more simply:

SUGGESTION

if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12)
{
  bool gencols_allowed = walrcv_server_version(LogRepWorkerWalRcvConn) >= 17
&& MySubscription->includegencols;
  if (gencols_allowed)
  {
/* Replication of generated cols is supported, but not VIRTUAL cols. */
appendStringInfo(, " AND a.attgenerated != 'v'");
  }
  else
  {
/* Replication of generated cols is not supported. */
appendStringInfo(, " AND a.attgenerated = ''");
  }
}

==

99.
Please refer also to my attached nitpick diffs and apply those if you agree.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/create_subscription.sgml 
b/doc/src/sgml/ref/create_subscription.sgml
index 5666931..4ce89e9 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -433,16 +433,15 @@ CREATE SUBSCRIPTION subscription_nameinclude_generated_columns 
(boolean)
 
  
-  Specifies whether the generated columns present in the tables
-  associated with the subscription should be replicated.
+  Specifies whether the STORED generated columns 
present in
+  the tables associated with the subscription should be replicated.
   The default is false.
  
 
  
   If the subscriber-side column is also a generated column then this 
option
   has no effect; the subscriber column will be filled as normal with 
the
-  subscriber-side computed or default data. This option allows 
replication
-  of only STORED GENERATED COLUMNS.
+  subscriber-side computed or default data.
  
 



Re: Pgoutput not capturing the generated columns

2024-06-20 Thread Peter Smith
Hi Shubham/Shlok.

FYI,  my patch describing the current PG17 behaviour of logical
replication of generated columns was recently pushed [1].

Note that this will have some impact on your patch set. e.g. You are
changing the current replication behaviour, so the "Generated Columns"
section note will now need to be modified by your patches.

==
[1] 
https://github.com/postgres/postgres/commit/7a089f6e6a14ca3a5dc8822c393c6620279968b9
[2]

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-06-20 Thread Peter Smith
Hi, here are some review comments for patch v9-0002.

==
src/backend/replication/logical/relation.c

1. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencols && attr->attgenerated))

You replied to my question from the previous review [1, #2] as follows:
In case 'include_generated_columns' is 'true'. column list in
remoterel will have an entry for generated columns. So, in this case
if we skip every attr->attgenerated, we will get a missing column
error.

~

TBH, the reason seems very subtle to me. Perhaps that
"(!MySubscription->includegencols && attr->attgenerated))" condition
should be coded as a separate "if", so then you can include a comment
similar to your answer, to explain it.

==
src/backend/replication/logical/tablesync.c

make_copy_attnamelist:

NITPICK - punctuation in function comment
NITPICK - add/reword some more comments
NITPICK - rearrange comments to be closer to the code they are commenting

~

2. make_copy_attnamelist.

+ /*
+ * Construct column list for COPY.
+ */
+ for (int i = 0; i < rel->remoterel.natts; i++)
+ {
+ if(!gencollist[i])
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

IIUC isn't this assuming that the attribute number (aka column order)
is the same on the subscriber side (e.g. gencollist idx) and on the
publisher side (e.g. remoterel.attnames[i]).  AFAIK logical
replication does not require this ordering must be match like that,
therefore I am suspicious this new logic is accidentally imposing new
unwanted assumptions/restrictions. I had asked the same question
before [1-#4] about this code, but there was no reply.

Ideally, there would be more test cases for when the columns
(including the generated ones) are all in different orders on the
pub/sub tables.

~~~

3. General - varnames.

It would help with understanding if the 'attgenlist' variables in all
these functions are re-named to make it very clear that this is
referring to the *remote* (publisher-side) table genlist, not the
subscriber table genlist.

~~~

4.
+ int i = 0;
+
  appendStringInfoString(, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach_ptr(ListCell, l, attnamelist)
  {
- appendStringInfoString(, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(, quote_identifier(strVal(l)));
+ if (i < attnamelist->length - 1)
  appendStringInfoString(, ", ");
+ i++;
  }

4a.
I think the purpose of the new macros is to avoid using ListCell, and
also 'l' is an unhelpful variable name. Shouldn't this code be more
like:
foreach_node(String, att_name, attnamelist)

~

4b.
The code can be far simpler if you just put the comma (", ") always
up-front except the *first* iteration, so you can avoid checking the
list length every time. For example:

if (i++)
  appendStringInfoString(, ", ");

==
src/test/subscription/t/011_generated.pl

5. General.

Hmm. This patch 0002 included many formatting changes to tables tab2
and tab3 and subscriptions sub2 and sub3 but they do not belong here.
The proper formatting for those needs to be done back in patch 0001
where they were introduced. Patch 0002 should just concentrate only on
the new stuff for patch 0002.

~

6. CREATE TABLES would be better in pairs

IMO it will be better if the matching CREATE TABLE for pub and sub are
kept together, instead of separating them by doing all pub then all
sub. I previously made the same comment for patch 0001, so maybe it
will be addressed next time...

~

7. SELECT *

+$result =
+  $node_subscriber->safe_psql('postgres', "SELECT * FROM tab4 ORDER BY a");

It will be prudent to do explicit "SELECT a,b,c" instead of "SELECT
*", for readability and so there are no surprises.

==

99.
Please also refer to my attached nitpicks diff for numerous cosmetic
changes, and apply if you agree.

==
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtAsEc3PEB1KUk1kFF5tcCrDCCTcbboougO29vP1B4E2Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/replication/logical/tablesync.c 
b/src/backend/replication/logical/tablesync.c
index 0fbf661..ddeb6a8 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -694,7 +694,7 @@ process_syncing_tables(XLogRecPtr current_lsn)
 
 /*
  * Create list of columns for COPY based on logical relation mapping. Do not
- * include generated columns, of the subscription table, in the column list.
+ * include generated columns of the subscription table in the column list.
  */
 static List *
 make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool *attgenlist)
@@ -706,6 +706,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel, bool 
*attgenlist)
desc = RelationGetDescr(rel->localrel);
gencollist = palloc0(MaxTupleAttributeNumber * sizeof(bool));
 
+   /* Loop to handle subscription table generated columns. */
for (int i = 0; i < desc->natts; i++)
 

Re: Pgoutput not capturing the generated columns

2024-06-20 Thread Peter Smith
Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.

==
src/test/subscription/t/011_generated.pl

1.
Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
don't think so.

~~~

2.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
for readability and to avoid any surprises.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Pgoutput not capturing the generated columns

2024-06-20 Thread Shlok Kyal
On Thu, 20 Jun 2024 at 12:52, vignesh C  wrote:
>
> On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut  wrote:
> >
> > On 19.06.24 13:22, Shubham Khanna wrote:
> > > All the comments are handled.
> > >
> > > The attached Patch contains all the suggested changes.
> >
> > Please also take a look at the proposed patch for virtual generated
> > columns [0] and consider how that would affect your patch.  I think your
> > feature can only replicate *stored* generated columns.  So perhaps the
> > documentation and terminology in your patch should reflect that.
>
> This patch is unable to manage virtual generated columns because it
> stores NULL values for them. Along with documentation the initial sync
> command being generated also should be changed to sync data
> exclusively for stored generated columns, omitting virtual ones. I
> suggest treating these changes as a separate patch(0003) for future
> merging or a separate commit, depending on the order of patch
> acceptance.
>

I have addressed the issue and updated the documentation accordingly.
And created a new 0003 patch.
Please refer to v9-0003 patch for the same in [1].

[1]: 
https://www.postgresql.org/message-id/CANhcyEXmjLEPNgOSAtjS4YGb9JvS8w-SO9S%2BjRzzzXo2RavNWw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-20 Thread vignesh C
On Wed, 19 Jun 2024 at 21:43, Peter Eisentraut  wrote:
>
> On 19.06.24 13:22, Shubham Khanna wrote:
> > All the comments are handled.
> >
> > The attached Patch contains all the suggested changes.
>
> Please also take a look at the proposed patch for virtual generated
> columns [0] and consider how that would affect your patch.  I think your
> feature can only replicate *stored* generated columns.  So perhaps the
> documentation and terminology in your patch should reflect that.

This patch is unable to manage virtual generated columns because it
stores NULL values for them. Along with documentation the initial sync
command being generated also should be changed to sync data
exclusively for stored generated columns, omitting virtual ones. I
suggest treating these changes as a separate patch(0003) for future
merging or a separate commit, depending on the order of patch
acceptance.

Regards,
Vignesh




Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Peter Smith
Hi, here are my review comments for v8-0001.

==
Commit message.

1.
It seems like the patch name was accidentally omitted, so it became a
mess when it defaulted to the 1st paragraph of the commit message.

==
contrib/test_decoding/test_decoding.c

2.
+ data->include_generated_columns = true;

I previously posted a comment [1, #4] that this should default to
false; IMO it is unintuitive for the test_decoding to have an
*opposite* default behaviour compared to CREATE SUBSCRIPTION.

==
doc/src/sgml/ref/create_subscription.sgml

NITPICK - remove the inconsistent blank line in SGML

==
src/backend/commands/subscriptioncmds.c

3.
+#define SUBOPT_include_generated_columns 0x0001

I previously posted a comment [1, #6] that this should be UPPERCASE,
but it is not yet fixed.

==
src/bin/psql/describe.c

NITPICK - move and reword the bogus comment

~

4.
+ if (pset.sversion >= 17)
+ appendPQExpBuffer(,
+ ", subincludegencols AS \"%s\"\n",
+ gettext_noop("include_generated_columns"));

4a.
For consistency with every other parameter, that column title should
be written in words "Include generated columns" (not
"include_generated_columns").

~

4b.
IMO this new column belongs with the other subscription parameter
columns (e.g. put it ahead of the "Conninfo" column).

==
src/test/subscription/t/011_generated.pl

NITPICK - fixed a comment

5.
IMO, it would be better for readability if all the matching CREATE
TABLE for publisher and subscriber are kept together, instead of the
current code which is creating all publisher tables and then creating
all subscriber tables.

~~~

6.
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'confirm generated columns ARE replicated when the
subscriber-side column is not generated');
+
...
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'confirm generated columns are NOT replicated when the
subscriber-side column is also generated');
+

6a.
These SELECT all need ORDER BY to protect against the SELECT *
returning rows in some unexpected order.

~

6b.
IMO there should be more comments here to explain how you can tell the
column was NOT replicated. E.g. it is because the result value of 'b'
is the subscriber-side computed value (which you made deliberately
different to the publisher-side computed value).

==

99.
Please also refer to the attached nitpicks top-up patch for minor
cosmetic stuff.

==
[1] 
https://www.postgresql.org/message-id/CAHv8RjLeZtTeXpFdoY6xCPO41HtuOPMSSZgshVdb%2BV%3Dp2YHL8Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/doc/src/sgml/ref/create_subscription.sgml 
b/doc/src/sgml/ref/create_subscription.sgml
index e8779dc..ee27a58 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -437,7 +437,6 @@ CREATE SUBSCRIPTION subscription_namefalse.
  
-
  
   If the subscriber-side column is also a generated column then this 
option
   has no effect; the subscriber column will be filled as normal with 
the
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 75a52ce..663015d 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6609,12 +6609,12 @@ describeSubscriptions(const char *pattern, bool verbose)
appendPQExpBuffer(,
  ", subskiplsn AS 
\"%s\"\n",
  gettext_noop("Skip 
LSN"));
+
+   /* include_generated_columns is only supported in v18 and 
higher */
if (pset.sversion >= 17)
appendPQExpBuffer(,
", 
subincludegencols AS \"%s\"\n",

gettext_noop("include_generated_columns"));
-
- // 
include_generated_columns
}
 
/* Only display subscriptions in current database. */
diff --git a/src/test/subscription/t/011_generated.pl 
b/src/test/subscription/t/011_generated.pl
index 92b3dbf..cbd5015 100644
--- a/src/test/subscription/t/011_generated.pl
+++ b/src/test/subscription/t/011_generated.pl
@@ -24,7 +24,7 @@ $node_publisher->safe_psql('postgres',
"CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
2) STORED)"
 );
 
-# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has 
NON-gnerated col 'b'.
+# publisher-side tab2 has generated col 'b' but subscriber-side tab2 has 
NON-generated col 'b'.
 $node_publisher->safe_psql('postgres',
"CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
2) STORED)"
 );


Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Peter Eisentraut

On 19.06.24 13:22, Shubham Khanna wrote:

All the comments are handled.

The attached Patch contains all the suggested changes.


Please also take a look at the proposed patch for virtual generated 
columns [0] and consider how that would affect your patch.  I think your 
feature can only replicate *stored* generated columns.  So perhaps the 
documentation and terminology in your patch should reflect that.



[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org






Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
On Mon, Jun 17, 2024 at 1:57 PM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v7-0001.
>
> ==
> 1. GENERAL - \dRs+
>
> Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
> (e.g. \dRs+ mysub) the same as the other boolean parameters?
>
> ==
> Commit message
>
> 2.
> When 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list
>
> ~
>
> Maybe you don't need to mention "PUBLICATION col-list" twice.
>
> SUGGESTION
> When 'include_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> ~~~
>
> 2.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=
> 'publication pub1;
>
> ~
>
> 2a.
> (I've questioned this one in previous reviews)
>
> What exactly is the purpose of this statement in the commit message?
> Was this supposed to demonstrate the usage of the
> 'include_generated_columns' parameter?
>
> ~
>
> 2b.
> /publication/ PUBLICATION/
>
>
> ~~~
>
> 3.
> If the subscriber-side column is also a generated column then
> thisoption has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.
>
> ~
>
> Missing space: /thisoption/this option/
>
> ==
> .../expected/decoding_into_rel.out
>
> 4.
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> Why is the default value here equivalent to
> 'include-generated-columns' = '1' here instead of '0'? The default for
> the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
> false, and IMO it seems confusing for these 2 defaults to be
> different. Here I think it should default to '0' *regardless* of what
> the previous functionality might have done -- e.g. this is a "test
> decoder" so the parameter should behave sensibly.
>
> ==
> .../test_decoding/sql/decoding_into_rel.sql
>
> NITPICK - wrong comments.
>
> ==
> doc/src/sgml/protocol.sgml
>
> 5.
> +
> + include_generated_columns
> +  
> +   
> +Boolean option to enable generated columns. This option controls
> +whether generated columns should be included in the string
> +representation of tuples during logical decoding in PostgreSQL.
> +The default is false.
> +   
> +  
> +
> +
>
> Does the protocol version need to be bumped to support this new option
> and should that be mentioned on this page similar to how all other
> version values are mentioned?

I already did the Backward Compatibility test earlier and decided that
protocol bump is not needed.

> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - some missing words/sentence.
> NITPICK - some missing  tags.
> NITPICK - remove duplicated sentence.
> NITPICK - add another .
>
> ==
> src/backend/commands/subscriptioncmds.c
>
> 6.
>  #define SUBOPT_ORIGIN 0x8000
> +#define SUBOPT_include_generated_columns 0x0001
>
> Please use UPPERCASE for consistency with other macros.
>
> ==
> .../libpqwalreceiver/libpqwalreceiver.c
>
> 7.
> + if (options->proto.logical.include_generated_columns &&
> + PQserverVersion(conn->streamConn) >= 17)
> + appendStringInfoString(, ", include_generated_columns 'on'");
> +
>
> IMO it makes more sense to say 'true' here instead of 'on'. It seems
> like this was just cut/paste from the above code (where 'on' was
> sensible).
>
> ==
> src/include/catalog/pg_subscription.h
>
> 8.
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegencol; /* True if generated columns must be published */
> +
>
> Not fixed as claimed. This field name ought to be plural.
>
> /subincludegencol/subincludegencols/
>
> ~~~
>
> 9.
>   char*origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegencol; /* publish generated column data */
>  } Subscription;
>
> Not fixed as claimed. This field name ought to be plural.
>
> /includegencol/includegencols/
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 10.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 10) STORED)"
> +);
> +
> 

Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
> Few comments:
> 1) Here tab1 and tab2 are exactly the same tables, just check if the
> table tab1 itself can be used for your tests.
> @@ -24,20 +24,50 @@ $node_publisher->safe_psql('postgres',
> "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
>  );
> +$node_publisher->safe_psql('postgres',
> +   "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED)"
> +);

On the subscription side the tables have different descriptions, so we
need to have different tables on the publisher side.

> 2) We can document  that the include_generate_columns option cannot be 
> altered.
>
> 3) You can mention that include-generated-columns is true by default
> and generated column data will be selected
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
>
> 4)  The comment seems to be wrong here, the comment says b will not be
> replicated but b is being selected:
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> data
> -
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
>  table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
>  table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
>  COMMIT
> (5 rows)
>
> 5)  Similarly here too the comment seems to be wrong, the comment says
> b will not replicated but b is not being selected:
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>   data
> 
>  BEGIN
>  table public.gencoltable: INSERT: a[integer]:4
>  table public.gencoltable: INSERT: a[integer]:5
>  table public.gencoltable: INSERT: a[integer]:6
>  COMMIT
> (5 rows)
>
> 6) SUBOPT_include_generated_columns change it to SUBOPT_GENERATED to
> keep the name consistent:
> --- a/src/backend/commands/subscriptioncmds.c
> +++ b/src/backend/commands/subscriptioncmds.c
> @@ -72,6 +72,7 @@
>  #define SUBOPT_FAILOVER0x2000
>  #define SUBOPT_LSN 0x4000
>  #define SUBOPT_ORIGIN  0x8000
> +#define SUBOPT_include_generated_columns   0x0001
>
> 7) The comment style seems to be inconsistent, both of them can start
> in lower case
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +-- When 'include-generated-columns' is not set
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> +
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
>
> 8) This could be changed to remove the insert statements by using
> pg_logical_slot_peek_changes:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 

Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
> Hi Shubham, thanks for providing a patch.
> I have some comments for v6-0001.
>
> 1. create_subscription.sgml
> There is repetition of the same line.
>
> + 
> +  Specifies whether the generated columns present in the tables
> +  associated with the subscription should be replicated. If the
> +  subscriber-side column is also a generated column then this option
> +  has no effect; the replicated data will be ignored and the 
> subscriber
> +  column will be filled as normal with the subscriber-side computed 
> or
> +  default data.
> +  false.
> + 
> +
> + 
> +  This parameter can only be set true if
> copy_data is
> +  set to false. If the subscriber-side
> column is also a
> +  generated column then this option has no effect; the
> replicated data will
> +  be ignored and the subscriber column will be filled as
> normal with the
> +  subscriber-side computed or default data.
> + 
>
> ==
> 2. subscriptioncmds.c
>
> 2a. The macro name should be in uppercase. We can use a short name
> like 'SUBOPT_INCLUDE_GEN_COL'. Thought?
> +#define SUBOPT_include_generated_columns 0x0001
>
> 2b.Update macro name accordingly
> + if (IsSet(supported_opts, SUBOPT_include_generated_columns))
> + opts->include_generated_columns = false;
>
> 2c. Update macro name accordingly
> + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
> + strcmp(defel->defname, "include_generated_columns") == 0)
> + {
> + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
> + errorConflictingDefElem(defel, pstate);
> +
> + opts->specified_opts |= SUBOPT_include_generated_columns;
> + opts->include_generated_columns = defGetBoolean(defel);
> + }
>
> 2d. Update macro name accordingly
> +   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
> +   SUBOPT_include_generated_columns);
>
>
> ==
>
> 3. decoding_into_rel.out
>
> 3a. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will be replicated"
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
>
> 3b. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will not be replicated"
> +-- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +  data
> +
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:4
> + table public.gencoltable: INSERT: a[integer]:5
> + table public.gencoltable: INSERT: a[integer]:6
> + COMMIT
> +(5 rows)
>
> =
>
> 4. Here names for both the tests are the same. I think we should use
> different names.
>
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.


v8-0001-Currently-generated-column-values-are-not-replica.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-17 Thread Peter Smith
Hi, here are my review comments for patch v7-0002

==
Commit Message

NITPICKS
- rearrange paragraphs
- typo "donot"
- don't start a sentence with "And"
- etc.

Please see the attachment for my suggested commit message text updates
and take from it whatever you agree with.

==
doc/src/sgml/ref/create_subscription.sgml

1.
+  If the subscriber-side column is also a generated column
then this option
+  has no effect; the replicated data will be ignored and the subscriber
+  column will be filled as normal with the subscriber-side computed or
+  default data. And during table synchronization, the data
corresponding to
+  the generated column on subscriber-side will not be sent from the
+  publisher to the subscriber.

This text already mentions subscriber-side generated cols. IMO you
don't need to say anything at all about table synchronization --
that's just an internal code optimization, which is not something the
user needs to know about. IOW, the entire last sentence ("And
during...") should be removed.

==
src/backend/replication/logical/relation.c

2. logicalrep_rel_open

- if (attr->attisdropped)
+ if (attr->attisdropped ||
+ (!MySubscription->includegencol && attr->attgenerated))
  {
  entry->attrmap->attnums[i] = -1;
  continue;

~

Maybe I'm mistaken, but isn't this code for skipping checking for
"missing" subscriber-side (aka local) columns? Can't it just
unconditionally skip every attr->attgenerated -- i.e. why does it
matter if the MySubscription->includegencol was set or not?

==
src/backend/replication/logical/tablesync.c

3. make_copy_attnamelist

- for (i = 0; i < rel->remoterel.natts; i++)
+ desc = RelationGetDescr(rel->localrel);
+
+ for (i = 0; i < desc->natts; i++)
  {
- attnamelist = lappend(attnamelist,
-   makeString(rel->remoterel.attnames[i]));
+ int attnum;
+ Form_pg_attribute attr = TupleDescAttr(desc, i);
+
+ if (!attr->attgenerated)
+ continue;
+
+ attnum = logicalrep_rel_att_by_name(>remoterel,
+ NameStr(attr->attname));
+
+ /*
+ * Check if subscription table have a generated column with same
+ * column name as a non-generated column in the corresponding
+ * publication table.
+ */
+ if (attnum >=0 && !attgenlist[attnum])
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("logical replication target relation \"%s.%s\" is missing
replicated column: \"%s\"",
+ rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname;
+
+ if (attnum >= 0)
+ gencollist = lappend_int(gencollist, attnum);
  }

~

NITPICK - Use C99-style for loop variables
NITPICK - Typo in comment
NITPICK - spaces

~

3a.
I think above code should be refactored so there is only one check for
"if (attnum >= 0)" -- e.g. other condition should be nested.

~

3b.
That ERROR message says "missing replicated column", but that doesn't
seem much like what the code-comment was saying this code is about.

~~~

4.
+ for (i = 0; i < rel->remoterel.natts; i++)
+ {
+
+ if (gencollist != NIL && j < gencollist->length &&
+ list_nth_int(gencollist, j) == i)
+ j++;
+ else
+ attnamelist = lappend(attnamelist,
+   makeString(rel->remoterel.attnames[i]));
+ }

NITPICK - Use C99-style for loop variables
NITPICK - Unnecessary blank lines

~

IIUC the subscriber-side table and the publisher-side table do NOT
have to have all the columns in identical order for the logical
replication to work correcly. AFAIK it works fine so long as the
column names match for the replicated columns. Therefore, I am
suspicious that this new patch code seems to be imposing some new
ordering assumptions/restrictions (e.g. list_nth_int stuff) which are
not current requirements.

~~~

copy_table:

NITPICK - comment typo
NITPICK - comment wording

~

5.
+ int i = 0;
+ ListCell *l;
+
  appendStringInfoString(, "COPY (SELECT ");
- for (int i = 0; i < lrel.natts; i++)
+ foreach(l, attnamelist)
  {
- appendStringInfoString(, quote_identifier(lrel.attnames[i]));
- if (i < lrel.natts - 1)
+ appendStringInfoString(, quote_identifier(strVal(lfirst(l;
+ if (i < attnamelist->length - 1)
  appendStringInfoString(, ", ");
+ i++;
  }
IIUC for new code like this, it is preferred to use the foreach*
macros instead of ListCell.

==
src/test/regress/sql/subscription.sql

6.
--- fail - copy_data and include_generated_columns are mutually
exclusive options
-CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=regress_doesnotexist'
PUBLICATION testpub WITH (include_generated_columns = true);
-ERROR:  copy_data = true and include_generated_columns = true are
mutually exclusive options

It is OK to delete this test now but IMO still needs to be some
"include_generated_columns must be boolean" test cases (e.g. same as
there was two_phase). Actually, this should probably be done by the
0001 patch.

==
src/test/subscription/t/011_generated.pl

7.
All the PRIMARY KEY stuff may be overkill. Are primary keys really
needed for these tests?

~~~

8.

Re: Pgoutput not capturing the generated columns

2024-06-17 Thread Peter Smith
Hi, here are my review comments for patch v7-0001.

==
1. GENERAL - \dRs+

Shouldn't the new SUBSCRIPTION parameter be exposed via "describe"
(e.g. \dRs+ mysub) the same as the other boolean parameters?

==
Commit message

2.
When 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list

~

Maybe you don't need to mention "PUBLICATION col-list" twice.

SUGGESTION
When 'include_generated_columns' is false, generated columns are not
replicated, even when present in a PUBLICATION col-list.

~~~

2.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=
'publication pub1;

~

2a.
(I've questioned this one in previous reviews)

What exactly is the purpose of this statement in the commit message?
Was this supposed to demonstrate the usage of the
'include_generated_columns' parameter?

~

2b.
/publication/ PUBLICATION/


~~~

3.
If the subscriber-side column is also a generated column then
thisoption has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

~

Missing space: /thisoption/this option/

==
.../expected/decoding_into_rel.out

4.
+-- When 'include-generated-columns' is not set
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+-
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)

Why is the default value here equivalent to
'include-generated-columns' = '1' here instead of '0'? The default for
the CREATE SUBSCRIPTION parameter 'include_generated_columns' is
false, and IMO it seems confusing for these 2 defaults to be
different. Here I think it should default to '0' *regardless* of what
the previous functionality might have done -- e.g. this is a "test
decoder" so the parameter should behave sensibly.

==
.../test_decoding/sql/decoding_into_rel.sql

NITPICK - wrong comments.

==
doc/src/sgml/protocol.sgml

5.
+
+ include_generated_columns
+  
+   
+Boolean option to enable generated columns. This option controls
+whether generated columns should be included in the string
+representation of tuples during logical decoding in PostgreSQL.
+The default is false.
+   
+  
+
+

Does the protocol version need to be bumped to support this new option
and should that be mentioned on this page similar to how all other
version values are mentioned?

==
doc/src/sgml/ref/create_subscription.sgml

NITPICK - some missing words/sentence.
NITPICK - some missing  tags.
NITPICK - remove duplicated sentence.
NITPICK - add another .

==
src/backend/commands/subscriptioncmds.c

6.
 #define SUBOPT_ORIGIN 0x8000
+#define SUBOPT_include_generated_columns 0x0001

Please use UPPERCASE for consistency with other macros.

==
.../libpqwalreceiver/libpqwalreceiver.c

7.
+ if (options->proto.logical.include_generated_columns &&
+ PQserverVersion(conn->streamConn) >= 17)
+ appendStringInfoString(, ", include_generated_columns 'on'");
+

IMO it makes more sense to say 'true' here instead of 'on'. It seems
like this was just cut/paste from the above code (where 'on' was
sensible).

==
src/include/catalog/pg_subscription.h

8.
@@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW
  * slots) in the upstream database are enabled
  * to be synchronized to the standbys. */

+ bool subincludegencol; /* True if generated columns must be published */
+

Not fixed as claimed. This field name ought to be plural.

/subincludegencol/subincludegencols/

~~~

9.
  char*origin; /* Only publish data originating from the
  * specified origin */
+ bool includegencol; /* publish generated column data */
 } Subscription;

Not fixed as claimed. This field name ought to be plural.

/includegencol/includegencols/

==
src/test/subscription/t/031_column_list.pl

10.
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) STORED)"
+);
+
+$node_publisher->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 10) STORED)"
+);
+
 $node_subscriber->safe_psql('postgres',
  "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 22) STORED, c int)"
 );

+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
+);
+
+$node_subscriber->safe_psql('postgres',
+ "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
+ 20) STORED)"
+);

IMO the test needs lots more comments to describe what it is doing:

For example, the setup 

Re: Pgoutput not capturing the generated columns

2024-06-17 Thread vignesh C
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
 wrote:
>
> > Thanks for the updated patch, few comments:
> > 1) The option name seems wrong here:
> > In one place include_generated_column is specified and other place
> > include_generated_columns is specified:
> >
> > +   else if (IsSet(supported_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> > +strcmp(defel->defname,
> > "include_generated_column") == 0)
> > +   {
> > +   if (IsSet(opts->specified_opts,
> > SUBOPT_INCLUDE_GENERATED_COLUMN))
> > +   errorConflictingDefElem(defel, pstate);
> > +
> > +   opts->specified_opts |= 
> > SUBOPT_INCLUDE_GENERATED_COLUMN;
> > +   opts->include_generated_column = 
> > defGetBoolean(defel);
> > +   }
>
> Fixed.
>
> > diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> > index d453e224d9..e8ff752fd9 100644
> > --- a/src/bin/psql/tab-complete.c
> > +++ b/src/bin/psql/tab-complete.c
> > @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
> > COMPLETE_WITH("binary", "connect", "copy_data", 
> > "create_slot",
> >   "disable_on_error",
> > "enabled", "failover", "origin",
> >   "password_required",
> > "run_as_owner", "slot_name",
> > - "streaming",
> > "synchronous_commit", "two_phase");
> > + "streaming",
> > "synchronous_commit", "two_phase","include_generated_columns");
> >
> > 2) This small data table need not have a primary key column as it will
> > create an index and insertion will happen in the index too.
> > +-- check include-generated-columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include-generated-columns', '1');
>
> Fixed.
>
> > 3) Please add a test case for this:
> > +  set to false. If the subscriber-side
> > column is also a
> > +  generated column then this option has no effect; the
> > replicated data will
> > +  be ignored and the subscriber column will be filled as
> > normal with the
> > +  subscriber-side computed or default data.
>
> Added the required test case.
>
> > 4) You can use a new style of ereport to remove the brackets around errcode
> > 4.a)
> > +   else if (!parse_bool(strVal(elem->arg),
> > >include_generated_columns))
> > +   ereport(ERROR,
> > +
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > +errmsg("could not
> > parse value \"%s\" for parameter \"%s\"",
> > +
> > strVal(elem->arg), elem->defname)));
> >
> > 4.b) similarly here too:
> > +   ereport(ERROR,
> > +   (errcode(ERRCODE_SYNTAX_ERROR),
> > +   /*- translator: both %s are strings of the form
> > "option = value" */
> > +   errmsg("%s and %s are mutually
> > exclusive options",
> > +   "copy_data = true",
> > "include_generated_column = true")));
> >
> > 4.c) similarly here too:
> > +   if (include_generated_columns_option_given)
> > +   ereport(ERROR,
> > +   
> > (errcode(ERRCODE_SYNTAX_ERROR),
> > +errmsg("conflicting
> > or redundant options")));
>
> Fixed.
>
> > 5) These variable names can be changed to keep it smaller, something
> > like gencol or generatedcol or gencolumn, etc
> > +++ b/src/include/catalog/pg_subscription.h
> > @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> > BKI_SHARED_RELATION BKI_ROW
> >   * slots) in the upstream database are enabled
> >   * to be synchronized to the standbys. */
> >
> > + bool subincludegeneratedcolumn; /* True if generated columns must be
> > published */
> > +
> >  #ifdef CATALOG_VARLEN /* variable-length fields start here */
> >   /* Connection string to the publisher */
> >   text subconninfo BKI_FORCE_NOT_NULL;
> > @@ -157,6 +159,7 @@ typedef struct Subscription
> >   List*publications; /* List of publication names to subscribe to */
> >   char*origin; /* Only publish data originating from the
> >   * specified origin */
> > + bool includegeneratedcolumn; /* publish generated column data */
> >  } Subscription;
>
> Fixed.
>
> The attached Patch contains the suggested changes.

Few comments:
1) Here tab1 and tab2 are exactly the same tables, just check if the
table tab1 itself can be used for your tests.

Re: Pgoutput not capturing the generated columns

2024-06-17 Thread Shlok Kyal
On Fri, 14 Jun 2024 at 15:52, Shubham Khanna
 wrote:
>
> The attached Patch contains the suggested changes.
>

Hi Shubham, thanks for providing a patch.
I have some comments for v6-0001.

1. create_subscription.sgml
There is repetition of the same line.

+ 
+  Specifies whether the generated columns present in the tables
+  associated with the subscription should be replicated. If the
+  subscriber-side column is also a generated column then this option
+  has no effect; the replicated data will be ignored and the subscriber
+  column will be filled as normal with the subscriber-side computed or
+  default data.
+  false.
+ 
+
+ 
+  This parameter can only be set true if
copy_data is
+  set to false. If the subscriber-side
column is also a
+  generated column then this option has no effect; the
replicated data will
+  be ignored and the subscriber column will be filled as
normal with the
+  subscriber-side computed or default data.
+ 

==
2. subscriptioncmds.c

2a. The macro name should be in uppercase. We can use a short name
like 'SUBOPT_INCLUDE_GEN_COL'. Thought?
+#define SUBOPT_include_generated_columns 0x0001

2b.Update macro name accordingly
+ if (IsSet(supported_opts, SUBOPT_include_generated_columns))
+ opts->include_generated_columns = false;

2c. Update macro name accordingly
+ else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
+ strcmp(defel->defname, "include_generated_columns") == 0)
+ {
+ if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
+ errorConflictingDefElem(defel, pstate);
+
+ opts->specified_opts |= SUBOPT_include_generated_columns;
+ opts->include_generated_columns = defGetBoolean(defel);
+ }

2d. Update macro name accordingly
+   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+   SUBOPT_include_generated_columns);


==

3. decoding_into_rel.out

3a. In comment, I think it should be "When 'include-generated-columns'
= '1' the generated column 'b' values will be replicated"
+-- When 'include-generated-columns' = '1' the generated column 'b'
values will not be replicated
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
+data
+-
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT

3b. In comment, I think it should be "When 'include-generated-columns'
= '1' the generated column 'b' values will not be replicated"
+-- When 'include-generated-columns' = '0' the generated column 'b'
values will be replicated
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
+  data
+
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:4
+ table public.gencoltable: INSERT: a[integer]:5
+ table public.gencoltable: INSERT: a[integer]:6
+ COMMIT
+(5 rows)

=

4. Here names for both the tests are the same. I think we should use
different names.

+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
+is( $result, qq(4|8
+5|10), 'generated columns replicated to non-generated column on subscriber');
+
+$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
+
+$node_publisher->wait_for_catchup('sub3');
+
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
+is( $result, qq(4|24
+5|25), 'generated columns replicated to non-generated column on subscriber');

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-15 Thread Shlok Kyal
On Thu, 6 Jun 2024 at 08:29, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shlok and Shubham,
>
> Thanks for updating the patch!
>
> I briefly checked the v5-0002. IIUC, your patch allows to copy generated
> columns unconditionally. I think the behavior affects many people so that it 
> is
> hard to get agreement.
>
> Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is 
> set
> to off, we can keep the current specification.
>
> Thought?
Hi Kuroda-san,

I agree that we should not allow to copy generated columns unconditionally.
With patch v7-0002, I have used a different approach which does not
require any code changes in COPY.

Please refer [1] for patch v7-0002.
[1]: 
https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-15 Thread Shlok Kyal
On Wed, 5 Jun 2024 at 05:49, Peter Smith  wrote:
>
> On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal  wrote:
> >
> > >
> > > The attached Patch contains the suggested changes.
> > >
> >
> > Hi,
> >
> > Currently, COPY command does not work for generated columns and
> > therefore, COPY of generated column is not supported during tablesync
> > process. So, in patch v4-0001 we added a check to allow replication of
> > the generated column only if 'copy_data = false'.
> >
> > I am attaching patches to resolve the above issues.
> >
> > v5-0001: not changed
> > v5-0002: Support COPY of generated column
> > v5-0003: Support COPY of generated column during tablesync process
> >
>
> Hi Shlok, I have a question about patch v5-0003.
>
> According to the patch 0001 docs "If the subscriber-side column is
> also a generated column then this option has no effect; the replicated
> data will be ignored and the subscriber column will be filled as
> normal with the subscriber-side computed or default data".
>
> Doesn't this mean it will be a waste of effort/resources to COPY any
> column value where the subscriber-side column is generated since we
> know that any copied value will be ignored anyway?
>
> But I don't recall seeing any comment or logic for this kind of copy
> optimisation in the patch 0003. Is this already accounted for
> somewhere and I missed it, or is my understanding wrong?
Your understanding is correct.
With v7-0002, if a subscriber-side column is generated, then we do not
include that column in the column list during COPY. This will address
the above issue.

Patch 7-0002 contains all the changes. Please refer [1]
[1]: 
https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-15 Thread Shlok Kyal
On Tue, 4 Jun 2024 at 15:01, Peter Smith  wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0003.
>
> ==
> 0. Whitespace warnings when the patch was applied.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
> trailing whitespace.
>   has no effect; the replicated data will be ignored and the 
> subscriber
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
> trailing whitespace.
>   column will be filled as normal with the subscriber-side computed or
> ../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
> trailing whitespace.
> (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
> warning: 3 lines add whitespace errors.
>
Fixed

> ==
> src/backend/commands/subscriptioncmds.c
>
> 1.
> - res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
> + column_count = (!include_generated_column && check_gen_col) ? 4 :
> (check_columnlist ? 3 : 2);
> + res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);
>
> The 'column_count' seems out of control. Won't it be far simpler to
> assign/increment the value dynamically only as required instead of the
> tricky calculation at the end which is unnecessarily difficult to
> understand?
>
I have removed this piece of code.

> ~~~
>
> 2.
> + /*
> + * If include_generated_column option is false and all the column of
> the table in the
> + * publication are generated then we should throw an error.
> + */
> + if (!isnull && !include_generated_column && check_gen_col)
> + {
> + attlist = DatumGetArrayTypeP(attlistdatum);
> + gen_col_count = DatumGetInt32(slot_getattr(slot, 4, ));
> + Assert(!isnull);
> +
> + attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
> +
> + if (attcount != 0 && attcount == gen_col_count)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use only generated column for table \"%s.%s\" in
> publication when generated_column option is false",
> +nspname, relname));
> + }
> +
>
> Why do you think this new logic/error is necessary?
>
> IIUC the 'include_generated_columns' should be false to match the
> existing HEAD behavior. So this scenario where your publisher-side
> table *only* has generated columns is something that could already
> happen, right? IOW, this introduced error could be a candidate for
> another discussion/thread/patch, but is it really required for this
> current patch?
>
Yes, this scenario can also happen in HEAD. For this patch I have
removed this check.

> ==
> src/backend/replication/logical/tablesync.c
>
> 3.
>   lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ?
> -   "AND a.attgenerated = ''" : ""),
> + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
> + (walrcv_server_version(LogRepWorkerWalRcvConn) <= 16 ||
> + !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),
>
> This ternary within one big appendStringInfo seems quite complicated.
> Won't it be better to split the appendStringInfo into multiple parts
> so the generated-cols calculation can be done more simply?
>
Fixed

> ==
> src/test/subscription/t/011_generated.pl
>
> 4.
> I think there should be a variety of different tablesync scenarios
> (when 'include_generated_columns' is true) tested here instead of just
> one, and all varieties with lots of comments to say what they are
> doing, expectations etc.
>
> a. publisher-side gen-col "a" replicating to subscriber-side NOT
> gen-col "a" (ok, value gets replicated)
> b. publisher-side gen-col "a" replicating to subscriber-side gen-col
> (ok, but ignored)
> c. publisher-side NOT gen-col "b" replicating to subscriber-side
> gen-col "b" (error?)
>
Added the tests

> ~~
>
> 5.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
> +is( $result, qq(1|2
> +2|4
> +3|6), 'generated columns initial sync with include_generated_column = true');
>
> Should this say "ORDER BY..." so it will not fail if the row order
> happens to be something unanticipated?
>
Fixed

> ==
>
> 99.
> Also, see the attached file with numerous other nitpicks:
> - plural param- and var-names
> - typos in comments
> - missing spaces
> - SQL keyword should be UPPERCASE
> - etc.
>
> Please apply any/all of these if you agree with them.
Fixed

Patch 7-0002 contains all the changes. Please refer [1]
[1]: 
https://www.postgresql.org/message-id/CANhcyEUz0FcyR3T76b%2BNhtmvWO7o96O_oEwsLZNZksEoPmVzXw%40mail.gmail.com

Thanks and Regards,
Shlok Kyal




Re: Pgoutput not capturing the generated columns

2024-06-15 Thread Shlok Kyal
On Tue, 4 Jun 2024 at 10:21, Peter Smith  wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0002.
>
> ==
> GENERAL
>
> G1.
> IIUC now you are unconditionally allowing all generated columns to be copied.
>
> I think this is assuming that the table sync code (in the next patch
> 0003?) is going to explicitly name all the columns it wants to copy
> (so if it wants to get generated cols then it will name the generated
> cols, and if is doesn't want generated cols then it won't name them).
>
> Maybe that is OK for the logical replication tablesync case, but I am
> not sure if it will be desirable to *always* copy generated columns in
> other user scenarios.
>
> e.g. I was wondering if there should be a new COPY command option
> introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
> then the current HEAD behaviour is unaffected unless that option is
> enabled.
>
> ~~~
>
> G2.
> The current COPY command documentation [1] says "If no column list is
> specified, all columns of the table except generated columns will be
> copied."
>
> But this 0002 patch has changed that documented behaviour, and so the
> documentation needs to be changed as well, right?
>
> ==
> Commit Message
>
> 1.
> Currently COPY command do not copy generated column. With this commit
> added support for COPY for generated column.
>
> ~
>
> The grammar/cardinality is not good here. Try some tool (Grammarly or
> chatGPT, etc) to help correct it.
>
> ==
> src/backend/commands/copy.c
>
> ==
> src/test/regress/expected/generated.out
>
> ==
> src/test/regress/sql/generated.sql
>
> 2.
> I think these COPY test cases require some explicit comments to
> describe what they are doing, and what are the expected results.
>
> Currently, I have doubts about some of this test input/output
>
> e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
> explanation. Are you expecting this generated col value to be
> ignored/overwritten or what?
>
> COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
> 5 1
> 6 1
> \.
>
> e.g.2. what is the reason for this new 'missing data for column "b"'
> error? Or is it some introduced quirk because "b" now cannot be
> generated since there is no value for "a"? I don't know if the
> expected *.out here is OK or not, so some test comments may help to
> clarify it.
>
> ==
> [1] https://www.postgresql.org/docs/devel/sql-copy.html
>
Hi Peter,

I have removed the changes in the COPY command. I came up with an
approach which requires changes only in tablesync code. We can COPY
generated columns during tablesync using syntax 'COPY (SELECT
column_name from table) TO STDOUT.'

I have attached the patch for the same.
v7-0001 : Not Modified
v7-0002: Support replication of generated columns during initial sync.

Thanks and Regards,
Shlok Kyal


v7-0002-Support-replication-of-generated-column-during-in.patch
Description: Binary data


v7-0001-Enable-support-for-include_generated_columns-opti.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-14 Thread Shubham Khanna
On Tue, Jun 4, 2024 at 8:12 AM Peter Smith  wrote:
>
> Hi,
>
> Here are some review comments for patch v5-0001.
>
> ==
> GENERAL G.1
>
> The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
> maybe before they were always ignored, but now they are not?
>
> OTOH, when 'include_generated_columns' is false then the PUBLICATION
> col-list will ignore any generated cols even when they are present in
> a PUBLICATION col-list, right?
>
> These kinds of points should be noted in the commit message and in the
> (col-list?) documentation.

Fixed.

> ==
> Commit message
>
> General 1a.
> IMO the commit message needs some background to say something like:
> "Currently generated column values are not replicated because it is
> assumed that the corresponding subscriber-side table will generate its
> own values for those columns."
>
> ~
>
> General 1b.
> Somewhere in this commit message, you need to give all the other
> special rules --- e.g. the docs says "If the subscriber-side column is
> also a generated column then this option has no effect"
>
> ~~~

Fixed.

> 2.
> This commit enables support for the 'include_generated_columns' option
> in logical replication, allowing the transmission of generated column
> information and data alongside regular table changes. This option is
> particularly useful for scenarios where applications require access to
> generated column values for downstream processing or synchronization.
>
> ~
>
> I don't think the sentence "This option is particularly useful..." is
> helpful. It seems like just saying "This commit supports option XXX.
> This is particularly useful if you want XXX".
>

Fixed.

>
> 3.
> CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=
> 'publication pub1;
>
> ~
>
> What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
> the new parameter being used in it?
>

Added the description for this in the Patch.

>
> 4.
> Currently copy_data option with include_generated_columns option is
> not supported. A future patch will remove this limitation.
>
> ~
>
> Suggest to single-quote those parameter names for better readability.
>

Fixed.

>
> 5.
> This commit aims to enhance the flexibility and utility of logical
> replication by allowing users to include generated column information
> in replication streams, paving the way for more robust data
> synchronization and processing workflows.
>
> ~
>
> IMO this paragraph can be omitted.

Fixed.

> ==
> .../test_decoding/sql/decoding_into_rel.sql
>
> 6.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +DROP TABLE gencoltable;
> +
>
> 6a.
> I felt some additional explicit comments might help the readabilty of
> the output file.
>
> e.g.1
> -- When 'include-generated=columns' = '1' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes...
>
> e.g.2
> -- When 'include-generated=columns' = '0' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_get_changes...

Added the required description for this.

> 6b.
> Suggest adding one more test case (where 'include-generated=columns'
> is not set) to confirm/demonstrate the default behaviour for
> replicated generated cols.

Added the required Test case.

> ==
> doc/src/sgml/protocol.sgml
>
> 7.
> +
> +  class="parameter">include-generated-columns
> +  
> +   
> +Boolean option to enable generated columns.
> +The include-generated-columns option controls whether generated
> +columns should be included in the string representation of tuples
> +during logical decoding in PostgreSQL. This allows users to
> +customize the output format based on whether they want to include
> +these columns or not. The default is false.
> +   
> +  
> +
>
> 7a.
> It doesn't render properly. e.g. Should not be bold italic (probably
> the class is wrong?), because none of the nearby parameters look this
> way.
>
> ~
>
> 7b.
> The name here should NOT have hyphens. It needs underscores same as
> all other nearby protocol parameters.
>
> ~
>
> 7c.
> The description seems overly verbose.
>
> SUGGESTION
> Boolean option to enable generated columns. This option controls
> whether generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ==
> 

Re: Pgoutput not capturing the generated columns

2024-06-14 Thread Shubham Khanna
> Thanks for the updated patch, few comments:
> 1) The option name seems wrong here:
> In one place include_generated_column is specified and other place
> include_generated_columns is specified:
>
> +   else if (IsSet(supported_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> +strcmp(defel->defname,
> "include_generated_column") == 0)
> +   {
> +   if (IsSet(opts->specified_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN))
> +   errorConflictingDefElem(defel, pstate);
> +
> +   opts->specified_opts |= 
> SUBOPT_INCLUDE_GENERATED_COLUMN;
> +   opts->include_generated_column = defGetBoolean(defel);
> +   }

Fixed.

> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index d453e224d9..e8ff752fd9 100644
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
> COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>   "disable_on_error",
> "enabled", "failover", "origin",
>   "password_required",
> "run_as_owner", "slot_name",
> - "streaming",
> "synchronous_commit", "two_phase");
> + "streaming",
> "synchronous_commit", "two_phase","include_generated_columns");
>
> 2) This small data table need not have a primary key column as it will
> create an index and insertion will happen in the index too.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');

Fixed.

> 3) Please add a test case for this:
> +  set to false. If the subscriber-side
> column is also a
> +  generated column then this option has no effect; the
> replicated data will
> +  be ignored and the subscriber column will be filled as
> normal with the
> +  subscriber-side computed or default data.

Added the required test case.

> 4) You can use a new style of ereport to remove the brackets around errcode
> 4.a)
> +   else if (!parse_bool(strVal(elem->arg),
> >include_generated_columns))
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
>
> 4.b) similarly here too:
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +   /*- translator: both %s are strings of the form
> "option = value" */
> +   errmsg("%s and %s are mutually
> exclusive options",
> +   "copy_data = true",
> "include_generated_column = true")));
>
> 4.c) similarly here too:
> +   if (include_generated_columns_option_given)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("conflicting
> or redundant options")));

Fixed.

> 5) These variable names can be changed to keep it smaller, something
> like gencol or generatedcol or gencolumn, etc
> +++ b/src/include/catalog/pg_subscription.h
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>  #ifdef CATALOG_VARLEN /* variable-length fields start here */
>   /* Connection string to the publisher */
>   text subconninfo BKI_FORCE_NOT_NULL;
> @@ -157,6 +159,7 @@ typedef struct Subscription
>   List*publications; /* List of publication names to subscribe to */
>   char*origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;

Fixed.

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.


v6-0001-Enable-support-for-include_generated_columns-opti.patch
Description: Binary data


RE: Pgoutput not capturing the generated columns

2024-06-05 Thread Hayato Kuroda (Fujitsu)
Dear Shlok and Shubham,

Thanks for updating the patch!

I briefly checked the v5-0002. IIUC, your patch allows to copy generated
columns unconditionally. I think the behavior affects many people so that it is
hard to get agreement.

Can we add a new option like `GENERATED_COLUMNS [boolean]`? If the default is 
set
to off, we can keep the current specification.

Thought?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Pgoutput not capturing the generated columns

2024-06-04 Thread Peter Smith
On Mon, Jun 3, 2024 at 9:52 PM Shlok Kyal  wrote:
>
> >
> > The attached Patch contains the suggested changes.
> >
>
> Hi,
>
> Currently, COPY command does not work for generated columns and
> therefore, COPY of generated column is not supported during tablesync
> process. So, in patch v4-0001 we added a check to allow replication of
> the generated column only if 'copy_data = false'.
>
> I am attaching patches to resolve the above issues.
>
> v5-0001: not changed
> v5-0002: Support COPY of generated column
> v5-0003: Support COPY of generated column during tablesync process
>

Hi Shlok, I have a question about patch v5-0003.

According to the patch 0001 docs "If the subscriber-side column is
also a generated column then this option has no effect; the replicated
data will be ignored and the subscriber column will be filled as
normal with the subscriber-side computed or default data".

Doesn't this mean it will be a waste of effort/resources to COPY any
column value where the subscriber-side column is generated since we
know that any copied value will be ignored anyway?

But I don't recall seeing any comment or logic for this kind of copy
optimisation in the patch 0003. Is this already accounted for
somewhere and I missed it, or is my understanding wrong?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-06-04 Thread Peter Smith
Hi,

Here are some review comments for patch v5-0003.

==
0. Whitespace warnings when the patch was applied.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:29:
trailing whitespace.
  has no effect; the replicated data will be ignored and the subscriber
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:30:
trailing whitespace.
  column will be filled as normal with the subscriber-side computed or
../patches_misc/v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch:189:
trailing whitespace.
(walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
warning: 3 lines add whitespace errors.

==
src/backend/commands/subscriptioncmds.c

1.
- res = walrcv_exec(wrconn, cmd.data, check_columnlist ? 3 : 2, tableRow);
+ column_count = (!include_generated_column && check_gen_col) ? 4 :
(check_columnlist ? 3 : 2);
+ res = walrcv_exec(wrconn, cmd.data, column_count, tableRow);

The 'column_count' seems out of control. Won't it be far simpler to
assign/increment the value dynamically only as required instead of the
tricky calculation at the end which is unnecessarily difficult to
understand?

~~~

2.
+ /*
+ * If include_generated_column option is false and all the column of
the table in the
+ * publication are generated then we should throw an error.
+ */
+ if (!isnull && !include_generated_column && check_gen_col)
+ {
+ attlist = DatumGetArrayTypeP(attlistdatum);
+ gen_col_count = DatumGetInt32(slot_getattr(slot, 4, ));
+ Assert(!isnull);
+
+ attcount = ArrayGetNItems(ARR_NDIM(attlist), ARR_DIMS(attlist));
+
+ if (attcount != 0 && attcount == gen_col_count)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use only generated column for table \"%s.%s\" in
publication when generated_column option is false",
+nspname, relname));
+ }
+

Why do you think this new logic/error is necessary?

IIUC the 'include_generated_columns' should be false to match the
existing HEAD behavior. So this scenario where your publisher-side
table *only* has generated columns is something that could already
happen, right? IOW, this introduced error could be a candidate for
another discussion/thread/patch, but is it really required for this
current patch?

==
src/backend/replication/logical/tablesync.c

3.
  lrel->remoteid,
- (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ?
-   "AND a.attgenerated = ''" : ""),
+ (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 &&
+ (walrcv_server_version(LogRepWorkerWalRcvConn) <= 16 ||
+ !MySubscription->includegeneratedcolumn) ? "AND a.attgenerated = ''" : ""),

This ternary within one big appendStringInfo seems quite complicated.
Won't it be better to split the appendStringInfo into multiple parts
so the generated-cols calculation can be done more simply?

==
src/test/subscription/t/011_generated.pl

4.
I think there should be a variety of different tablesync scenarios
(when 'include_generated_columns' is true) tested here instead of just
one, and all varieties with lots of comments to say what they are
doing, expectations etc.

a. publisher-side gen-col "a" replicating to subscriber-side NOT
gen-col "a" (ok, value gets replicated)
b. publisher-side gen-col "a" replicating to subscriber-side gen-col
(ok, but ignored)
c. publisher-side NOT gen-col "b" replicating to subscriber-side
gen-col "b" (error?)

~~

5.
+$result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab3");
+is( $result, qq(1|2
+2|4
+3|6), 'generated columns initial sync with include_generated_column = true');

Should this say "ORDER BY..." so it will not fail if the row order
happens to be something unanticipated?

==

99.
Also, see the attached file with numerous other nitpicks:
- plural param- and var-names
- typos in comments
- missing spaces
- SQL keyword should be UPPERCASE
- etc.

Please apply any/all of these if you agree with them.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
diff --git a/src/backend/commands/subscriptioncmds.c 
b/src/backend/commands/subscriptioncmds.c
index 3e78a75..afb24c3 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -103,7 +103,7 @@ typedef struct SubOpts
boolinclude_generated_column;
 } SubOpts;
 
-static List *fetch_table_list(WalReceiverConn *wrconn, List *publications, 
bool include_generated_column);
+static List *fetch_table_list(WalReceiverConn *wrconn, List *publications, 
bool include_generated_columns);
 static void check_publications_origin(WalReceiverConn *wrconn,
  List 
*publications, bool copydata,
  char 
*origin, Oid *subrel_local_oids,
@@ -2147,7 +2147,7 @@ 

Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Peter Smith
Hi,

Here are some review comments for patch v5-0002.

==
GENERAL

G1.
IIUC now you are unconditionally allowing all generated columns to be copied.

I think this is assuming that the table sync code (in the next patch
0003?) is going to explicitly name all the columns it wants to copy
(so if it wants to get generated cols then it will name the generated
cols, and if is doesn't want generated cols then it won't name them).

Maybe that is OK for the logical replication tablesync case, but I am
not sure if it will be desirable to *always* copy generated columns in
other user scenarios.

e.g. I was wondering if there should be a new COPY command option
introduced here -- INCLUDE_GENERATED_COLUMNS (with default false) so
then the current HEAD behaviour is unaffected unless that option is
enabled.

~~~

G2.
The current COPY command documentation [1] says "If no column list is
specified, all columns of the table except generated columns will be
copied."

But this 0002 patch has changed that documented behaviour, and so the
documentation needs to be changed as well, right?

==
Commit Message

1.
Currently COPY command do not copy generated column. With this commit
added support for COPY for generated column.

~

The grammar/cardinality is not good here. Try some tool (Grammarly or
chatGPT, etc) to help correct it.

==
src/backend/commands/copy.c

==
src/test/regress/expected/generated.out

==
src/test/regress/sql/generated.sql

2.
I think these COPY test cases require some explicit comments to
describe what they are doing, and what are the expected results.

Currently, I have doubts about some of this test input/output

e.g.1. Why is the 'b' column sometimes specified as 1? It needs some
explanation. Are you expecting this generated col value to be
ignored/overwritten or what?

COPY gtest1 (a, b) FROM stdin DELIMITER ' ';
5 1
6 1
\.

e.g.2. what is the reason for this new 'missing data for column "b"'
error? Or is it some introduced quirk because "b" now cannot be
generated since there is no value for "a"? I don't know if the
expected *.out here is OK or not, so some test comments may help to
clarify it.

==
[1] https://www.postgresql.org/docs/devel/sql-copy.html

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Peter Smith
Hi,

Here are some review comments for patch v5-0001.

==
GENERAL G.1

The patch changes HEAD behaviour for PUBLICATION col-lists right? e.g.
maybe before they were always ignored, but now they are not?

OTOH, when 'include_generated_columns' is false then the PUBLICATION
col-list will ignore any generated cols even when they are present in
a PUBLICATION col-list, right?

These kinds of points should be noted in the commit message and in the
(col-list?) documentation.

==
Commit message

General 1a.
IMO the commit message needs some background to say something like:
"Currently generated column values are not replicated because it is
assumed that the corresponding subscriber-side table will generate its
own values for those columns."

~

General 1b.
Somewhere in this commit message, you need to give all the other
special rules --- e.g. the docs says "If the subscriber-side column is
also a generated column then this option has no effect"

~~~

2.
This commit enables support for the 'include_generated_columns' option
in logical replication, allowing the transmission of generated column
information and data alongside regular table changes. This option is
particularly useful for scenarios where applications require access to
generated column values for downstream processing or synchronization.

~

I don't think the sentence "This option is particularly useful..." is
helpful. It seems like just saying "This commit supports option XXX.
This is particularly useful if you want XXX".

~~~

3.
CREATE SUBSCRIPTION test1 connection 'dbname=postgres host=localhost port=
'publication pub1;

~

What is this CREATE SUBSCRIPTION for? Shouldn't it have an example of
the new parameter being used in it?

~~~

4.
Currently copy_data option with include_generated_columns option is
not supported. A future patch will remove this limitation.

~

Suggest to single-quote those parameter names for better readability.

~~~

5.
This commit aims to enhance the flexibility and utility of logical
replication by allowing users to include generated column information
in replication streams, paving the way for more robust data
synchronization and processing workflows.

~

IMO this paragraph can be omitted.

==
.../test_decoding/sql/decoding_into_rel.sql

6.
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '1');
+INSERT INTO gencoltable (a) VALUES (4), (5), (6);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include-generated-columns', '0');
+DROP TABLE gencoltable;
+

6a.
I felt some additional explicit comments might help the readabilty of
the output file.

e.g.1
-- When 'include-generated=columns' = '1' the generated column 'b'
values will be replicated
SELECT data FROM pg_logical_slot_get_changes...

e.g.2
-- When 'include-generated=columns' = '0' the generated column 'b'
values will not be replicated
SELECT data FROM pg_logical_slot_get_changes...

~~

6b.
Suggest adding one more test case (where 'include-generated=columns'
is not set) to confirm/demonstrate the default behaviour for
replicated generated cols.

==
doc/src/sgml/protocol.sgml

7.
+
+ include-generated-columns
+  
+   
+Boolean option to enable generated columns.
+The include-generated-columns option controls whether generated
+columns should be included in the string representation of tuples
+during logical decoding in PostgreSQL. This allows users to
+customize the output format based on whether they want to include
+these columns or not. The default is false.
+   
+  
+

7a.
It doesn't render properly. e.g. Should not be bold italic (probably
the class is wrong?), because none of the nearby parameters look this
way.

~

7b.
The name here should NOT have hyphens. It needs underscores same as
all other nearby protocol parameters.

~

7c.
The description seems overly verbose.

SUGGESTION
Boolean option to enable generated columns. This option controls
whether generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

==
doc/src/sgml/ref/create_subscription.sgml

8.
+
+   
+include_generated_column
(boolean)
+
+ 
+  Specifies whether the generated columns present in the tables
+  associated with the subscription should be replicated. The default is
+  false.
+ 

The parameter name should be plural (include_generated_columns).

==
src/backend/commands/subscriptioncmds.c

9.
 #define SUBOPT_ORIGIN 0x8000
+#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x0001


Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shlok Kyal
>
> The attached Patch contains the suggested changes.
>

Hi,

Currently, COPY command does not work for generated columns and
therefore, COPY of generated column is not supported during tablesync
process. So, in patch v4-0001 we added a check to allow replication of
the generated column only if 'copy_data = false'.

I am attaching patches to resolve the above issues.

v5-0001: not changed
v5-0002: Support COPY of generated column
v5-0003: Support COPY of generated column during tablesync process

Thought?


Thanks and Regards,
Shlok Kyal


v5-0001-Enable-support-for-include_generated_columns-opti.patch
Description: Binary data


v5-0002-Support-COPY-command-for-generated-columns.patch
Description: Binary data


v5-0003-Support-copy-of-generated-columns-during-tablesyn.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-03 Thread vignesh C
On Mon, 3 Jun 2024 at 13:03, Shubham Khanna  wrote:
>
> On Thu, May 16, 2024 at 11:35 AM Peter Smith  wrote:
> >
> > Here are some review comments for the patch v1-0001.
> >
> > ==
> > GENERAL
> >
> > G.1. Use consistent names
> >
> > It seems to add unnecessary complications by having different names
> > for all the new options, fields and API parameters.
> >
> > e.g. sometimes 'include_generated_columns'
> > e.g. sometimes 'publish_generated_columns'
> >
> > Won't it be better to just use identical names everywhere for
> > everything? I don't mind which one you choose; I just felt you only
> > need one name, not two. This comment overrides everything else in this
> > post so whatever name you choose, make adjustments for all my other
> > review comments as necessary.
>
> I have updated the name to 'include_generated_columns' everywhere in the 
> Patch.
>
> > ==
> >
> > G.2. Is it possible to just use the existing bms?
> >
> > A very large part of this patch is adding more API parameters to
> > delegate the 'publish_generated_columns' flag value down to when it is
> > finally checked and used. e.g.
> >
> > The functions:
> > - logicalrep_write_insert(), logicalrep_write_update(),
> > logicalrep_write_delete()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_tuple
> >
> > The functions:
> > - logicalrep_write_rel()
> > ... are delegating the new parameter 'publish_generated_column' down to:
> > - logicalrep_write_attrs
> >
> > AFAICT in all these places the API is already passing a "Bitmapset
> > *columns". I was wondering if it might be possible to modify the
> > "Bitmapset *columns" BEFORE any of those functions get called so that
> > the "columns" BMS either does or doesn't include generated cols (as
> > appropriate according to the option).
> >
> > Well, it might not be so simple because there are some NULL BMS
> > considerations also, but I think it would be worth investigating at
> > least, because if indeed you can find some common place (somewhere
> > like pgoutput_change()?) where the columns BMS can be filtered to
> > remove bits for generated cols then it could mean none of those other
> > patch API changes are needed at all -- then the patch would only be
> > 1/2 the size.
>
> I will analyse and reply to this in the next version.
>
> > ==
> > Commit message
> >
> > 1.
> > Now if include_generated_columns option is specified, the generated
> > column information and generated column data also will be sent.
> >
> > Usage from pgoutput plugin:
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > ~
> >
> > I think there needs to be more background information given here. This
> > commit message doesn't seem to describe anything about what is the
> > problem and how this patch fixes it. It just jumps straight into
> > giving usages of a 'include_generated_columns' option.
> >
> > It also doesn't say that this is an option that was newly *introduced*
> > by the patch -- it refers to it as though the reader should already
> > know about it.
> >
> > Furthermore, your hacker's post says "Currently it is not supported as
> > a subscription option because table sync for the generated column is
> > not possible as copy command does not support getting data for the
> > generated column. If this feature is required we can remove this
> > limitation from the copy command and then add it as a subscription
> > option later." IMO that all seems like the kind of information that
> > ought to also be mentioned in this commit message.
>
> I have updated the Commit message mentioning the suggested changes.
>
> > ==
> > contrib/test_decoding/sql/ddl.sql
> >
> > 2.
> > +-- check include_generated_columns option with generated column
> > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> > AS (a * 2) STORED);
> > +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> > NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> > +DROP TABLE gencoltable;
> > +
> >
> > 2a.
> > Perhaps you should include both option values to demonstrate the
> > difference in behaviour:
> >
> > 'include_generated_columns', '0'
> > 'include_generated_columns', '1'
>
> Added the other option values to demonstrate the difference in behaviour:
>
> > 2b.
> > I think you maybe need to include more some test combinations where
> > there is and isn't a COLUMN LIST, because I am not 100% sure I
> > understand the current logic/expectations for all combinations.
> >
> > e.g. When the generated column is in a column list but
> > 

Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Fri, May 24, 2024 at 8:26 AM Peter Smith  wrote:
>
> Hi,
>
> Here are some review comments for the patch v3-0001.
>
> I don't think v3 addressed any of my previous review comments for
> patches v1 and v2. [1][2]
>
> So the comments below are limited only to the new code (i.e. the v3
> versus v2 differences). Meanwhile, all my previous review comments may
> still apply.

Patch v4-0001 addresses the previous review comments for patches v1
and v2. [1][2]

> ==
> GENERAL
>
> The patch applied gives whitespace warnings:
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
> trailing whitespace.
> warning: 3 lines add whitespace errors.

Fixed.

> ==
> contrib/test_decoding/test_decoding.c
>
> 1. pg_decode_change
>
>   MemoryContext old;
> + bool include_generated_columns;
> +
>
> I'm not really convinced this variable saves any code.

Fixed.

> ==
> doc/src/sgml/protocol.sgml
>
> 2.
> +
> +  class="parameter">include-generated-columns
> + 
> +
> +The include-generated-columns option controls whether
> generated columns should be included in the string representation of
> tuples during logical decoding in PostgreSQL. This allows users to
> customize the output format based on whether they want to include
> these columns or not.
> + 
> + 
> + 
>
> 2a.
> Something is not correct when this name has hyphens and all the nearby
> parameter names do not. Shouldn't it be all uppercase like the other
> boolean parameter?
>
> ~
>
> 2b.
> Text in the SGML file should be wrapped properly.
>
> ~
>
> 2c.
> IMO the comment can be more terse and it also needs to specify that it
> is a boolean type, and what is the default value if not passed.
>
> SUGGESTION
>
> INCLUDE_GENERATED_COLUMNS [ boolean ]
>
> If true, then generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ==
> src/backend/replication/logical/proto.c
>
> 3. logicalrep_write_tuple
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
>
> 3a.
> This code seems overcomplicated checking the same flag multiple times.
>
> SUGGESTION
> if (att->attgenerated)
> {
>   if (!publish_generated_column)
> continue;
> }
> else
> {
>   if (!column_in_column_list(att->attnum, columns))
> continue;
> }
>
> ~
>
> 3b.
> The same logic occurs several times in logicalrep_write_tuple

Fixed.

> 4. logicalrep_write_attrs
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> Shouldn't these code fragments (2x in this function) look the same as
> in logicalrep_write_tuple? See the above review comments.

Fixed.

> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 5. maybe_send_schema
>
>   TransactionId topxid = InvalidTransactionId;
> + bool publish_generated_column = data->publish_generated_column;
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->streaming).

Fixed.

> 6. pgoutput_change
> -
> + bool publish_generated_column = data->publish_generated_column;
> +
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->binary).

Fixed.

> ==
> [1] My v1 review -
> https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6pe5uyjv_uxvrhu3ck7w2ahb1uqy...@mail.gmail.com
> [2] My v2 review -
> https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Thu, May 23, 2024 at 5:56 PM vignesh C  wrote:
>
> On Thu, 23 May 2024 at 09:19, Shubham Khanna
>  wrote:
> >
> > > Dear Shubham,
> > >
> > > Thanks for creating a patch! Here are high-level comments.
> >
> > > 1.
> > > Please document the feature. If it is hard to describe, we should change 
> > > the API.
> >
> > I have added the feature in the document.
> >
> > > 4.
> > > Regarding the test_decoding plugin, it has already been able to decode the
> > > generated columns. So... as the first place, is the proposed option 
> > > really needed
> > > for the plugin? Why do you include it?
> > > If you anyway want to add the option, the default value should be on - 
> > > which keeps
> > > current behavior.
> >
> > I have made the generated column options as true for test_decoding
> > plugin so by default we will send generated column data.
> >
> > > 5.
> > > Assuming that the feature become usable used for logical replicaiton. Not 
> > > sure,
> > > should we change the protocol version at that time? Nodes prior than PG17 
> > > may
> > > not want receive values for generated columns. Can we control only by the 
> > > option?
> >
> > I verified the backward compatibility test by using the generated
> > column option and it worked fine. I think there is no need to make any
> > further changes.
> >
> > > 7.
> > >
> > > Some functions refer data->publish_generated_column many times. Can we 
> > > store
> > > the value to a variable?
> > >
> > > Below comments are for test_decoding part, but they may be not needed.
> > >
> > > =
> > >
> > > a. pg_decode_startup()
> > >
> > > ```
> > > +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > > ```
> > >
> > > Other options for test_decoding do not have underscore. It should be
> > > "include-generated-columns".
> > >
> > > b. pg_decode_change()
> > >
> > > data->include_generated_columns is referred four times in the function.
> > > Can you store the value to a varibable?
> > >
> > >
> > > c. pg_decode_change()
> > >
> > > ```
> > > -true);
> > > +true, 
> > > data->include_generated_columns );
> > > ```
> > >
> > > Please remove the blank.
> >
> > Fixed.
> > The attached v3 Patch has the changes for the same.
>
> Few comments:
> 1) Since this is removed, tupdesc variable is not required anymore:
> +++ b/src/backend/catalog/pg_publication.c
> @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel,
> List *columns,
> errmsg("cannot use system
> column \"%s\" in publication column list",
>colname));
>
> -   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -   ereport(ERROR,
> -
> errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -   errmsg("cannot use generated
> column \"%s\" in publication column list",
> -  colname));

Fixed.

> 2) In test_decoding include_generated_columns option is used:
> +   else if (strcmp(elem->defname,
> "include_generated_columns") == 0)
> +   {
> +   if (elem->arg == NULL)
> +   continue;
> +   else if (!parse_bool(strVal(elem->arg),
> >include_generated_columns))
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
> +   }
>
> In subscription we have used generated_column, we can try to use the
> same option in both places:
> +   else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
> +strcmp(defel->defname,
> "generated_column") == 0)
> +   {
> +   if (IsSet(opts->specified_opts,
> SUBOPT_GENERATED_COLUMN))
> +   errorConflictingDefElem(defel, pstate);
> +
> +   opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
> +   opts->generated_column = defGetBoolean(defel);
> +   }

Will update the name to 'include_generated_columns' in the next
version of the Patch.

> 3) Tab completion can be added for create subscription to include
> generated_column option

Fixed.

> 4) There are few whitespace issues while applying the patch, check for
> git diff --check

Fixed.

> 5) Add few tests for the new option added

Added new test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Thu, May 23, 2024 at 10:56 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I checked your patches briefly. Here are my 
> comments.
>
> 01. API
>
> Since the option for test_decoding is enabled by default, I think it should 
> be renamed.
> E.g., "skip-generated-columns" or something.

Let's keep the same name 'include_generated_columns' for both the cases.

> 02. ddl.sql
>
> ```
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
> 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
> 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', 
> '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> ```
>
> We should test non-default case, which the generated columns are not 
> generated.

Added the non-default case, which the generated columns are not generated.

> 03. ddl.sql
>
> Not sure new tests are in the correct place. Do we have to add new file and 
> move tests to it?
> Thought?

Added the new tests in the 'decoding_into_rel.out' file.

> 04. protocol.sgml
>
> Please keep the format of the sgml file.

Fixed.

> 05. protocol.sgml
>
> The option is implemented as the streaming option of pgoutput plugin, so they 
> should be
> located under "Logical Streaming Replication Parameters" section.

Fixed.

> 05. AlterSubscription
>
> ```
> +   if (IsSet(opts.specified_opts, 
> SUBOPT_GENERATED_COLUMN))
> +   {
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("toggling 
> generated_column option is not allowed.")));
> +   }
> ```
>
> If you don't want to support the option, you can remove 
> SUBOPT_GENERATED_COLUMN
> macro from the function. But can you clarify the reason why you do not want?

Fixed.

> 07. logicalrep_write_tuple
>
> ```
> -   if (!column_in_column_list(att->attnum, columns))
> +   if (!column_in_column_list(att->attnum, columns) && 
> !att->attgenerated)
> +   continue;
> +
> +   if (att->attgenerated && !publish_generated_column)
> continue;
> ```
>
> I think changes in v2 was reverted or wrongly merged.

Fixed.

> 08. test code
>
> Can you add tests that generated columns are replicated by the logical 
> replication?

Added the test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Tue, May 21, 2024 at 12:23 PM Peter Smith  wrote:
>
> Hi,
>
> AFAICT this v2-0001 patch differences from v1 is mostly about adding
> the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
> addressing any of my previous review comments for patch v1. [1]. So
> these comments below are limited only to the new option code; All my
> previous review comments probably still apply.
>
> ==
> Commit message
>
> 1. (General)
> The commit message is seriously lacking background explanation to describe:
> - What is the current behaviour w.r.t. generated columns
> - What is the problem with the current behaviour?
> - What exactly is this patch doing to address that problem?

Added the information related to this inside the Patch.

> 2.
> New option generated_option is added in create subscription. Now if this
> option is specified as 'true' during create subscription, generated
> columns in the tables, present in publisher (to which this subscription is
> subscribed) can also be replicated.
>
> -
>
> 2A.
> "generated_option" is not the name of the new option.
>
> ~
>
> 2B.
> "create subscription" stmt should be UPPERCASE; will also be more
> readable if the option name is quoted.
>
> ~
>
> 2C.
> Needs more information like under what condition is this option ignored etc.

Fixed.

> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> 3.
> +id="sql-createsubscription-params-with-generated-column">
> +generated-column 
> (boolean)
> +
> + 
> +  Specifies whether the generated columns present in the tables
> +  associated with the subscription should be replicated. The default 
> is
> +  false.
> + 
> +
> + 
> +  This parameter can only be set true if copy_data is set to false.
> +  This option works fine when a generated column (in
> publisher) is replicated to a
> +  non-generated column (in subscriber). Else if it is
> replicated to a generated
> +  column, it will ignore the replicated data and fill the
> column with computed or
> +  default data.
> + 
> +
> +   
>
> 3A.
> There is a typo in the name "generated-column" because we should use
> underscores (not hyphens) for the option names.
>
> ~
>
> 3B.
> This it is not a good option name because there is no verb so it
> doesn't mean anything to set it true/false -- actually there IS a verb
> "generate" but we are not saying generate = true/false, so this name
> is also quite confusing.
>
> I think "include_generated_columns" would be much better, but if
> others think that name is too long then maybe "include_generated_cols"
> or "include_gen_cols" or similar. Of course, whatever if the final
> decision should be propagated same thru all the code comments, params,
> fields, etc.
>
> ~
>
> 3C.
> copy_data and false should be marked up as  fonts in the sgml
>
> ~
>
> 3D.
>
> Suggest re-word this part. Don't need to explain when it "works fine".
>
> BEFORE
> This option works fine when a generated column (in publisher) is
> replicated to a non-generated column (in subscriber). Else if it is
> replicated to a generated column, it will ignore the replicated data
> and fill the column with computed or default data.
>
> SUGGESTION
> If the subscriber-side column is also a generated column then this
> option has no effect; the replicated data will be ignored and the
> subscriber column will be filled as normal with the subscriber-side
> computed or default data.

Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
>
> 4. AlterSubscription
> SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
> SUBOPT_PASSWORD_REQUIRED |
> SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
> -   SUBOPT_ORIGIN);
> +   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);
>
> Hmm. Is this correct? If ALTER is not allowed (later in this patch
> there is a message "toggling generated_column option is not allowed."
> then why are we even saying that SUBOPT_GENERATED_COLUMN is a
> support_opt for ALTER?

Fixed.

> 5.
> + if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("toggling generated_column option is not allowed.")));
> + }
>
> 5A.
> I suspect this is not even needed if the 'supported_opt' is fixed per
> the previous comment.
>
> ~
>
> 5B.
> But if this message is still needed then I think it should say "ALTER
> is not allowed" (not "toggling is not allowed") and also the option
> name should be quoted as per the new guidelines for error messages.
>
> ==
> src/backend/replication/logical/proto.c

Fixed.

> 6. logicalrep_write_tuple
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> +
>
> Calling column_in_column_list() might be a more expensive operation
> than checking just 

Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Thu, May 16, 2024 at 11:35 AM Peter Smith  wrote:
>
> Here are some review comments for the patch v1-0001.
>
> ==
> GENERAL
>
> G.1. Use consistent names
>
> It seems to add unnecessary complications by having different names
> for all the new options, fields and API parameters.
>
> e.g. sometimes 'include_generated_columns'
> e.g. sometimes 'publish_generated_columns'
>
> Won't it be better to just use identical names everywhere for
> everything? I don't mind which one you choose; I just felt you only
> need one name, not two. This comment overrides everything else in this
> post so whatever name you choose, make adjustments for all my other
> review comments as necessary.

I have updated the name to 'include_generated_columns' everywhere in the Patch.

> ==
>
> G.2. Is it possible to just use the existing bms?
>
> A very large part of this patch is adding more API parameters to
> delegate the 'publish_generated_columns' flag value down to when it is
> finally checked and used. e.g.
>
> The functions:
> - logicalrep_write_insert(), logicalrep_write_update(),
> logicalrep_write_delete()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_tuple
>
> The functions:
> - logicalrep_write_rel()
> ... are delegating the new parameter 'publish_generated_column' down to:
> - logicalrep_write_attrs
>
> AFAICT in all these places the API is already passing a "Bitmapset
> *columns". I was wondering if it might be possible to modify the
> "Bitmapset *columns" BEFORE any of those functions get called so that
> the "columns" BMS either does or doesn't include generated cols (as
> appropriate according to the option).
>
> Well, it might not be so simple because there are some NULL BMS
> considerations also, but I think it would be worth investigating at
> least, because if indeed you can find some common place (somewhere
> like pgoutput_change()?) where the columns BMS can be filtered to
> remove bits for generated cols then it could mean none of those other
> patch API changes are needed at all -- then the patch would only be
> 1/2 the size.

I will analyse and reply to this in the next version.

> ==
> Commit message
>
> 1.
> Now if include_generated_columns option is specified, the generated
> column information and generated column data also will be sent.
>
> Usage from pgoutput plugin:
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> ~
>
> I think there needs to be more background information given here. This
> commit message doesn't seem to describe anything about what is the
> problem and how this patch fixes it. It just jumps straight into
> giving usages of a 'include_generated_columns' option.
>
> It also doesn't say that this is an option that was newly *introduced*
> by the patch -- it refers to it as though the reader should already
> know about it.
>
> Furthermore, your hacker's post says "Currently it is not supported as
> a subscription option because table sync for the generated column is
> not possible as copy command does not support getting data for the
> generated column. If this feature is required we can remove this
> limitation from the copy command and then add it as a subscription
> option later." IMO that all seems like the kind of information that
> ought to also be mentioned in this commit message.

I have updated the Commit message mentioning the suggested changes.

> ==
> contrib/test_decoding/sql/ddl.sql
>
> 2.
> +-- check include_generated_columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
> +DROP TABLE gencoltable;
> +
>
> 2a.
> Perhaps you should include both option values to demonstrate the
> difference in behaviour:
>
> 'include_generated_columns', '0'
> 'include_generated_columns', '1'

Added the other option values to demonstrate the difference in behaviour:

> 2b.
> I think you maybe need to include more some test combinations where
> there is and isn't a COLUMN LIST, because I am not 100% sure I
> understand the current logic/expectations for all combinations.
>
> e.g. When the generated column is in a column list but
> 'publish_generated_columns' is false then what should happen? etc.
> Also if there are any special rules then those should be mentioned in
> the commit message.

Test case is added and the same is mentioned in the documentation.

> ==
> src/backend/replication/logical/proto.c
>
> 3.
> For all the API 

Re: Pgoutput not capturing the generated columns

2024-05-23 Thread Peter Smith
Hi,

Here are some review comments for the patch v3-0001.

I don't think v3 addressed any of my previous review comments for
patches v1 and v2. [1][2]

So the comments below are limited only to the new code (i.e. the v3
versus v2 differences). Meanwhile, all my previous review comments may
still apply.

==
GENERAL

The patch applied gives whitespace warnings:

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
trailing whitespace.

../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
trailing whitespace.
warning: 3 lines add whitespace errors.

==
contrib/test_decoding/test_decoding.c

1. pg_decode_change

  MemoryContext old;
+ bool include_generated_columns;
+

I'm not really convinced this variable saves any code.

==
doc/src/sgml/protocol.sgml

2.
+
+ include-generated-columns
+ 
+
+The include-generated-columns option controls whether
generated columns should be included in the string representation of
tuples during logical decoding in PostgreSQL. This allows users to
customize the output format based on whether they want to include
these columns or not.
+ 
+ 
+ 

2a.
Something is not correct when this name has hyphens and all the nearby
parameter names do not. Shouldn't it be all uppercase like the other
boolean parameter?

~

2b.
Text in the SGML file should be wrapped properly.

~

2c.
IMO the comment can be more terse and it also needs to specify that it
is a boolean type, and what is the default value if not passed.

SUGGESTION

INCLUDE_GENERATED_COLUMNS [ boolean ]

If true, then generated columns should be included in the string
representation of tuples during logical decoding in PostgreSQL. The
default is false.

==
src/backend/replication/logical/proto.c

3. logicalrep_write_tuple

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;

3a.
This code seems overcomplicated checking the same flag multiple times.

SUGGESTION
if (att->attgenerated)
{
  if (!publish_generated_column)
continue;
}
else
{
  if (!column_in_column_list(att->attnum, columns))
continue;
}

~

3b.
The same logic occurs several times in logicalrep_write_tuple

~~~

4. logicalrep_write_attrs

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;
+

Shouldn't these code fragments (2x in this function) look the same as
in logicalrep_write_tuple? See the above review comments.

==
src/backend/replication/pgoutput/pgoutput.c

5. maybe_send_schema

  TransactionId topxid = InvalidTransactionId;
+ bool publish_generated_column = data->publish_generated_column;

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->streaming).

~~~

6. pgoutput_change
-
+ bool publish_generated_column = data->publish_generated_column;
+

I'm not convinced this saves any code, and anyway, it is not
consistent with other fields in this function that are not extracted
to another variable (e.g. data->binary).

==
[1] My v1 review -
https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6pe5uyjv_uxvrhu3ck7w2ahb1uqy...@mail.gmail.com
[2] My v2 review -
https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Pgoutput not capturing the generated columns

2024-05-23 Thread vignesh C
On Thu, 23 May 2024 at 09:19, Shubham Khanna
 wrote:
>
> > Dear Shubham,
> >
> > Thanks for creating a patch! Here are high-level comments.
>
> > 1.
> > Please document the feature. If it is hard to describe, we should change 
> > the API.
>
> I have added the feature in the document.
>
> > 4.
> > Regarding the test_decoding plugin, it has already been able to decode the
> > generated columns. So... as the first place, is the proposed option really 
> > needed
> > for the plugin? Why do you include it?
> > If you anyway want to add the option, the default value should be on - 
> > which keeps
> > current behavior.
>
> I have made the generated column options as true for test_decoding
> plugin so by default we will send generated column data.
>
> > 5.
> > Assuming that the feature become usable used for logical replicaiton. Not 
> > sure,
> > should we change the protocol version at that time? Nodes prior than PG17 
> > may
> > not want receive values for generated columns. Can we control only by the 
> > option?
>
> I verified the backward compatibility test by using the generated
> column option and it worked fine. I think there is no need to make any
> further changes.
>
> > 7.
> >
> > Some functions refer data->publish_generated_column many times. Can we store
> > the value to a variable?
> >
> > Below comments are for test_decoding part, but they may be not needed.
> >
> > =
> >
> > a. pg_decode_startup()
> >
> > ```
> > +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > ```
> >
> > Other options for test_decoding do not have underscore. It should be
> > "include-generated-columns".
> >
> > b. pg_decode_change()
> >
> > data->include_generated_columns is referred four times in the function.
> > Can you store the value to a varibable?
> >
> >
> > c. pg_decode_change()
> >
> > ```
> > -true);
> > +true, data->include_generated_columns 
> > );
> > ```
> >
> > Please remove the blank.
>
> Fixed.
> The attached v3 Patch has the changes for the same.

Few comments:
1) Since this is removed, tupdesc variable is not required anymore:
+++ b/src/backend/catalog/pg_publication.c
@@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel,
List *columns,
errmsg("cannot use system
column \"%s\" in publication column list",
   colname));

-   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
-   ereport(ERROR,
-
errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-   errmsg("cannot use generated
column \"%s\" in publication column list",
-  colname));

2) In test_decoding include_generated_columns option is used:
+   else if (strcmp(elem->defname,
"include_generated_columns") == 0)
+   {
+   if (elem->arg == NULL)
+   continue;
+   else if (!parse_bool(strVal(elem->arg),
>include_generated_columns))
+   ereport(ERROR,
+
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("could not
parse value \"%s\" for parameter \"%s\"",
+
strVal(elem->arg), elem->defname)));
+   }

In subscription we have used generated_column, we can try to use the
same option in both places:
+   else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
+strcmp(defel->defname,
"generated_column") == 0)
+   {
+   if (IsSet(opts->specified_opts,
SUBOPT_GENERATED_COLUMN))
+   errorConflictingDefElem(defel, pstate);
+
+   opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
+   opts->generated_column = defGetBoolean(defel);
+   }

3) Tab completion can be added for create subscription to include
generated_column option

4) There are few whitespace issues while applying the patch, check for
git diff --check

5) Add few tests for the new option added

Regards,
Vignesh




RE: Pgoutput not capturing the generated columns

2024-05-22 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch! I checked your patches briefly. Here are my 
comments.

01. API

Since the option for test_decoding is enabled by default, I think it should be 
renamed.
E.g., "skip-generated-columns" or something.

02. ddl.sql

```
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) 
STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+data 
+-
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
```

We should test non-default case, which the generated columns are not generated.

03. ddl.sql

Not sure new tests are in the correct place. Do we have to add new file and 
move tests to it?
Thought?

04. protocol.sgml

Please keep the format of the sgml file.

05. protocol.sgml

The option is implemented as the streaming option of pgoutput plugin, so they 
should be
located under "Logical Streaming Replication Parameters" section.

05. AlterSubscription

```
+   if (IsSet(opts.specified_opts, 
SUBOPT_GENERATED_COLUMN))
+   {
+   ereport(ERROR,
+   
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("toggling 
generated_column option is not allowed.")));
+   }
```

If you don't want to support the option, you can remove SUBOPT_GENERATED_COLUMN
macro from the function. But can you clarify the reason why you do not want?

07. logicalrep_write_tuple

```
-   if (!column_in_column_list(att->attnum, columns))
+   if (!column_in_column_list(att->attnum, columns) && 
!att->attgenerated)
+   continue;
+
+   if (att->attgenerated && !publish_generated_column)
continue;
```

I think changes in v2 was reverted or wrongly merged.

08. test code

Can you add tests that generated columns are replicated by the logical 
replication?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Shubham Khanna
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the 
> API.

I have added the feature in the document.

> 4.
> Regarding the test_decoding plugin, it has already been able to decode the
> generated columns. So... as the first place, is the proposed option really 
> needed
> for the plugin? Why do you include it?
> If you anyway want to add the option, the default value should be on - which 
> keeps
> current behavior.

I have made the generated column options as true for test_decoding
plugin so by default we will send generated column data.

> 5.
> Assuming that the feature become usable used for logical replicaiton. Not 
> sure,
> should we change the protocol version at that time? Nodes prior than PG17 may
> not want receive values for generated columns. Can we control only by the 
> option?

I verified the backward compatibility test by using the generated
column option and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer data->publish_generated_column many times. Can we store
> the value to a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =
>
> a. pg_decode_startup()
>
> ```
> +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not have underscore. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referred four times in the function.
> Can you store the value to a varibable?
>
>
> c. pg_decode_change()
>
> ```
> -true);
> +true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.


v3-0001-Support-generated-column-capturing-generated-colu.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Peter Eisentraut

On 08.05.24 09:13, Shubham Khanna wrote:

The attached patch has the changes to support capturing generated
column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
‘include_generated_columns’ option is specified, the generated column
information and generated column data also will be sent.


It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
into the replication output stream.


I think having an option for including stored generated columns is in 
general ok.



[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org





Re: Pgoutput not capturing the generated columns

2024-05-21 Thread Peter Smith
Hi,

AFAICT this v2-0001 patch differences from v1 is mostly about adding
the new CREATE SUBSCRIPTION option. Specifically, I don't think it is
addressing any of my previous review comments for patch v1. [1]. So
these comments below are limited only to the new option code; All my
previous review comments probably still apply.

==
Commit message

1. (General)
The commit message is seriously lacking background explanation to describe:
- What is the current behaviour w.r.t. generated columns
- What is the problem with the current behaviour?
- What exactly is this patch doing to address that problem?

~

2.
New option generated_option is added in create subscription. Now if this
option is specified as 'true' during create subscription, generated
columns in the tables, present in publisher (to which this subscription is
subscribed) can also be replicated.

-

2A.
"generated_option" is not the name of the new option.

~

2B.
"create subscription" stmt should be UPPERCASE; will also be more
readable if the option name is quoted.

~

2C.
Needs more information like under what condition is this option ignored etc.

==
doc/src/sgml/ref/create_subscription.sgml

3.
+   
+generated-column (boolean)
+
+ 
+  Specifies whether the generated columns present in the tables
+  associated with the subscription should be replicated. The default is
+  false.
+ 
+
+ 
+  This parameter can only be set true if copy_data is set to false.
+  This option works fine when a generated column (in
publisher) is replicated to a
+  non-generated column (in subscriber). Else if it is
replicated to a generated
+  column, it will ignore the replicated data and fill the
column with computed or
+  default data.
+ 
+
+   

3A.
There is a typo in the name "generated-column" because we should use
underscores (not hyphens) for the option names.

~

3B.
This it is not a good option name because there is no verb so it
doesn't mean anything to set it true/false -- actually there IS a verb
"generate" but we are not saying generate = true/false, so this name
is also quite confusing.

I think "include_generated_columns" would be much better, but if
others think that name is too long then maybe "include_generated_cols"
or "include_gen_cols" or similar. Of course, whatever if the final
decision should be propagated same thru all the code comments, params,
fields, etc.

~

3C.
copy_data and false should be marked up as  fonts in the sgml

~

3D.

Suggest re-word this part. Don't need to explain when it "works fine".

BEFORE
This option works fine when a generated column (in publisher) is
replicated to a non-generated column (in subscriber). Else if it is
replicated to a generated column, it will ignore the replicated data
and fill the column with computed or default data.

SUGGESTION
If the subscriber-side column is also a generated column then this
option has no effect; the replicated data will be ignored and the
subscriber column will be filled as normal with the subscriber-side
computed or default data.

==
src/backend/commands/subscriptioncmds.c

4. AlterSubscription
SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
SUBOPT_PASSWORD_REQUIRED |
SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-   SUBOPT_ORIGIN);
+   SUBOPT_ORIGIN | SUBOPT_GENERATED_COLUMN);

Hmm. Is this correct? If ALTER is not allowed (later in this patch
there is a message "toggling generated_column option is not allowed."
then why are we even saying that SUBOPT_GENERATED_COLUMN is a
support_opt for ALTER?

~~~

5.
+ if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("toggling generated_column option is not allowed.")));
+ }

5A.
I suspect this is not even needed if the 'supported_opt' is fixed per
the previous comment.

~

5B.
But if this message is still needed then I think it should say "ALTER
is not allowed" (not "toggling is not allowed") and also the option
name should be quoted as per the new guidelines for error messages.

==
src/backend/replication/logical/proto.c


6. logicalrep_write_tuple

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+

Calling column_in_column_list() might be a more expensive operation
than checking just generated columns flag so maybe reverse the order
and check the generated columns first for a tiny performance gain.

~~

7.
- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if (!column_in_column_list(att->attnum, columns))
  continue;

+ if (att->attgenerated && !publish_generated_column)
+ continue;

ditto #6

~~

8. logicalrep_write_attrs

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

  if 

Re: Pgoutput not capturing the generated columns

2024-05-20 Thread vignesh C
On Mon, 20 May 2024 at 13:49, Masahiko Sawada  wrote:
>
> Hi,
>
> On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
>  wrote:
> >
> > On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
> >  wrote:
> > >
> > > Hi PG Hackers.
> > >
> > > We are interested in enhancing the functionality of the pgoutput plugin 
> > > by adding support for generated columns.
> > > Could you please guide us on the necessary steps to achieve this? 
> > > Additionally, do you have a platform for tracking such feature requests? 
> > > Any insights or assistance you can provide on this matter would be 
> > > greatly appreciated.
> >
> > The attached patch has the changes to support capturing generated
> > column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> > ‘include_generated_columns’ option is specified, the generated column
> > information and generated column data also will be sent.
>
> As Euler mentioned earlier, I think it's a decision not to replicate
> generated columns because we don't know the target table on the
> subscriber has the same expression and there could be locale issues
> even if it looks the same. I can see that a benefit of this proposal
> would be to save cost to compute generated column values if the user
> wants the target table on the subscriber to have exactly the same data
> as the publisher's one. Are there other benefits or use cases?

I think this will be useful mainly for the use cases where the
publisher has generated columns and the subscriber does not have
generated  columns.
In the case where both the publisher and subscriber have generated
columns, the current patch will overwrite the generated column values
based on the expression for the generated column in the subscriber.

> >
> > Usage from pgoutput plugin:
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > CREATE publication pub1 for all tables;
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> > 'proto_version', '1', 'publication_names', 'pub1',
> > 'include_generated_columns', 'true');
> >
> > Usage from test_decoding plugin:
> > SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 
> > 'test_decoding');
> > CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> > (a * 2) STORED);
> > INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> > 'include-xids', '0', 'skip-empty-xacts', '1',
> > 'include_generated_columns', '1');
> >
> > Currently it is not supported as a subscription option because table
> > sync for the generated column is not possible as copy command does not
> > support getting data for the generated column. If this feature is
> > required we can remove this limitation from the copy command and then
> > add it as a subscription option later.
> > Thoughts?
>
> I think that if we want to support an option to replicate generated
> columns, the initial tablesync should support it too. Otherwise, we
> end up filling the target columns data with NULL during the initial
> tablesync but with replicated data during the streaming changes.

+1 for supporting initial sync.
Currently copy_data = true and generate_column = true are not
supported, this limitation will be removed in one of the upcoming
patches.

Regards,
Vignesh




Re: Pgoutput not capturing the generated columns

2024-05-20 Thread Masahiko Sawada
Hi,

On Wed, May 8, 2024 at 4:14 PM Shubham Khanna
 wrote:
>
> On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
>  wrote:
> >
> > Hi PG Hackers.
> >
> > We are interested in enhancing the functionality of the pgoutput plugin by 
> > adding support for generated columns.
> > Could you please guide us on the necessary steps to achieve this? 
> > Additionally, do you have a platform for tracking such feature requests? 
> > Any insights or assistance you can provide on this matter would be greatly 
> > appreciated.
>
> The attached patch has the changes to support capturing generated
> column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
> ‘include_generated_columns’ option is specified, the generated column
> information and generated column data also will be sent.

As Euler mentioned earlier, I think it's a decision not to replicate
generated columns because we don't know the target table on the
subscriber has the same expression and there could be locale issues
even if it looks the same. I can see that a benefit of this proposal
would be to save cost to compute generated column values if the user
wants the target table on the subscriber to have exactly the same data
as the publisher's one. Are there other benefits or use cases?

>
> Usage from pgoutput plugin:
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> CREATE publication pub1 for all tables;
> SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
> SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
> 'proto_version', '1', 'publication_names', 'pub1',
> 'include_generated_columns', 'true');
>
> Usage from test_decoding plugin:
> SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 
> 'test_decoding');
> CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
> (a * 2) STORED);
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include_generated_columns', '1');
>
> Currently it is not supported as a subscription option because table
> sync for the generated column is not possible as copy command does not
> support getting data for the generated column. If this feature is
> required we can remove this limitation from the copy command and then
> add it as a subscription option later.
> Thoughts?

I think that if we want to support an option to replicate generated
columns, the initial tablesync should support it too. Otherwise, we
end up filling the target columns data with NULL during the initial
tablesync but with replicated data during the streaming changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Pgoutput not capturing the generated columns

2024-05-20 Thread Shlok Kyal
Hi Kuroda-san,

Thanks for reviewing the patch. I have fixed some of the comments
> 2.
> Currently, the option is implemented as streaming option. Are there any 
> reasons
> to choose the way? Another approach is to implement as slot option, like 
> failover
> and temporary.
I think the current approach is appropriate. The options such as
failover and temporary seem like properties of a slot and I think
decoding of generated column should not be slot specific. Also adding
a new option for slot may create an overhead.

> 3.
> You said that subscription option is not supported for now. Not sure, is it 
> mean
> that logical replication feature cannot be used for generated columns? If so,
> the restriction won't be acceptable. If the combination between this and 
> initial
> sync is problematic, can't we exclude them in CreateSubscrition and 
> AlterSubscription?
> E.g., create_slot option cannot be set if slot_name is NONE.
Added an option 'generated_column' for create subscription. Currently
it allow to set 'generated_column' option as true only if 'copy_data'
is set to false.
Also we don't allow user to alter the 'generated_column' option.

> 6. logicalrep_write_tuple()
>
> ```
> -if (!column_in_column_list(att->attnum, columns))
> +if (!column_in_column_list(att->attnum, columns) && 
> !att->attgenerated)
> +continue;
> ```
>
> Hmm, does above mean that generated columns are decoded even if they are not 
> in
> the column list? If so, why? I think such columns should not be sent.
Fixed

Thanks and Regards,
Shlok Kyal


v2-0001-Support-generated-column-capturing-generated-colu.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-05-16 Thread Peter Smith
Here are some review comments for the patch v1-0001.

==
GENERAL

G.1. Use consistent names

It seems to add unnecessary complications by having different names
for all the new options, fields and API parameters.

e.g. sometimes 'include_generated_columns'
e.g. sometimes 'publish_generated_columns'

Won't it be better to just use identical names everywhere for
everything? I don't mind which one you choose; I just felt you only
need one name, not two. This comment overrides everything else in this
post so whatever name you choose, make adjustments for all my other
review comments as necessary.

==

G.2. Is it possible to just use the existing bms?

A very large part of this patch is adding more API parameters to
delegate the 'publish_generated_columns' flag value down to when it is
finally checked and used. e.g.

The functions:
- logicalrep_write_insert(), logicalrep_write_update(),
logicalrep_write_delete()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_tuple

The functions:
- logicalrep_write_rel()
... are delegating the new parameter 'publish_generated_column' down to:
- logicalrep_write_attrs

AFAICT in all these places the API is already passing a "Bitmapset
*columns". I was wondering if it might be possible to modify the
"Bitmapset *columns" BEFORE any of those functions get called so that
the "columns" BMS either does or doesn't include generated cols (as
appropriate according to the option).

Well, it might not be so simple because there are some NULL BMS
considerations also, but I think it would be worth investigating at
least, because if indeed you can find some common place (somewhere
like pgoutput_change()?) where the columns BMS can be filtered to
remove bits for generated cols then it could mean none of those other
patch API changes are needed at all -- then the patch would only be
1/2 the size.

==
Commit message

1.
Now if include_generated_columns option is specified, the generated
column information and generated column data also will be sent.

Usage from pgoutput plugin:
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

~

I think there needs to be more background information given here. This
commit message doesn't seem to describe anything about what is the
problem and how this patch fixes it. It just jumps straight into
giving usages of a 'include_generated_columns' option.

It also doesn't say that this is an option that was newly *introduced*
by the patch -- it refers to it as though the reader should already
know about it.

Furthermore, your hacker's post says "Currently it is not supported as
a subscription option because table sync for the generated column is
not possible as copy command does not support getting data for the
generated column. If this feature is required we can remove this
limitation from the copy command and then add it as a subscription
option later." IMO that all seems like the kind of information that
ought to also be mentioned in this commit message.

==
contrib/test_decoding/sql/ddl.sql

2.
+-- check include_generated_columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');
+DROP TABLE gencoltable;
+

2a.
Perhaps you should include both option values to demonstrate the
difference in behaviour:

'include_generated_columns', '0'
'include_generated_columns', '1'

~

2b.
I think you maybe need to include more some test combinations where
there is and isn't a COLUMN LIST, because I am not 100% sure I
understand the current logic/expectations for all combinations.

e.g. When the generated column is in a column list but
'publish_generated_columns' is false then what should happen? etc.
Also if there are any special rules then those should be mentioned in
the commit message.

==
src/backend/replication/logical/proto.c

3.
For all the API changes the new parameter name should be plural.

/publish_generated_column/publish_generated_columns/

~~~

4. logical_rep_write_tuple:

- if (att->attisdropped || att->attgenerated)
+ if (att->attisdropped)
  continue;

- if (!column_in_column_list(att->attnum, columns))
+ if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+ continue;
+
+ if (att->attgenerated && !publish_generated_column)
  continue;
That code seems confusing. Shouldn't the logic be exactly as also in
logicalrep_write_attrs()?

e.g. Shouldn't they both look like this:

if (att->attisdropped)
  continue;

if 

RE: Pgoutput not capturing the generated columns

2024-05-08 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for creating a patch! Here are high-level comments.

1.
Please document the feature. If it is hard to describe, we should change the 
API.

2.
Currently, the option is implemented as streaming option. Are there any reasons
to choose the way? Another approach is to implement as slot option, like 
failover
and temporary.

3.
You said that subscription option is not supported for now. Not sure, is it mean
that logical replication feature cannot be used for generated columns? If so,
the restriction won't be acceptable. If the combination between this and initial
sync is problematic, can't we exclude them in CreateSubscrition and 
AlterSubscription?
E.g., create_slot option cannot be set if slot_name is NONE.

4.
Regarding the test_decoding plugin, it has already been able to decode the
generated columns. So... as the first place, is the proposed option really 
needed
for the plugin? Why do you include it?
If you anyway want to add the option, the default value should be on - which 
keeps
current behavior.

5.
Assuming that the feature become usable used for logical replicaiton. Not sure,
should we change the protocol version at that time? Nodes prior than PG17 may
not want receive values for generated columns. Can we control only by the 
option?

6. logicalrep_write_tuple()

```
-if (!column_in_column_list(att->attnum, columns))
+if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+continue;
```

Hmm, does above mean that generated columns are decoded even if they are not in
the column list? If so, why? I think such columns should not be sent.

7.

Some functions refer data->publish_generated_column many times. Can we store
the value to a variable?

Below comments are for test_decoding part, but they may be not needed.

=

a. pg_decode_startup()

```
+else if (strcmp(elem->defname, "include_generated_columns") == 0)
```

Other options for test_decoding do not have underscore. It should be
"include-generated-columns".

b. pg_decode_change()

data->include_generated_columns is referred four times in the function.
Can you store the value to a varibable?


c. pg_decode_change()

```
-true);
+true, data->include_generated_columns );
```

Please remove the blank.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Pgoutput not capturing the generated columns

2024-05-08 Thread Shubham Khanna
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
 wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by 
> adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? 
> Additionally, do you have a platform for tracking such feature requests? Any 
> insights or assistance you can provide on this matter would be greatly 
> appreciated.

The attached patch has the changes to support capturing generated
column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
‘include_generated_columns’ option is specified, the generated column
information and generated column data also will be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE publication pub1 for all tables;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for the generated column is not possible as copy command does not
support getting data for the generated column. If this feature is
required we can remove this limitation from the copy command and then
add it as a subscription option later.
Thoughts?

Thanks and Regards,
Shubham Khanna.


v1-0001-Support-capturing-generated-column-data-using-pgo.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2023-09-11 Thread Rajendra Kumar Dangwal
Hi PG Hackers.

We are interested in enhancing the functionality of the pgoutput plugin by 
adding support for generated columns. 
Could you please guide us on the necessary steps to achieve this? Additionally, 
do you have a platform for tracking such feature requests? Any insights or 
assistance you can provide on this matter would be greatly appreciated.

Many thanks.
Rajendra.





Re: Pgoutput not capturing the generated columns

2023-08-21 Thread Rajendra Kumar Dangwal
Thanks Euler,
Greatly appreciate your inputs.

> Should pgoutput provide a complete row? Probably. If it is an option that 
> defaults to false and doesn't impact performance.

Yes, it would be great if this feature can be implemented.

> The logical replication design decides to compute the generated columns at 
> subscriber side.

If I understand correctly, this approach involves establishing a function on 
the subscriber's side that emulates the operation executed to derive the 
generated column values.
If yes, I see one potential issue where disparities might surface between the 
values of generated columns on the subscriber's side and those computed within 
Postgres. This could happen if the generated column's value relies on the 
current_time function.

Please let me know how can we track the feature requests and the discussions 
around that.

Thanks,
Rajendra.

Re: Pgoutput not capturing the generated columns

2023-08-01 Thread Euler Taveira
On Tue, Aug 1, 2023, at 3:47 AM, Rajendra Kumar Dangwal wrote:
> With decoderbufs and wal2json the connector is able to capture the generated 
> column `full_name` in above example. But with pgoutput the generated column 
> was not captured. 

wal2json materializes the generated columns before delivering the output. I
decided to materialized the generated columns in the output plugin because the
target consumers expects a complete row.

> Is this a known limitation of pgoutput plugin? If yes, where can we request 
> to add support for this feature?

I wouldn't say limitation but a design decision.

The logical replication design decides to compute the generated columns at
subscriber side. It was a wise decision aiming optimization (it doesn't
overload the publisher that is *already* in charge of logical decoding).

Should pgoutput provide a complete row? Probably. If it is an option that
defaults to false and doesn't impact performance.

The request for features should be done in this mailing list.


--
Euler Taveira
EDB   https://www.enterprisedb.com/