Re: Reduce useless changes before reassembly during logical replication

2024-03-06 Thread li jie
Hi,

Sorry for the late reply.

> This is all true but note that in successful cases (where the table is
> published) all the work done by FilterByTable(accessing caches,
> transaction-related stuff) can add noticeable overhead as anyway we do
> that later in pgoutput_change().

You are correct. Opening a transaction and accessing the caches for every
change causes a lot of overhead, which Andy has tested and confirmed.

The root cause is that every DML WAL record has to do this work, which is
really wasteful. I use a hash table, LocatorFilterCache, to solve this
problem: after obtaining a RelFileLocator, I look it up in the hash table to
get its PublicationActions, and use them to decide whether the relation is
published and hence whether the change can be filtered out.
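
Below is a minimal sketch of what such a cache could look like, using the
backend's dynahash facilities. The entry layout and the names
(LocatorFilterCacheEntry, lookup_locator_filter, and so on) are illustrative
guesses, not the exact code in the v2 patch:

```
#include "postgres.h"
#include "catalog/pg_publication.h"
#include "storage/relfilelocator.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"

typedef struct LocatorFilterCacheEntry
{
	RelFileLocator locator;			/* hash key, must be first */
	PublicationActions pubactions;	/* cached publish decisions */
	bool		valid;				/* false until filled / after invalidation */
} LocatorFilterCacheEntry;

static HTAB *locator_filter_cache = NULL;

static void
init_locator_filter_cache(void)
{
	HASHCTL		ctl;

	ctl.keysize = sizeof(RelFileLocator);
	ctl.entrysize = sizeof(LocatorFilterCacheEntry);
	ctl.hcxt = CacheMemoryContext;

	locator_filter_cache = hash_create("LocatorFilterCache", 128, &ctl,
									   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

/* Return the cache entry for a change's RelFileLocator, creating it if needed. */
static LocatorFilterCacheEntry *
lookup_locator_filter(const RelFileLocator *locator)
{
	LocatorFilterCacheEntry *entry;
	bool		found;

	if (locator_filter_cache == NULL)
		init_locator_filter_cache();

	entry = (LocatorFilterCacheEntry *)
		hash_search(locator_filter_cache, locator, HASH_ENTER, &found);
	if (!found)
		entry->valid = false;	/* caller must fill pubactions on first use */

	return entry;
}
```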

The effect in my test is very clear (perf record):
v1:
  Children  Self  Command   Shared Object  Symbol
+ 22.04% 1.53%  postgres  postgres   [.] FilterByTable

v2:
  Children  Self  Command   Shared Object  Symbol
+  0.58% 0.00%  postgres  postgres  [.] ReorderBufferFilterByLocator

The v1 patch introduces about 22% overhead, while v2 shows only 0.58%.


> Note, users can
>configure to stream_in_progress transactions in which case they
> shouldn't see such a big problem.

Yes, streaming mode can prevent these irrelevant changes from being written
to disk or sent downstream.
However, CPU and memory are still consumed while processing these useless
changes. Here is my simple test[1]:

Based on master:

CPU stat: perf stat -p pid -e cycles -I 1000
# time counts unit events
76.007070936 9,691,035 cycles
77.007163484 5,977,694 cycles
78.007252533 5,924,703 cycles
79.007346862 5,861,934 cycles
80.007438070 5,858,264 cycles
81.007527122 6,408,759 cycles
82.007615711 6,397,988 cycles
83.007705685 5,520,407 cycles
84.007794387 5,359,162 cycles
85.007884879 5,194,079 cycles
86.007979797 5,391,270 cycles
87.008069606 5,474,536 cycles
88.008162827 5,594,190 cycles
89.008256327 5,610,023 cycles
90.008349583 5,627,350 cycles
91.008437785 6,273,510 cycles
92.008527938 580,934,205 cycles
93.008620136 4,404,672 cycles
94.008711818 4,599,074 cycles
95.008805591 4,374,958 cycles
96.008894543 4,300,180 cycles
97.008987582 4,157,892 cycles
98.009077445 4,072,178 cycles
99.009163475 4,043,875 cycles
100.009254888 5,382,667 cycles

memory stat: pidstat -p pid -r 1 10
07:57:18 AM UID PID minflt/s majflt/s VSZ RSS %MEM Command
07:57:19 AM 1000 11848 233.00 0.00 386872 81276 0.01 postgres
07:57:20 AM 1000 11848 235.00 0.00 387008 82068 0.01 postgres
07:57:21 AM 1000 11848 236.00 0.00 387144 83124 0.01 postgres
07:57:22 AM 1000 11848 236.00 0.00 387144 83916 0.01 postgres
07:57:23 AM 1000 11848 236.00 0.00 387280 84972 0.01 postgres
07:57:24 AM 1000 11848 334.00 0.00 337000 36928 0.00 postgres
07:57:25 AM 1000 11848 3.00 0.00 337000 36928 0.00 postgres
07:57:26 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
07:57:27 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
07:57:28 AM 1000 11848 0.00 0.00 337000 36928 0.00 postgres
Average: 1000 11848 151.30 0.00 362045 6 0.01 postgres

After applying the patch:
# time counts unit events
76.007623310 4,237,505 cycles
77.007717436 3,989,618 cycles
78.007813848 3,965,857 cycles
79.007906412 3,601,715 cycles
80.007998111 3,670,835 cycles
81.008092670 3,495,844 cycles
82.008187456 3,822,695 cycles
83.008281335 5,034,146 cycles
84.008374998 3,867,683 cycles
85.008470245 3,996,927 cycles
86.008563783 3,823,893 cycles
87.008658628 3,825,472 cycles
88.008755246 3,823,079 cycles
89.008849719 3,966,083 cycles
90.008945774 4,012,704 cycles
91.009044492 4,026,860 cycles
92.009139621 3,860,912 cycles
93.009242485 3,961,533 cycles
94.009346304 3,799,897 cycles
95.009440164 3,959,602 cycles
96.009534251 3,960,405 cycles
97.009625904 3,762,581 cycles
98.009716518 4,859,490 cycles
99.009807720 3,940,845 cycles
100.009901399 3,888,095 cycles

08:01:47 AM UID PID minflt/s majflt/s VSZ RSS %MEM Command
08:01:48 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:49 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:50 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:51 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:52 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:53 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:54 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:55 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:56 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
08:01:57 AM 1000 19466 0.00 0.00 324424 15140 0.00 postgres
Average: 1000 19466 0.00 0.00 324424 15140 0.00 postgres

The comparison shows that the patch is also beneficial in streaming mode.
Of course, LocatorFilterCache also needs to handle invalidation, e.g. when the
corresponding relation is invalidated or pg_publication changes, just like
RelationSyncCache and RelfilenumberMapHash do.
But DDL is comparatively rare, so this cost is insignificant next to a large
volume of DML.
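
Building on the cache sketch above, the invalidation hookup could look roughly
like the following. Which syscaches to watch should mirror what pgoutput's
RelationSyncCache registers; PUBLICATIONOID is shown only as an example, and
the callback names are again illustrative:

```
#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/inval.h"
#include "utils/syscache.h"

static void
locator_filter_relcache_cb(Datum arg, Oid relid)
{
	HASH_SEQ_STATUS status;
	LocatorFilterCacheEntry *entry;

	if (locator_filter_cache == NULL)
		return;

	/*
	 * We cannot cheaply map the relation OID back to a RelFileLocator here,
	 * so conservatively mark every entry invalid; DDL is rare next to DML.
	 */
	hash_seq_init(&status, locator_filter_cache);
	while ((entry = (LocatorFilterCacheEntry *) hash_seq_search(&status)) != NULL)
		entry->valid = false;
}

static void
locator_filter_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
{
	/* A publication changed: force all cached decisions to be recomputed. */
	locator_filter_relcache_cb((Datum) 0, InvalidOid);
}

static void
register_locator_filter_invalidations(void)
{
	CacheRegisterRelcacheCallback(locator_filter_relcache_cb, (Datum) 0);
	CacheRegisterSyscacheCallback(PUBLICATIONOID,
								  locator_filter_publication_cb, (Datum) 0);
}
```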

Another problem is that the LocatorFilterCache 

Reduce useless changes before reassembly during logical replication

2024-01-16 Thread li jie
Hi hackers,

During logical replication, if there is a large write transaction, some
spill files will be written to disk, depending on the setting of
logical_decoding_work_mem.

This behavior can effectively avoid OOM, but if the transaction generates a
lot of changes before commit, a large number of spill files may fill the
disk; for example, an update of a TB-scale table.

However, I found an inelegant behavior: if the modified large table is not
published, its changes are still written out as a large number of spill
files. Look at the example below:

publisher:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432
user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,99) i;
```

Later you will see a large number of spill files in the
"/$PGDATA/pg_replslot/mysub/" directory.
```
$ ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
 17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-1000.spill
 12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-100.spill
 17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-1100.spill
..
```

We can see that table tbl_t1 is not published in mypub, so it also won't be
sent downstream because it is not subscribed.
After the transaction is reassembled, the pgoutput decoding plugin filters
out the changes of these unpublished relations when sending logical changes;
see the function pgoutput_change.
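
For reference, the decision pgoutput_change makes today boils down to a check
against the per-relation PublicationActions it caches. A simplified sketch
(not the literal code, which keeps this state in a private RelationSyncEntry
inside src/backend/replication/pgoutput/pgoutput.c) looks like this:

```
#include "postgres.h"
#include "catalog/pg_publication.h"
#include "replication/reorderbuffer.h"

/* Sketch of the per-change publication check performed at send time. */
static bool
action_is_published(const PublicationActions *pubactions,
					const ReorderBufferChange *change)
{
	switch (change->action)
	{
		case REORDER_BUFFER_CHANGE_INSERT:
			return pubactions->pubinsert;
		case REORDER_BUFFER_CHANGE_UPDATE:
			return pubactions->pubupdate;
		case REORDER_BUFFER_CHANGE_DELETE:
			return pubactions->pubdelete;
		default:
			return true;		/* other change kinds are handled elsewhere */
	}
}
```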

Most importantly, if we filter out the changes of unpublished relations
after constructing the changes but before queuing them into a transaction,
will it reduce the workload of logical decoding and avoid disk or memory
growth as much as possible?

The attached patch is a prototype, which can effectively reduce memory and
disk space usage during logical replication.

Design:
1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.

2. Implemented this callback as pgoutput_table_filter in the pgoutput plugin.
Its implementation is based on the table filter in the pgoutput_change
function: it determines from the publication's parameters whether the change
needs to be published and, if not, filters it out.

3. After constructing a change and before queuing it into a transaction,
use RelidByRelfilenumber to obtain the relation associated with the change,
just as ReorderBufferProcessTXN does (see the sketch after this list).

4. The relation may be a TOAST relation, and there is no good way to map it
directly to its owning table. Here I get the real table's OID from the TOAST
relation's name, and then open the real table relation.

5. This filtering takes into account INSERT/UPDATE/DELETE. Other change
types have not been considered yet and can be added in the future.
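
As mentioned in point 3, here is a rough sketch of the filtering step. Only
RelidByRelfilenumber, RelationIdGetRelation, and the ReorderBuffer types are
existing APIs; the function name FilterChangeByTable, the proposed filter
callback, and the surrounding structure are illustrative. The lookup assumes
the caller has started a transaction and set up a snapshot, which is exactly
the per-change overhead discussed later in this thread:

```
#include "postgres.h"
#include "replication/logical.h"
#include "replication/reorderbuffer.h"
#include "utils/rel.h"
#include "utils/relcache.h"
#include "utils/relfilenumbermap.h"

/* Return true if the change can be dropped instead of being queued. */
static bool
FilterChangeByTable(LogicalDecodingContext *ctx, ReorderBufferChange *change)
{
	RelFileLocator *locator = &change->data.tp.rlocator;
	Oid			reloid;
	Relation	relation;
	bool		filter = false;

	reloid = RelidByRelfilenumber(locator->spcOid, locator->relNumber);
	if (!OidIsValid(reloid))
		return false;			/* when in doubt, keep the change */

	relation = RelationIdGetRelation(reloid);
	if (relation == NULL)
		return false;

	/*
	 * Point 4: if this is a TOAST relation, resolve it to its owning table
	 * first.  Point 2: then ask the plugin's proposed filter callback
	 * (e.g. pgoutput_table_filter) whether the relation is published.
	 *
	 * filter = call_filter_by_rel_cb(ctx, relation);	-- hypothetical
	 */

	RelationClose(relation);
	return filter;
}
```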

Test:
1. Added a test case 034_table_filter.pl
2. Like the case above, create two tables, the published table tbl_pub and
the non-published table tbl_t1
3. Insert 10,000 rows of toast data into tbl_t1 on the publisher, and use
pg_ls_replslotdir to record the total size of the slot directory
every second.
4. Compare the size of the slot directory at the beginning of the
transaction (size1), at the end of the transaction (size2), and the
average size over the entire process (size3).
5. Assert(size1==size2==size3)

I sincerely look forward to your feedback.
Regards, lijie


v1-Reduce-useless-changes-before-reassembly-during-logi.patch
Description: Binary data


Reduce useless changes before reassembly during logical replication

2023-12-07 Thread li jie
Hi hackers,

During logical replication, if there is a large write transaction, some
spill files will be written to disk, depending on the setting of
logical_decoding_work_mem.

This behavior can effectively avoid OOM, but if the transaction generates a
lot of changes before commit, a large number of spill files may fill the
disk; for example, an update of a TB-scale table.
Of course, to some extent this is unavoidable.

But I found an inelegant behavior: if the updated large table is not
published, its changes are still written out as a large number of spill
files. Look at the example below:

publisher:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432
user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,99) i;
```

Later you will see a large number of spill files in the
"/$PGDATA/pg_replslot/mysub/" directory.
```
$ ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
 17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-1000.spill
 12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-100.spill
 17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-1100.spill
..
```

We can see that table tbl_t1 is not published in mypub, so it also won't be
sent downstream because it is not subscribed.
After the transaction is reassembled, the pgoutput decoding plugin filters
out the changes of these unpublished relations when sending logical changes;
see the function pgoutput_change.

Most importantly, if we filter out the changes of unpublished relations
after constructing the changes but before queuing them into a transaction,
will it reduce the workload of logical decoding and avoid disk or memory
growth as much as possible?

Attached is the patch I used to implement this optimization.

Design:

1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.

2. Implemented this callback as pgoutput_table_filter in the pgoutput plugin.
Its implementation is based on the table filter in the pgoutput_change
function: it determines from the publication's parameters whether the change
needs to be published and, if not, filters it out.

3. After constructing a change and before queuing it into a transaction,
use RelidByRelfilenumber to obtain the relation associated with the change,
just as ReorderBufferProcessTXN does.

4. The relation may be a TOAST relation, and there is no good way to map it
directly to its owning table. Here I get the real table's OID from the TOAST
relation's name, and then open the real table relation.

5. This filtering takes into account INSERT/UPDATE/DELETE. Other change
types have not been considered yet and can be added in the future.

Test:
1. Added a test case 034_table_filter.pl
2. Like the case above, create two tables, the published table tbl_pub and
the non-published table tbl_t1
3. Insert 10,000 rows of toast data into tbl_t1  on the publisher, and use
   pg_ls_replslotdir to record the total size of the slot directory
every second.
4. Compare the size of the slot directory at the beginning of the
transaction (size1), at the end of the transaction (size2), and the
average size over the entire process (size3).
5. Assert(size1==size2==size3)

I sincerely look forward to your feedback.
Regards, lijie


v1-Reduce-useless-changes-before-reassemble-during-logi.patch
Description: Binary data


Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2023-12-03 Thread li jie
> This may be helpful for the case you have mentioned but how about
> cases where there is nothing to filter by relation?

You can see the full context in email [1]. Relations that are not published
also generate changes that are queued into the whole transaction, which
increases wasted memory or disk space.

> It will add
> overhead related to the transaction start/end and others for each
> change. Currently, we do that just once for all the changes that need
> to be processed.

Yes, at present it is processed only once, when each change is applied at
commit time. This patch hopes to move that forward to the time the change is
constructed, and to decide whether to queue the change based on whether the
relation is published.

> I wonder why the spilling can't be avoided with GUC
> logical_decoding_work_mem?

Of course it can, but that only converts disk usage into memory usage.
For details, please see the case in email [1].

[1] 
https://www.postgresql.org/message-id/CAGfChW51P944nM5h0HTV9HistvVfwBxNaMt_s-OZ9t%3DuXz%2BZbg%40mail.gmail.com

Regards, lijie

Amit Kapila wrote on Sat, Dec 2, 2023 at 12:11:
>
> On Fri, Dec 1, 2023 at 1:55 PM li jie  wrote:
> >
> > > This is just an immature idea. I haven't started to implement it yet.
> > > Maybe it was designed this way because there
> > > are key factors that I didn't consider. So I want to hear everyone's
> > > opinions, especially the designers of logic decoding.
> >
> > Attached is the patch I used to implement this optimization.
> > The main designs are as follows:
> > 1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.
> >
> > 2. Added this callback function pgoutput_table_filter for the pgoutput 
> > plugin.
> > Its main implementation is based on the table filter in the
> > pgoutput_change function.
> >
> > 3. After constructing a change and before Queue a change into a transaction,
> > use RelidByRelfilenumber to obtain the relation associated with the 
> > change,
> > just like obtaining the relation in the ReorderBufferProcessTXN 
> > function.
> >
> > 4. Relation may be a toast, and there is no good way to get its real
> > table relation
> >based on toast relation. Here, I get the real table oid through
> > toast relname, and
> >then get the real table relation.
> >
>
> This may be helpful for the case you have mentioned but how about
> cases where there is nothing to filter by relation? It will add
> overhead related to the transaction start/end and others for each
> change. Currently, we do that just once for all the changes that need
> to be processed. I wonder why the spilling can't be avoided with GUC
> logical_decoding_work_mem?
>
> --
> With Regards,
> Amit Kapila.




Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2023-12-01 Thread li jie
> This is just an immature idea. I haven't started to implement it yet.
> Maybe it was designed this way because there
> are key factors that I didn't consider. So I want to hear everyone's
> opinions, especially the designers of logic decoding.

Attached is the patch I used to implement this optimization.
The main designs are as follows:
1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.

2. Added this callback function pgoutput_table_filter for the pgoutput plugin.
Its main implementation is based on the table filter in the
pgoutput_change function.

3. After constructing a change and before queuing it into a transaction,
use RelidByRelfilenumber to obtain the relation associated with the change,
just as ReorderBufferProcessTXN does.

4. The relation may be a TOAST relation, and there is no good way to map it
directly to its owning table. Here I get the real table's OID from the TOAST
relation's name, and then open the real table relation.

5. This filtering takes into account INSERT/UPDATE/DELETE. Other change
types have not been considered yet and can be added in the future.

6. The table filter in pgoutput_change and the relation lookup in
ReorderBufferProcessTXN can then be removed. This has not been done yet;
it is the next step.

I sincerely look forward to your feedback.
Regards, lijie


v1-Filter-irrelevant-change-before-reassemble-transacti.patch
Description: Binary data


Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2023-11-30 Thread li jie
Hi,

During logical decoding, if there is a large write transaction, some
spill files will be written to disk,
depending on the setting of max_changes_in_memory.

This behavior can effectively avoid OOM, but if the transaction generates a
lot of changes before commit, a large number of spill files may fill the
disk; for example, an update of a TB-scale table.
Of course, to some extent this is unavoidable.

But I found an inelegant behavior: if the updated large table is not
published, its changes are still written out as a large number of spill
files. Look at the example below:

publisher:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:
```
create table tbl_pub(id int, val1 text, val2 text,val3 text);
create table tbl_t1(id int, val1 text, val2 text,val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432
user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:
```
begin;
insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba',
i),repeat('dfds', i) from generate_series(0,99) i;
```

Later you will see a large number of spill files in the
"/$PGDATA/pg_replslot/mysub/" directory.
```
$ ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres 200 Nov 30 09:24 state
 17M -rw------- 1 postgres postgres 17M Nov 30 08:22 xid-750-lsn-0-1000.spill
 12M -rw------- 1 postgres postgres 12M Nov 30 08:20 xid-750-lsn-0-100.spill
 17M -rw------- 1 postgres postgres 17M Nov 30 08:23 xid-750-lsn-0-1100.spill
..
```

We can see that table tbl_t1 is not published in mypub, so its changes are
also not sent downstream because it is not subscribed.
After the transaction is reassembled, the pgoutput decoding plugin filters
out the changes of these unpublished relations when sending logical changes;
see the function pgoutput_change.

Above all, if we filter out the changes of unpublished relations after a
change is constructed and before it is queued into a transaction, will it
make logical decoding less laborious and avoid disk growth as much as
possible?


This is just an immature idea; I haven't started to implement it yet.
Maybe the current design is this way because of key factors I haven't
considered, so I want to hear everyone's opinions, especially from the
designers of logical decoding.




Re: Support logical replication of DDLs

2022-12-19 Thread li jie
I have presented some comments below:

1. AT_AddColumn

> + tmpobj = new_objtree_VA("ADD %{objtype}s %{definition}s", 3,
[ IF NOT EXISTS ] is missing here.

2.  AT_DropColumn
> + tmpobj = new_objtree_VA("DROP %{objtype}s %{column}I", 3,
[ IF EXISTS ] is missing here.

3. AT_DropConstraint
> + tmpobj = new_objtree_VA("DROP CONSTRAINT %{constraint}I", 2,
[ IF EXISTS ] is missing here.

4. AT_DetachPartition
> + tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D", 2,
[ CONCURRENTLY | FINALIZE ] is missing here.

5. deparse_CreateSeqStmt
> + ret = new_objtree_VA("CREATE %{persistence}s SEQUENCE %{identity}D 
> %{definition: }s", 3,
[ IF NOT EXISTS ] is missing here.

6. deparse_IndexStmt
> + ret = new_objtree_VA("CREATE %{unique}s INDEX %{concurrently}s 
> %{if_not_exists}s %{name}I ON %{table}D USING %{index_am}s (%{definition}s)", 
> 7,
[ INCLUDE ] and [ ONLY ] are  missing here.

7. deparse_RuleStmt
> + foreach(cell, actions)
> + list = lappend(list, new_string_object(lfirst(cell)));

if (actions == NIL)
    list = lappend(list, new_string_object("NOTHING"));
else
{
    foreach(cell, actions)
        list = lappend(list, new_string_object(lfirst(cell)));
}

8. AT_AddIndexConstraint
> + tmpobj = new_objtree_VA("ADD CONSTRAINT %{name}I %{constraint_type}s USING 
> INDEX %{index_name}I %{deferrable}s %{init_deferred}s", 6,
> + "type", ObjTypeString, "add constraint using index",
> + "name", ObjTypeString, get_constraint_name(constrOid),
> + "constraint_type", ObjTypeString,
> + istmt->deferrable ? "DEFERRABLE" : "NOT DEFERRABLE",

"constraint_type", ObjTypeString,
istmt->primary ? "PRIMARY KEY" : "UNIQUE",

9. regress test

Zheng Li wrote on Mon, Dec 12, 2022 at 12:58:


>
> Hi,
>
> Attached please find the DDL deparser testing module in the v45-0007
> patch, this testing module is written by Runqi Tian in [1] with minor
> modification from myself. I think we can
> start adding more tests to the module now that we're getting close to
> finish implementing the DDL deparser.
>
> This testing module ddl_deparse_regress aims to achieve the following
> four testing goals for the DDL deparser:
> 1. Test that the generated JSON blob is expected using SQL tests.
> 2. Test that the re-formed DDL command is expected using SQL tests.
> 3. Test that the re-formed DDL command has the same effect as the
> original command
>by comparing the results of pg_dump, using the SQL tests in 1 and 2.
> 4. Test that any new DDL syntax is handled by the DDL deparser by
> capturing and deparsing
>DDL commands ran by pg_regress.
>
> 1 and 2 is tested with SQL tests, by comparing the deparsed JSON blob
> and the re-formed command.
> 3 is tested with TAP framework in t/001_compare_dumped_results.pl
> 4 is tested with TAP framework and pg_regress in 002_regress_tests.pl,
> the execution is currently commented out because it will fail due
> unimplemented commands in the DDL deparser.
>
> [1] 
> https://www.postgresql.org/message-id/flat/CAH8n8_jMTunxxtP4L-3tc%3DGNamg%3Dmg1X%3DtgHr9CqqjjzFLwQng%40mail.gmail.com
>

The test patch is very useful.
I see that the SQL cases in test_ddl_deparse_regress are similar to the
ones in test_ddl_deparse.
Why don't we merge the test cases from test_ddl_deparse_regress into
test_ddl_deparse, since the SQL cases can be completely reused with the
SQL files in test_ddl_deparse?
I believe this would make the tests more comprehensive and reduce redundancy.


Regards,
li jie




Re: Support logical replication of DDLs

2022-12-12 Thread li jie
I noticed that the issue of ownership seems not to have been considered.
For example, if user 'a' on the publishing side creates a table t1,
the owner of t1 is not user 'a' after it is replicated to the subscribing side.
This situation does not arise in existing DML replication.

I think the ownership relationship should not be lost;
we should perhaps add it back,
as pg_dump does with "ALTER TABLE public.t1 OWNER TO a;",
even though we do not currently support replicating users.


Thoughts?  li jie




Re: Support logical replication of DDLs

2022-12-08 Thread li jie
>
> Attached please find a new solution that skips the deparsing of ALTER TABLE
> subcommands generated for TableLikeClause. The patch v42-0005 added a new
> boolean field table_like to AlterTableStmt in order to identify an ALTER TABLE
> subcommand generated internally for the TableLikeClause.
>
> Regards,
> Zheng

I took a look at this patch and it appears to be incomplete.

> @@ -1974,6 +1974,7 @@ typedef struct AlterTableStmt
> List *cmds; /* list of subcommands */
> ObjectType objtype; /* type of object */
> bool missing_ok; /* skip error if table missing */
> + bool table_like; /* internally generated for TableLikeClause */
> } AlterTableStmt;

The table_like field should also be covered by the `copynode` and
`equalnode` methods.




Re: Support logical replication of DDLs

2022-12-08 Thread li jie
It is worth considering whether temporary objects, such as tables,
indexes, and sequences, should be replicated to the subscriber side.

Take temp tables: different sessions create their own temp tables.
If they are all transferred to the subscriber side, there will
inevitably be errors, because there is only one subscription session.

I think temporary objects should not be part of replication because
they are only visible within their session; replicating them would not
make them visible to the user and would not be meaningful.

Here is a case:
```
create temp table t1(id int);
\c
create temp table t1(id int);
```


Thoughts?
li jie

vignesh C wrote on Thu, Dec 8, 2022 at 13:07:
>
> On Wed, 7 Dec 2022 at 17:50, Alvaro Herrera  wrote:
> >
> > I think this patch is split badly.
> >
> > You have:
> >
> > 0001 an enormous patch including some required infrastructure, plus the
> > DDL deparsing bits themselves.
> >
> > 0002 another enormous (though not as much) patch, this time for
> > DDL replication using the above.
> >
> > 0003 a bugfix for 0001, which includes changes in both the
> > infrastructure and the deparsing bits.
> >
> > 0004 test stuff for 0002.
> >
> > 0005 Another bugfix for 0001
> >
> > 0006 Another bugfix for 0001
> >
> > As presented, I think it has very little chance of being reviewed
> > usefully.  A better way to go about this, I think, would be:
> >
> > 0001 - infrastructure bits to support the DDL deparsing parts (all these
> > new functions in ruleutils.c, sequence.c, etc).  That means, everything
> > (?) that's currently in your 0001 except ddl_deparse.c and friends.
> > Clearly there are several independent changes here; maybe it is possible
> > to break it down even further.  This patch or these patches should also
> > include the parts of 0003, 0005, 0006 that require changes outside of
> > ddl_deparse.c.
> > I expect that this patch should be fairly small.
> >
> > 0002 - ddl_deparse.c and its very close friends.  This should not have
> > any impact on places such as ruleutils.c, sequence.c, etc.  The parts of
> > the bugfixes (0001, 0005, 0006) that touch this could should be merged
> > here as well; there's no reason to have them as separate patches.  Some
> > test code should be here also, though it probably doesn't need to aim to
> > be complete.
> > This one is likely to be very large, but also self-contained.
> >
> > 0003 - ddlmessage.c and friends.  I understand that DDL-messaging is
> > supporting infrastructure for DDL replication; I think it should be its
> > own patch.  Probably include its own simple-ish test bits.
> > Not a very large patch.
> >
> > 0004 - DDL replication proper, including 0004.
> > Probably not a very large patch either, not sure.
> >
> >
> > Some review comments, just skimming:
> > - 0002 adds some functions to event_trigger.c, but that doesn't seem to
> > be their place.  Maybe some new file in src/backend/replication/logical
> > would make more sense.
> >
> > - publication_deparse_ddl_command_end has a long strcmp() list; why?
> > Maybe change things so that it compares some object type enum instead.
> >
> > - CreatePublication has a long list of command tags; is that good?
> > Maybe it'd be better to annotate the list in cmdtaglist.h somehow.
> >
> > - The change in pg_dump's getPublications needs updated to 16.
> >
> > - Don't "git add" src/bin/pg_waldump/logicalddlmsgdesc.c, just update
> > its Makefile and meson.build
> >
> > - I think psql's \dRp should not have the new column at the end.
> > Maybe one of:
> > + Name | Owner | DDL | All tables | Inserts | Updates | Deletes | Truncates 
> > | Via root
> > + Name | Owner | All tables | DDL | Inserts | Updates | Deletes | Truncates 
> > | Via root
> > + Name | Owner | All tables | Inserts | Updates | Deletes | Truncates | DDL 
> > | Via root
> > (I would not add the "s" at the end of that column title, also).
>
> Thanks for the comments, these comments will make the patch reviewing easier.
> There are a couple of review comments [1] and [2] which are spread
> across the code, it will be difficult to handle this after
> restructuring of the patch as the comments are spread across the code
> in the patch. So we will handle [1] and [2] first and then work on
> restructuring work suggested by you.
>
> [1] - 
> https://www.postgresql.org/message-id/CAHut%2BPsERMFwO8oK3LFH_3CRG%2B512T%2Bay_viWzrgNetbH2MwxA%40mail.gmail.com
> [2] - 
> https://www.postgresql.org/message-id/CAHut%2BPuxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w%40mail.gmail.com
>
> Regards,
> Vignesh
>
>




Re: Support logical replication of DDLs

2022-12-01 Thread li jie
I applied patch 0005.

I think this modification is a bit overdone.
This design skips all subcommands, which results in many DDL
replication failures.
For example:
```
CREATE TABLE datatype_table (id SERIAL);
```
deparsed ddl is:
CREATE  TABLE  public.datatype_table (id pg_catalog.int4 STORAGE plain
NOT NULL DEFAULT
pg_catalog.nextval('public.datatype_table_id_seq'::pg_catalog.regclass))
CREATE SEQUENCE subcommand will be skipped.

OR:
```
CREATE SCHEMA element_test
CREATE TABLE foo (id int)
CREATE VIEW bar AS SELECT * FROM foo;
```
deparsed ddl is:
CREATE SCHEMA element_test.

Its subcommands will be skipped.
There may be other cases.

The original CREATE TABLE ... LIKE statement is special: it derives ALTER
TABLE ... ALTER COLUMN subcommands, and just skipping those may be enough,
instead of skipping the subcommands of all statements.
After all, our design is to obtain the actual DDL information from the
catalog instead of parsing the raw parse tree.
This is why we cannot skip all subcommands.

Do you have any better ideas?

Regards, Adger.




Re: Support logical replication of DDLs

2022-11-29 Thread li jie
I will continue to give feedback for this patch.

1.  LIKE STORAGE
```
CREATE TABLE ctlt (a text, c text);
ALTER TABLE ctlt ALTER COLUMN c SET STORAGE EXTERNAL;
CREATE TABLE ctlt_storage (LIKE ctlt INCLUDING STORAGE);
```

postgres=# \d+ ctlt_storage
                               Table "public.ctlt_storage"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | text |           |          |         | extended |             |              |
 c      | text |           |          |         | extended |             |              |


It can be seen that the STORAGE attribute of column c of table
ctlt_storage is not replicated.

When the CREATE TABLE ... LIKE statement is converted, the INCLUDING STORAGE
attribute is lost because it is difficult to express in CREATE TABLE syntax.
Maybe we need to emit an additional statement for it, like 'ALTER TABLE
ctlt_storage ALTER COLUMN c SET STORAGE EXTERNAL;'.

2. A column referenced by another subcommand is dropped.
```
create table another (f1 int, f2 text, f3 text);

alter table another
  alter f1 type text using f2 || ' and ' || f3 || ' more',
  alter f2 type bigint using f1 * 10,
  drop column f3;
```

The following error occurs downstream:
ERROR:  column "?dropped?column?" does not exist at character 206
STATEMENT:  ALTER TABLE public.another DROP COLUMN f3 , ALTER COLUMN
f1 SET DATA TYPE pg_catalog.text COLLATE pg_catalog."default" USING
(((f2 OPERATOR(pg_catalog.||) ' and '::pg_catalog.text)
OPERATOR(pg_catalog.||) "?dropped?column?") OPERATOR(pg_catalog.||) '
more'::pg_catalog.text), ALTER COLUMN f2 SET DATA TYPE pg_catalog.int8
USING (f1 OPERATOR(pg_catalog.*) 10)

Obviously, column f3 has been dropped and its name no longer exists.
Maybe we need to capture and save the name in advance, as is done for
dropped objects. However, ALTER TABLE is complex, and this problem also
occurs in other similar scenarios.


Thoughts?   Adger.




Re: Support logical replication of DDLs

2022-11-27 Thread li jie
>
>
> All three commands are captured by the event trigger. The first and
> second command ends up getting deparsed, WAL-logged and
> replayed on the subscriber. The replay of the ALTER TABLE command
> causes a duplicate constraint error. The problem is that
> while subcommands are captured by event triggers by default, they
> don't need to be deparsed and WAL-logged for DDL replication.
> To do that we can pass the isCompleteQuery variable in
> ProcessUtilitySlow to EventTriggerCollectSimpleCommand() and
> EventTriggerAlterTableEnd() and make this information available in
> CollectedCommand so that any subcommands can be skipped.
>

We may not be able to skip all subcommands. For example,
'ALTER TABLE ctlt1_like ALTER COLUMN b SET STORAGE EXTERNAL;'

cannot be represented in the CREATE TABLE statement.


Regards,  Adger


Re: Support logical replication of DDLs

2022-11-25 Thread li jie
Hi developers,

I have been following this patch for a long time.
Recently I started testing it, found several bugs,
and want to report them here.

1. CREATE TABLE LIKE
   I found that this case may be replicated incorrectly.
   You can run the following SQL statements:
   ```
   CREATE TABLE ctlt1 (a text CHECK (length(a) > 2) PRIMARY KEY, b text);
ALTER TABLE ctlt1 ALTER COLUMN a SET STORAGE MAIN;
ALTER TABLE ctlt1 ALTER COLUMN b SET STORAGE EXTERNAL;
CREATE TABLE ctlt1_like (LIKE ctlt1 INCLUDING ALL);
   ```
   The ctlt1_like table cannot be replicated correctly.
   I think this is because the CREATE TABLE ... LIKE statement is captured by
   the event trigger as a CREATE TABLE statement plus multiple ALTER TABLE
   statements. These overlap, and an error is reported when they are replayed
   downstream.

2. ALTER TABLE (inherits)
case:
```
CREATE TABLE gtest30 (
a int,
b int GENERATED ALWAYS AS (a * 2) STORED
);
CREATE TABLE gtest30_1 () INHERITS (gtest30);
ALTER TABLE gtest30 ALTER COLUMN b DROP EXPRESSION;
```
After this case is executed on the publisher, the following error occurs
on the subscriber:

ERROR:  column "b" of relation "gtest30" is not a stored generated column
STATEMENT:  ALTER TABLE public.gtest30 ALTER COLUMN b DROP EXPRESSION, ALTER
COLUMN b DROP EXPRESSION

Obviously, the column modification of the inheriting child table was also
captured, and the wrong statement was then deparsed.

I believe such errors may also occur in other ALTER TABLE subcommand
scenarios involving inherited tables.



3. ALTER TABLE SET STATISTICS

case:
```
CREATE TABLE test_stat (a int);
ALTER TABLE test_stat ALTER a SET STATISTICS -1;
```

After this case is executed on the publisher, the following error occurs
on the subscriber:

syntax error at or near "4294967295" at character 60
STATEMENT: ALTER TABLE public.test_stat ALTER COLUMN a SET STATISTICS
4294967295

I guess this is an overflow in the integer conversion: -1 is being deparsed
as an unsigned value (4294967295).



4. JSON null string coredump

case:
```
CREATE OR REPLACE FUNCTION test_ddl_deparse_full()
RETURNS event_trigger LANGUAGE plpgsql AS
$$
DECLARE
r record;
deparsed_json text;
BEGIN
FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
deparsed_json = ddl_deparse_to_json(r.command);
RAISE NOTICE 'deparsed json: %', deparsed_json;
RAISE NOTICE 're-formed command: %',
ddl_deparse_expand_command(deparsed_json);
END LOOP;
END;
$$;

CREATE EVENT TRIGGER test_ddl_deparse_full
ON ddl_command_end EXECUTE PROCEDURE test_ddl_deparse_full();

CREATE SCHEMA AUTHORIZATION postgres;


```


If the preceding case is executed, a core dump occurs; it is related to a
NULL string and is reproducible.



I hope this feedback is helpful to you.

We sincerely hope you can complete the DDL logical replication feature.


 Regards,  Adger



vignesh C wrote on Fri, Nov 25, 2022 at 14:18:

> On Sun, 20 Nov 2022 at 09:29, vignesh C  wrote:
> >
> > On Fri, 11 Nov 2022 at 11:03, Peter Smith  wrote:
> > >
> > > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith 
> wrote:
> > > >
> > > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith 
> wrote:
> > > > >
> > > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith 
> wrote:
> > > > > >
> > > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > > >
> > > > > > *** NOTE - my review post became too big, so I split it into
> smaller parts.
> > > > >
> > > >
> > >
> > > THIS IS PART 4 OF 4.
> > >
> > > ===
> > >
> > > src/backend/commands/ddl_deparse.c
> >
> > Thanks for the comments, the attached v39 patch has the changes for the
> same.
>
> One comment:
> While fixing review comments, I found that default syntax is not
> handled for create domain:
> +   /*
> +* Verbose syntax
> +*
> +* CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s
> %{constraints}s
> +* %{collation}s
> +*/
> +   createDomain = new_objtree("CREATE");
> +
> +   append_object_object(createDomain,
> +"DOMAIN %{identity}D AS",
> +
> new_objtree_for_qualname_id(TypeRelationId,
> +
>   objectId));
> +   append_object_object(createDomain,
> +"%{type}T",
> +
> new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
>
> Regards,
> Vignesh
>
>
>


Re: Is it worth pushing conditions to sublink/subplan?

2021-08-22 Thread li jie
Indeed, this could be useful for partition pruning.
I am also curious why this has not been implemented.

Wenjing wrote on Mon, Aug 23, 2021 at 10:46 AM:

> Hi Hackers,
>
> Recently, a issue has been bothering me, This is about conditional
> push-down in SQL.
> I use cases from regression testing as an example.
> I found that the conditions  (B =1)  can be pushed down into the
> subquery, However, it cannot be pushed down to sublink/subplan.
> If a sublink/subplan clause contains a partition table, it can be useful
> to get the conditions for pruning.
> So, is it worth pushing conditions to sublink/subplan?
> Anybody have any ideas?
>
>
> regards,
> Wenjing
>
>
> example:
> create table p (a int, b int, c int) partition by list (a);
> create table p1 partition of p for values in (1);
> create table p2 partition of p for values in (2);
> create table q (a int, b int, c int) partition by list (a);
> create table q1 partition of q for values in (1) partition by list (b);
> create table q11 partition of q1 for values in (1) partition by list (c);
> create table q111 partition of q11 for values in (1);
> create table q2 partition of q for values in (2) partition by list (b);
> create table q21 partition of q2 for values in (1);
> create table q22 partition of q2 for values in (2);
> insert into q22 values (2, 2, 3);
>
>
> postgres-# explain (costs off)
> postgres-# select temp.b  from
> postgres-# (
> postgres(# select a,b from ab x where x.a = 1
> postgres(# union all
> postgres(# (values(1,1))
> postgres(# ) temp,
> postgres-# ab y
> postgres-# where  y.b = temp.b and y.a = 1 and y.b=1;
> QUERY PLAN
> ---
>  Nested Loop
>->  Seq Scan on ab_a1_b1 y
>  Filter: ((b = 1) AND (a = 1))
>->  Append
>  ->  Subquery Scan on "*SELECT* 1"
>->  Seq Scan on ab_a1_b1 x
>  Filter: ((a = 1) AND (b = 1))
>  ->  Result
> (8 rows)
>
> The conditions  (B =1)  can be pushed down into the subquery.
>
> postgres=# explain (costs off)
> postgres-# select
> postgres-# y.a,
> postgres-# (Select x.b from ab x where y.a =x.a and y.b=x.b) as b
> postgres-# from ab y where a = 1 and b = 1;
> QUERY PLAN
> ---
>  Seq Scan on ab_a1_b1 y
>Filter: ((a = 1) AND (b = 1))
>SubPlan 1
>  ->  Append
>->  Seq Scan on ab_a1_b1 x_1
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b2 x_2
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b3 x_3
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b1 x_4
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b2 x_5
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b3 x_6
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b1 x_7
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b2 x_8
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b3 x_9
>  Filter: ((y.a = a) AND (y.b = b))
> (22 rows)
>
> The conditions (B = 1 and A = 1) cannot be pushed down to sublink/subplan
> in targetlist.
>
> postgres=# explain (costs off)
> postgres-# select y.a
> postgres-# from ab y
> postgres-# where
> postgres-# (select x.a > x.b from ab x where y.a =x.a and y.b=x.b) and
> postgres-# y.a = 1 and y.b = 1;
> QUERY PLAN
> ---
>  Seq Scan on ab_a1_b1 y
>Filter: ((a = 1) AND (b = 1) AND (SubPlan 1))
>SubPlan 1
>  ->  Append
>->  Seq Scan on ab_a1_b1 x_1
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b2 x_2
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a1_b3 x_3
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b1 x_4
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b2 x_5
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a2_b3 x_6
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b1 x_7
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b2 x_8
>  Filter: ((y.a = a) AND (y.b = b))
>->  Seq Scan on ab_a3_b3 x_9
>  Filter: ((y.a = a) AND (y.b = b))
> (22 rows)
>
> The conditions  (B=1 and A=1)  cannot be pushed down to sublink/subplan in
> where clause.
>
>
>
>