Tables get stuck at srsubstate = f

2024-06-06 Thread Avi Weinberg
Hi all,

I'm using logical replication with Postgres 15.2.

When creating logical replication on multiple tables, the initial sync 
sometimes gets stuck for a few tables at srsubstate = f.  If I recreate the 
logical replication, it may get stuck at srsubstate = f for other tables, so it 
does not seem to be table specific.  Of course, in the majority of cases 
building the logical replication succeeds and all tables end up at 
srsubstate = r.

What might be the reason that logical replication gets stuck during initial 
sync at srsubstate = f?
We use the pg_stat_progress_copy view to monitor the initial sync.  Is it 
possible that querying it frequently during initial sync causes some lock not 
to be released?
Is it a Postgres bug?
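For monitoring, the per-table sync state lives in pg_subscription_rel: 'i' = 
initialize, 'd' = data copy in progress, 'f' = table copy finished (but the 
sync worker has not yet caught up and handed over), 's' = synchronized, 'r' = 
ready.  A minimal query to list the stuck tables might look like this (a 
sketch; 'my_subscription' is a hypothetical subscription name):

```sql
-- List tables of one subscription that have not reached the 'r' (ready) state.
SELECT sr.srrelid::regclass AS table_name,
       sr.srsubstate
FROM   pg_subscription_rel sr
JOIN   pg_subscription s ON s.oid = sr.srsubid
WHERE  s.subname = 'my_subscription'   -- hypothetical name
  AND  sr.srsubstate <> 'r';
```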

Thanks!

IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Scenarios that Replication Slot will be Marked as Active = false

2024-05-10 Thread Avi Weinberg
Hi Experts,

I would like to write a monitoring script that checks the logical replication 
running on our setup.  For that I need to know when to alert on a replication 
slot being marked active = false: when it is a temporary state I should wait 
out before alerting, and when I should alert immediately on seeing the slot 
inactive.

If the replication slot falls behind the primary server, will it be marked 
Active = false until the replica catches up?  If so, can a lag threshold be 
configured so that the slot is marked inactive only once the lag crosses it?

Will a failure in the replica affect the active status of its replication slot?

What are the other scenarios that can cause it to be inactive?
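For reference, `active` in pg_replication_slots only reflects whether some 
process currently has the slot open; a lagging slot stays active as long as its 
walsender is connected.  A starting point for an alerting query could combine 
the flag with the amount of WAL the slot retains (a sketch, run on the 
publisher):

```sql
-- Inactive slot + growing retained WAL is the combination worth alerting on.
SELECT slot_name,
       active,
       active_pid,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM   pg_replication_slots;
```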

Thanks!




replication slot "pg_1015733_sync_1014718_7358407382484881476" does not exist

2024-04-17 Thread Avi Weinberg
Fixed a typo...

Hi Experts,

For the second time in the past few months I'm getting the following errors in 
the Postgres log.  Last time it was solved when I reset all Postgres pods.  Now 
a reset no longer helps.  Logical replication is not working even after I 
performed the reset.
Any ideas what is wrong?

From: Avi Weinberg 
Sent: Wednesday, April 17, 2024 6:01 PM
To: pgsql-generallists.postgresql.org 
Subject: [External] replication slot 
"pg_1015733_sync_1014718_7358407382484881476" does not exist

Hi Experts,

For a second time in the past few months I'm getting the following errors in 
Postgres log.  Last time it was solved when I reset all Postgres pods.  Now 
reset no longer helps.  Logical replication is now working even after I 
performed the reset.
Any ideas what is wrong?


ERROR:  replication slot "pg_1015733_sync_1014718_7358407382484881476" does not 
exist
STATEMENT:  DROP_REPLICATION_SLOT pg_1015733_sync_1014718_7358407382484881476 
WAIT
ERROR:  replication slot "pg_1015733_sync_1014956_7358407382484881476" does not 
exist
STATEMENT:  DROP_REPLICATION_SLOT pg_1015733_sync_1014956_7358407382484881476 
WAIT
192 ERROR:  replication slot "pg_927671_sync_927281_7358103111795036228" does 
not exist
START_REPLICATION SLOT "pg_927671_sync_927281_7358103111795036228" LOGICAL 
0/AC888F00 (proto_version '3', publication_names '"aaa","bbb"')
replication slot "pg_1015733_sync_1015375_7358407382484881476" does not exist
DROP_REPLICATION_SLOT pg_1015733_sync_1015375_7358407382484881476 WAIT


Thanks


replication slot "pg_1015733_sync_1014718_7358407382484881476" does not exist

2024-04-17 Thread Avi Weinberg
Hi Experts,

For a second time in the past few months I'm getting the following errors in 
Postgres log.  Last time it was solved when I reset all Postgres pods.  Now 
reset no longer helps.  Logical replication is now working even after I 
performed the reset.
Any ideas what is wrong?


ERROR:  replication slot "pg_1015733_sync_1014718_7358407382484881476" does not 
exist
STATEMENT:  DROP_REPLICATION_SLOT pg_1015733_sync_1014718_7358407382484881476 
WAIT
ERROR:  replication slot "pg_1015733_sync_1014956_7358407382484881476" does not 
exist
STATEMENT:  DROP_REPLICATION_SLOT pg_1015733_sync_1014956_7358407382484881476 
WAIT
192 ERROR:  replication slot "pg_927671_sync_927281_7358103111795036228" does 
not exist
START_REPLICATION SLOT "pg_927671_sync_927281_7358103111795036228" LOGICAL 
0/AC888F00 (proto_version '3', publication_names '"aaa","bbb"')
replication slot "pg_1015733_sync_1015375_7358407382484881476" does not exist
DROP_REPLICATION_SLOT pg_1015733_sync_1015375_7358407382484881476 WAIT


Thanks


RE: [External] Simple way to simulate a bug in logical replication

2024-03-12 Thread Avi Weinberg
Please note:
I tried it with Postgres 15.6 as well and the behavior is the same.
I'm running inside Docker; I do not know if that matters...

Hi All,

I think I hit a bug in logical replication in version 15.2.  I sync two tables 
of 305MB each, but pg_stat_progress_copy shows over 5GB as bytes_processed and 
the sync takes forever.
Is this a bug?  If so, what can I do about this scenario?

Thanks

Create a dummy table:

CREATE TABLE example_table (
    id SERIAL PRIMARY KEY,
    binary_data BYTEA
);

Create a function that generates the binary payload for a row:

CREATE OR REPLACE FUNCTION generate_large_data()
RETURNS BYTEA AS $$
DECLARE
    large_data BYTEA;
BEGIN
    large_data := decode(repeat('DEADBEEF', 10240), 'hex');
    RETURN large_data;
END;
$$ LANGUAGE plpgsql;

I inserted this sample row 112 times:
INSERT INTO example_table (binary_data) VALUES (generate_large_data());

Checked the table size:
SELECT pg_size_pretty(pg_total_relation_size('example_table'));
The table size was 305MB.

I added this table to logical replication (along with another table called 
example_table1 that had a similar structure and data).
I ran the following command:
select * from pg_stat_progress_copy;

bytes_processed for each of the tables was over 5GB:

"pid","datid","datname","relid","command","type","bytes_processed","bytes_total","tuples_processed","tuples_excluded","pg_size_pretty"
599,16384,"aaa_db",19833,"COPY FROM","CALLBACK","5898403943","0","13","0","5625 
MB"
597,16384,"aa_db",19824,"COPY FROM","CALLBACK","5898403938","0","13","0","5625 
MB"




Simple way to simulate a bug in logical replication

2024-03-12 Thread Avi Weinberg
Hi All,

I think I hit a bug in logical replication in version 15.2.  I sync two tables 
of 305MB each, but pg_stat_progress_copy shows over 5GB as bytes_processed and 
the sync takes forever.
Is this a bug?  If so, what can I do about this scenario?

Thanks

Create a dummy table:

CREATE TABLE example_table (
    id SERIAL PRIMARY KEY,
    binary_data BYTEA
);

Create a function that generates the binary payload for a row:

CREATE OR REPLACE FUNCTION generate_large_data()
RETURNS BYTEA AS $$
DECLARE
    large_data BYTEA;
BEGIN
    large_data := decode(repeat('DEADBEEF', 10240), 'hex');
    RETURN large_data;
END;
$$ LANGUAGE plpgsql;

I inserted this sample row 112 times:
INSERT INTO example_table (binary_data) VALUES (generate_large_data());

Checked the table size:
SELECT pg_size_pretty(pg_total_relation_size('example_table'));
The table size was 305MB.

I added this table to logical replication (along with another table called 
example_table1 that had a similar structure and data).
I ran the following command:
select * from pg_stat_progress_copy;

bytes_processed for each of the tables was over 5GB:

"pid","datid","datname","relid","command","type","bytes_processed","bytes_total","tuples_processed","tuples_excluded","pg_size_pretty"
599,16384,"aaa_db",19833,"COPY FROM","CALLBACK","5898403943","0","13","0","5625 
MB"
597,16384,"aa_db",19824,"COPY FROM","CALLBACK","5898403938","0","13","0","5625 
MB"




RE: [External] Re: walsender RAM increases by 500 MB while data is 80 MB

2024-03-12 Thread Avi Weinberg
Hi,


Thanks Masahiko Sawada for your reply.

We did not have many transactions running when the walsenders started consuming 
more RAM than expected.  We only had 1-2 transactions that inserted a row with 
about 40-70 MB of binary data into a table.
Changing logical_decoding_work_mem to as low as 1MB did not help.

Any other suggestions that might help?

Thanks!


-Original Message-
From: Masahiko Sawada 
Sent: Monday, March 11, 2024 10:38 AM
To: Avi Weinberg 
Cc: pgsql-generallists.postgresql.org 
Subject: [External] Re: walsender RAM increases by 500 MB while data is 80 MB

On Mon, Mar 11, 2024 at 12:33 AM Avi Weinberg  wrote:
>
> Hi Experts,
>
>
>
> Your input is most welcome!
>
>
>
> We are using Postgres 13 (and plan to upgrade to 15 soon).  We have logical 
> replication with about 40 servers subscribing to one publisher.  40 Walsender 
> processes are running on the publisher server.  When we insert a row into a 
> table holding binary data the walsender RAM usage increases by 500MB although 
> the row binary data is only 80MB.  We see this increase in all walsender 
> processes.  At some point we got OOM and the process was killed.

This sounds similar to reports[1][2] we got before. Were there any long-running 
transactions at the time the 80MB data change was made? And do you have access 
to the core dump of the walsender process that was killed due to OOM?

> 1. Why does the walsender increase by 500MB when the data change was only
> 80MB?
> 2. Is some of the 500MB increase due to shared memory, or does each walsender
> have its own 500MB increase? I assume that if it was only in shared memory we
> would not have gotten OOM…
> 3. Why, when logical_decoding_work_mem = 64MB, is the RAM 15 times that size?
> Shouldn't any additional space be used from disk and not RAM?
> 4. Will adding streaming = on to the publication "PUBLICATION pub WITH
> (streaming = on)" alleviate the issue?
> 5. Are there configuration options that can resolve the RAM issue? They can
> also be version-15 options, since we plan to upgrade soon.

If you're facing a similar issue I shared above, temporarily setting 
logical_decoding_work_mem a *lower* value could alleviate the situation.  
Setting a lower value would lead to more evictions in logical decoding, and it 
probably can avoid using much memory and OOM (note that the logical decoding 
gets slower instead). I think there is a memory accounting issue in logical 
decoding, which could end up using memory much more than the 
logical_decoding_work_mem limit in some scenarios. This issue is not fixed yet, 
and setting "streaming = on" doesn't help.

Regards,

[1] 
https://www.postgresql.org/message-id/CAMnUB3oYugXCBLSkih%2BqNsWQPciEwos6g_AMbnz_peNoxfHwyw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/17974-f8c9d353a62f414d%40postgresql.org

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com/
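Sawada-san's suggestion of temporarily lowering logical_decoding_work_mem could 
be applied like this (a sketch; the 16MB value is purely illustrative, and the 
setting takes effect on reload, no restart needed):

```sql
-- Temporarily force more aggressive spilling during logical decoding.
ALTER SYSTEM SET logical_decoding_work_mem = '16MB';
SELECT pg_reload_conf();

-- Once the memory pressure is resolved, revert to the default:
-- ALTER SYSTEM RESET logical_decoding_work_mem;
-- SELECT pg_reload_conf();
```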


walsender RAM increases by 500 MB while data is 80 MB

2024-03-10 Thread Avi Weinberg
Hi Experts,

Your input is most welcome!

We are using Postgres 13 (and plan to upgrade to 15 soon).  We have logical 
replication with about 40 servers subscribing to one publisher.  40 walsender 
processes are running on the publisher server.  When we insert a row into a 
table holding binary data, the walsender RAM usage increases by 500MB although 
the row's binary data is only 80MB.  We see this increase in all walsender 
processes.  At some point we got OOM and the process was killed.


  1.  Why does the walsender increase by 500MB when the data change was only 
80MB?
  2.  Is some of the 500MB increase due to shared memory, or does each 
walsender have its own 500MB increase?  I assume that if it was only in shared 
memory we would not have gotten OOM...
  3.  Why, when logical_decoding_work_mem = 64MB, is the RAM 15 times that 
size?  Shouldn't any additional space be used from disk and not RAM?
  4.  Will adding streaming = on to the publication "PUBLICATION pub WITH 
(streaming = on)" alleviate the issue?
  5.  Are there configuration options that can resolve the RAM issue?  They can 
also be version-15 options, since we plan to upgrade soon.

Thanks!



Trigger to Count Number of Logical Replication Table Changes.

2023-11-16 Thread Avi Weinberg
I'm using Postgres (13 and 15) logical replication to sync data between two 
servers. I would like to have an update counter that increases whenever data is 
changed. The counter can be incremented by 1 even if multiple rows are updated, 
but it is also ok to increment the counter by the number of rows updated 
(although that seems less efficient to me).
I need the counter to increase after the initial sync as well as after regular 
logical replication sync.
Triggers do not fire for replicated changes without ENABLE ALWAYS.
In addition, a FOR EACH STATEMENT trigger works only for the initial sync and 
not for regular logical replication sync.
The per-row set_time_trig trigger takes about 1 minute when updating 50k rows 
in one transaction (all I need is to increase update_count by 1, so why spend a 
minute on it?). How can I improve this?
CREATE TABLE IF NOT EXISTS tst.t2
(
id bigint NOT NULL,
c1 int,
CONSTRAINT pk_t2 PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS tst.time_audit_tbl
(
table_name character varying(63) COLLATE pg_catalog."default" NOT NULL,
update_count integer DEFAULT 0,
CONSTRAINT updated_time_audit_unique UNIQUE (table_name)
);


CREATE FUNCTION tst.set_time() RETURNS trigger
LANGUAGE plpgsql SECURITY DEFINER
AS $$
DECLARE
  updated_count int;
BEGIN
UPDATE tst.time_audit_tbl SET update_count = update_count + 1 WHERE 
table_name = CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME);
GET DIAGNOSTICS updated_count = ROW_COUNT;
IF updated_count = 0 THEN
  RAISE EXCEPTION 'set_updated_time().  Table not found %.%', 
TG_TABLE_SCHEMA, TG_TABLE_NAME;
END IF;

RETURN coalesce(NEW, OLD);
END;
$$;


CREATE  TRIGGER set_time_trig
AFTER INSERT OR DELETE OR UPDATE
ON tst.t2
FOR EACH ROW
EXECUTE FUNCTION tst.set_time();

ALTER TABLE tst.t2 ENABLE ALWAYS TRIGGER set_time_trig;
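One way to keep the per-row trigger while paying for only one counter UPDATE 
per transaction is to stamp the audit row with the current transaction id and 
skip the UPDATE when it already matches.  This is only a sketch: the 
`last_txid` column is an assumption (it does not exist in the schema above), 
and the RAISE-on-missing-row check is dropped because a zero row count now also 
means "already counted in this transaction":

```sql
-- Hypothetical extra column to remember which transaction last bumped the counter.
ALTER TABLE tst.time_audit_tbl ADD COLUMN last_txid bigint;

CREATE OR REPLACE FUNCTION tst.set_time() RETURNS trigger
LANGUAGE plpgsql SECURITY DEFINER
AS $$
BEGIN
    -- Only the first fired row of each transaction performs the UPDATE;
    -- subsequent rows see a matching txid and update nothing.
    UPDATE tst.time_audit_tbl
       SET update_count = update_count + 1,
           last_txid    = txid_current()
     WHERE table_name = CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME)
       AND last_txid IS DISTINCT FROM txid_current();
    RETURN coalesce(NEW, OLD);
END;
$$;
```

The trigger still fires 50k times for a 50k-row transaction, but 49,999 of 
those firings touch no row in time_audit_tbl, which avoids repeatedly updating 
(and re-locking) the same audit tuple.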



Postgres Out Of Memory Crash

2023-11-02 Thread Avi Weinberg
Hi experts,

I'm using a Patroni Postgres installation and noticed that Postgres has already 
crashed twice due to out of memory.  I'm using logical replication with around 
30-40 active subscribers on this machine.  The machine has 128GB, but only 32GB 
is allocated to Postgres.  How can I know what is actually causing the 
out-of-memory issue?  Is it caused by a non-optimal Postgres configuration or 
by something else?

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data 
--config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432 
--cluster_name=postgres-cluster --wal_level=logical --hot_standby=on 
--max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0 
--max_locks_per_transaction=64 --track_commit_timestamp=on 
--max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on


Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, 
order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres 
cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope
 mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: 
loaded Tainted: G    T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS 
Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [] ? 
ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [] ? 
find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [] ? 
try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [] 
oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [] 
mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [] ? 
mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [] 
pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [] 
__do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, 
failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 
9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, 
failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice:
 cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for 
/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ]   uid  tgid total_vm  rss nr_ptes 
swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159  2391   3 
   0  -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095   97   8 
   0   993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156  171   8 
   0   993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137   98   8 
   0   993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099   98   7 
   0   993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099   98   7 
   0   993 runsv
Oct 27 07:05:31 node2 kernel: [13440]   101 1344027026 1186  54 
   0   993 pgqd
Oct 27 07:05:31 node2 kernel: [13441]   101 13441   155215 8237 101 
   0   993 patroni
Oct 27 07:05:31 node2 kernel: [19532]   101 19532  174003346817 171 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19542]   101 19542  1767874 6713 121 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19546]   101 19546  1740173  14450313166 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19547]   101 19547  174006920060 171 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19548]   101 19548  1740027 4821  86 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  1740283 1011  91 
   0   993 postgres
Oct 27 07:05:31 node2 kernel: [19549]   101 19549  

Logical Replication - Adding Not-Null Column Without Default Value

2023-10-17 Thread Avi Weinberg
Hi Experts,

I'm using logical replication with Postgres 15.

I added a not-null column without a default value to one of my published 
tables.  The way I do it on the publisher side is to add the column as 
nullable, then update the values based on some logic, and finally make the 
column not-null.

On the subscriber side, if I add the column as nullable, then fill in a dummy 
value and finally make it not-null (all in a single transaction), is it safe to 
say that the correct values of this column will be logically synchronized from 
publisher to subscriber and will override the dummy values?  Would, in the end, 
the two tables on publisher and subscriber have exactly the same values for 
this column?

Do you see a scenario in which the tables will not be the same?  Is there a 
better way to add a not-null column without a default value on the subscriber 
side?
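For reference, the subscriber-side sequence described above might look like the 
following (a sketch; the table and column names are hypothetical, and the dummy 
value 0 is only a placeholder):

```sql
-- All in one transaction on the subscriber.
BEGIN;
ALTER TABLE my_table ADD COLUMN new_col integer;          -- nullable first
UPDATE my_table SET new_col = 0;                          -- dummy value
ALTER TABLE my_table ALTER COLUMN new_col SET NOT NULL;   -- then constrain
COMMIT;
```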

Thanks



Problematic enforcement of "ERROR: functions in index predicate must be marked IMMUTABLE"

2023-07-09 Thread Avi Weinberg
Hi,

If you attempt to create an index based on a function that is not IMMUTABLE, 
you will get the exception "ERROR:  functions in index predicate must be marked 
IMMUTABLE".  However, if you created the index while the function was 
IMMUTABLE, but later updated the function and mistakenly removed the IMMUTABLE 
keyword, you will not get any error alerting you that an index is based on this 
function and that it should remain IMMUTABLE.

I suggest also raising an error when a function that is used by an index is 
replaced without IMMUTABLE.
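Until such a check exists, the dependency can be inspected manually before 
replacing a function.  Expression and predicate indexes record their function 
dependencies in pg_depend, so a query along these lines (a sketch) can flag 
indexes that depend on functions no longer marked IMMUTABLE:

```sql
-- Indexes depending on functions whose volatility is not 'i' (immutable).
SELECT i.relname AS index_name,
       p.proname AS function_name,
       p.provolatile            -- 'i' immutable, 's' stable, 'v' volatile
FROM   pg_depend d
JOIN   pg_class i ON i.oid = d.objid    AND i.relkind = 'i'
JOIN   pg_proc  p ON p.oid = d.refobjid
WHERE  d.classid    = 'pg_class'::regclass
  AND  d.refclassid = 'pg_proc'::regclass
  AND  p.provolatile <> 'i';
```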

Avi





Postgres Incompatibility

2023-05-17 Thread Avi Weinberg

Hi Experts

I noticed this problem when creating a backup on Postgres 13 and restoring it 
on Postgres 15.

CREATE FUNCTION sync.show_pg_subscription1() RETURNS SETOF 
pg_subscription
LANGUAGE sql
AS $$
SELECT * from pg_subscription;
$$;

The Postgres backup renders the view with a column alias list: " FROM 
sync.show_pg_subscription1()  show_pg_subscription1(oid, subdbid, subname, 
subowner, subenabled, subconninfo, subslotname, subsynccommit, 
subpublications);".  The original syntax just had FROM 
sync.show_pg_subscription1().

CREATE OR REPLACE VIEW sync.pg_subscription_view1
AS
SELECT show_pg_subscription1.oid,
show_pg_subscription1.subdbid,
show_pg_subscription1.subname,
show_pg_subscription1.subowner,
show_pg_subscription1.subenabled,
show_pg_subscription1.subconninfo,
show_pg_subscription1.subslotname,
show_pg_subscription1.subsynccommit,
show_pg_subscription1.subpublications
   FROM sync.show_pg_subscription1() show_pg_subscription1(oid, subdbid, 
subname, subowner, subenabled, subconninfo, subslotname, subsynccommit, 
subpublications);
We get an error:
ERROR: column reference "subconninfo" is ambiguous LINE 8: 
show_pg_subscription1.subconninfo,

If we remove the part generated by Postgres backup " show_pg_subscription1(oid, 
subdbid, subname, subowner, subenabled, subconninfo, subslotname, 
subsynccommit, subpublications);" it works.

What can be done so backups restore correctly? I have a not-so-elegant 
solution, but I hope to hear a better one than creating an additional view:

create view abc as
SELECT
oid,
subdbid,
subname,
subowner,
subenabled,
subconninfo,
subslotname,
subsynccommit,
subpublications
from pg_subscription p;

CREATE FUNCTION sync.show_pg_subscription1() RETURNS SETOF abc
LANGUAGE sql
AS $$
SELECT * from abc;
$$;

SELECT show_pg_subscription1.oid,
show_pg_subscription1.subdbid,
show_pg_subscription1.subname,
show_pg_subscription1.subowner,
show_pg_subscription1.subenabled,
show_pg_subscription1.subconninfo,
show_pg_subscription1.subslotname,
show_pg_subscription1.subsynccommit,
show_pg_subscription1.subpublications
   FROM sync.show_pg_subscription1() show_pg_subscription1(oid, subdbid, 
subname, subowner, subenabled, subconninfo, subslotname, subsynccommit, 
subpublications);

This also works with the additional part generated by the Postgres backup.

Thanks!


Logical Replication - Give One Subscription Priority Over Other Subscriptions

2022-09-18 Thread Avi Weinberg
Hi all,

I use logical replication to synchronize one big table plus some small tables.

I know that if all tables are part of the same subscription, changes are 
applied in transactional order. In other words, as long as the transaction that 
updated the big table has not been synchronized and executed on the subscriber 
side, no other transactions touching the smaller tables will be synchronized to 
the subscriber.  Therefore, I split the tables into two subscriptions, one for 
the big table and another for the smaller tables.

  *   How can I give higher priority to the subscription that synchronizes the 
smaller tables over the subscription that synchronizes the large table?
  *   If that is not possible, how do I make sure the subscription that 
synchronizes the big table does not "starve" the subscription that synchronizes 
the small tables, leaving it with almost no bandwidth?



Thanks!



Is data passing between publisher and subscriber in logical replication compressed?

2022-05-01 Thread Avi Weinberg
Hi Expert,

I'm using logical replication to synchronize data between a publisher and 
various subscribers.  I would like to know whether the data flowing between 
publisher and subscriber is compressed.  In addition, is there anything I can 
do to make the bandwidth usage more efficient?

Thanks!


RE: Multiple SELECT statements Using One WITH statement

2022-01-20 Thread Avi Weinberg

Thanks David for the reply, but my question was a little different.
I know I can have multiple CTE queries like you showed, but I want to have one 
single WITH query and use it in multiple queries, not just in the one query 
that directly follows the CTE.
Why do I need to execute the CTE query twice if I have two queries that want to 
use it?

Thanks

From: David G. Johnston [mailto:david.g.johns...@gmail.com]
Sent: Thursday, January 20, 2022 2:58 PM
To: Avi Weinberg 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Multiple SELECT statements Using One WITH statement



On Thursday, January 20, 2022, Avi Weinberg 
mailto:a...@gilat.com>> wrote:
Hi,

Can I have multiple select statements using one WITH statement?

WITH t AS (
Select A, B from …
)
SELECT A into tableA FROM t where ….;

SELECT B into tableB FROM t where ….;

With q1 as (), q2 as (), q3 as () main_query

David J.



Multiple SELECT statements Using One WITH statement

2022-01-20 Thread Avi Weinberg
Hi,

Can I have multiple select statements using one WITH statement?

WITH t AS (
Select A, B from ...
)
SELECT A into tableA FROM t where ;

SELECT B into tableB FROM t where ;
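A WITH clause attaches to exactly one statement, so a CTE cannot be shared 
across two separate SELECTs.  A common workaround (a sketch; the ellipses stand 
for the query bodies above) is to materialize the CTE once in a temporary 
table and read it from both statements:

```sql
CREATE TEMP TABLE t AS
SELECT A, B FROM ...;           -- the CTE body, computed only once

SELECT A INTO tableA FROM t WHERE ...;
SELECT B INTO tableB FROM t WHERE ...;

DROP TABLE t;                   -- or let it disappear at session end
```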



Using FOREIGN TABLE to get the Size of the Actual Remote Table Behind it

2022-01-13 Thread Avi Weinberg
Hi Experts,

Is it possible to get the size of the table a foreign table is pointing to 
(without creating another connection to the remote server)?  Obviously, "select 
pg_size_pretty(pg_total_relation_size('f_table'))" returns 0.

Thanks!



Refresh ONE publication out of the Two Publications the Subscription has

2022-01-04 Thread Avi Weinberg
Is it possible to specify the publication name to be refreshed in case I do not 
want to refresh all publications of the subscription?
Something like:
alter subscription subscription_name refresh publication publication_name

Thanks
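
As far as I know, ALTER SUBSCRIPTION ... REFRESH PUBLICATION always refreshes
all publications the subscription is attached to; there is no per-publication
form.  If independent refresh matters, one option (names illustrative) is one
subscription per publication:

```sql
-- Instead of one subscription covering both publications:
--   CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub_a, pub_b;
-- create one subscription per publication:
CREATE SUBSCRIPTION sub_a CONNECTION 'host=pubhost dbname=db' PUBLICATION pub_a;
CREATE SUBSCRIPTION sub_b CONNECTION 'host=pubhost dbname=db' PUBLICATION pub_b;

-- Now each can be refreshed on its own:
ALTER SUBSCRIPTION sub_a REFRESH PUBLICATION;
```

Note that each subscription then consumes its own replication slot and apply
worker on top of the existing limits.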


postgres_fdw FAST "where id in (140,144,148)" SLOW: "where id in (select 140 as id union select 144 union select 148)"

2022-01-03 Thread Avi Weinberg
Hi

I have a postgres_fdw table called tbl_link.  The source table is 2.5 GB in size 
with 122 rows (some rows have a 70MB bytea column, but not the ones I select in 
the example).
I noticed that when I put the specific ids in the list "where id in 
(140,144,148)" it runs fast (a few ms), but when I put the same list as a select, 
"where id in (select 140 as id union select 144 union select 148)", it takes 50 
seconds.  This select union is just for the example; I obviously have a 
different select (which by itself takes a few ms but causes the whole insert query 
to take 1x more time)

Why is that?  How can I still use a regular select and get a reasonable 
response time?

Thanks


FAST:
select lnk.*
into local_1
from tbl_link lnk
where id in (140,144,148)

"Foreign Scan on tbl_link lnk  (cost=100.00..111.61 rows=3 width=700) (actual 
time=4.161..4.167 rows=3 loops=1)"
"Planning Time: 0.213 ms"
"Execution Time: 16.251 ms"



SLOW:
select lnk.*
into local_1
from tbl_link lnk
where id in (select 140 as id union select 144  union select 148)


"Hash Join  (cost=100.18..113.88 rows=3 width=700) (actual 
time=45398.721..46812.100 rows=3 loops=1)"
"  Hash Cond: (lnk.id = (140))"
"  ->  Foreign Scan on tbl_link lnk  (cost=100.00..113.39 rows=113 width=700) 
(actual time=45398.680..46812.026 rows=112 loops=1)"
"  ->  Hash  (cost=0.14..0.14 rows=3 width=4) (actual time=0.023..0.026 rows=3 
loops=1)"
"Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"->  HashAggregate  (cost=0.08..0.11 rows=3 width=4) (actual 
time=0.017..0.021 rows=3 loops=1)"
"  Group Key: (140)"
"  Batches: 1  Memory Usage: 24kB"
"  ->  Append  (cost=0.00..0.07 rows=3 width=4) (actual 
time=0.005..0.009 rows=3 loops=1)"
"->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.003..0.004 rows=1 loops=1)"
"->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.001..0.001 rows=1 loops=1)"
"->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.000..0.001 rows=1 loops=1)"
"Planning Time: 0.541 ms"
"Execution Time: 46827.945 ms"
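
The slow plan shows the join happening locally, which forces the foreign scan to
pull every row (including the wide bytea ones) across the wire.  One common
workaround is to evaluate the subquery locally first and hand the FDW a flat
value list it may be able to push down, e.g. with = ANY over a pre-computed
array.  A sketch (the inner select stands in for the real query; whether the
condition is shipped remotely should be confirmed with EXPLAIN):

```sql
SELECT lnk.*
INTO local_1
FROM tbl_link lnk
WHERE id = ANY (ARRAY(SELECT 140 AS id UNION SELECT 144 UNION SELECT 148));
```

The ARRAY(...) subquery is evaluated once as an initplan, so the remote query
can become a simple parameterized "id = ANY($1)" scan instead of a full fetch.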







Initial Sync - One Subscriber After The Other vs. Parallel

2021-12-29 Thread Avi Weinberg
Hi All

I'm testing logical replication and noticed that when I sync large tables to 
multiple subscribers, it synchronizes two large tables (out of four) for 
subscriber A, then two tables for subscriber B, and continues to toggle 
between the two subscriptions.

Is it possible to tell it to complete one subscription before starting the 
other?
What are the parameters that I can configure to increase the performance of 
sync and reduce its overall sync time, when one publisher is publishing to 
multiple subscriptions?
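
To my understanding, the knobs that bound initial-sync parallelism are
max_sync_workers_per_subscription (tablesync workers per subscription), capped
by max_logical_replication_workers and max_worker_processes on the subscriber.
A hedged example of raising the per-subscription parallelism (values are
illustrative, not recommendations):

```sql
-- In postgresql.conf on the subscriber (both require a restart):
--   max_logical_replication_workers = 16
--   max_worker_processes = 24

-- This one only needs a reload:
ALTER SYSTEM SET max_sync_workers_per_subscription = 4;
SELECT pg_reload_conf();
```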

2021-12-29 02:46:06 UTC [1125]: [1-1] 61cbcbee.465 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit3" has started
2021-12-29 02:46:06 UTC [1127]: [1-1] 61cbcbee.467 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit4" has started
2021-12-29 02:46:17 UTC [1173]: [1-1] 61cbcbf9.495 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit3" has started
2021-12-29 02:46:17 UTC [1175]: [1-1] 61cbcbf9.497 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit4" has started
2021-12-29 02:48:52 UTC [1894]: [1-1] 61cbcc94.766 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit2" has started
2021-12-29 02:51:49 UTC [3175]: [1-1] 61cbcd45.c67 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit2" has started

2021-12-29 02:48:44 UTC [1127]: [2-1] 61cbcbee.467 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit4" has finished
2021-12-29 02:48:47 UTC [1125]: [2-1] 61cbcbee.465 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit3" has finished
2021-12-29 02:51:46 UTC [1175]: [2-1] 61cbcbf9.497 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit4" has finished
2021-12-29 02:51:49 UTC [1173]: [2-1] 61cbcbf9.495 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit3" has finished
2021-12-29 02:51:51 UTC [1894]: [2-1] 61cbcc94.766 0 LOG:  logical 
replication table synchronization worker for subscription "sub_dcn", table 
"packit2" has finished
2021-12-29 02:54:30 UTC [3175]: [2-1] 61cbcd45.c67 0 LOG:  logical 
replication table synchronization worker for subscription "sub_gwn", table 
"packit2" has finished




Logical Replication - One table 8 minutes, Identical Two Tables 28 Minutes (Expected 16 Minutes)

2021-12-27 Thread Avi Weinberg
Hi All,

When I have only one table (2.5GB) in the logical replication publication, it takes 
8 minutes to sync to the subscriber.  When I created a copy of the table and 
added it to the publication, it took 28 minutes.  When I sync three IDENTICAL 
tables of the same size it takes 46 minutes!
I expected the tables to be synced in parallel, or at least one after the 
other, but it turned out the time increase is much larger than just the time to 
sync an additional table of the same size.  Two tables should be 8*2=16 minutes, 
not 28, and three tables should be 8*3=24 minutes, not 46.

The table has one pk int column and one binary column file_content of type 
bytea.

What might be the reason for this behavior?  Can I improve the time it takes to 
sync several tables so it will be no more than (time to sync one table) * 
(number of tables)?

Doing it for two tables took 28 minutes
2021-12-27 10:32:13.740 UTC [4430] LOG:  logical replication apply worker for 
subscription "sub_gwn1" has started
2021-12-27 10:32:13.815 UTC [4431] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p1" has started
2021-12-27 10:32:13.827 UTC [4432] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p" has started
2021-12-27 10:57:57.542 UTC [4432] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p" has finished
2021-12-27 11:00:03.806 UTC [4431] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p1" has finished

Doing it for three tables took 46 minutes
2021-12-27 11:49:11.931 UTC [4739] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p2" has started
2021-12-27 11:49:11.948 UTC [4740] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p1" has started
2021-12-27 12:23:44.673 UTC [4740] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p1" has finished
2021-12-27 12:23:44.783 UTC [4845] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p" has started
2021-12-27 12:28:53.320 UTC [4739] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p2" has finished
2021-12-27 12:35:49.156 UTC [4845] LOG:  logical replication table 
synchronization worker for subscription "sub_gwn1", table "p" has finished

Thanks!



RE: Identity/Serial Column In Subscriber's Tables

2021-12-09 Thread Avi Weinberg
Hi Laurenz

Thanks for your answer.

If I do not modify data on the subscriber side, is the best practice not to 
define the column as identity, or is it better to keep the subscriber's table 
definition as close as possible to the publisher's and leave it as identity?

Thanks

-Original Message-
From: Laurenz Albe [mailto:laurenz.a...@cybertec.at]
Sent: Thursday, December 9, 2021 1:51 PM
To: Avi Weinberg ; pgsql-general@lists.postgresql.org
Subject: Re: Identity/Serial Column In Subscriber's Tables

On Thu, 2021-12-09 at 09:13 +, Avi Weinberg wrote:
> What is the best practice when dealing with Identity/Serial column in logical 
> replication.
> Should the subscriber table have this column also defined as Identity/Serial 
> or have it defined as regular integer?
> Is leaving it as Identity/Serial also on subscriber tables, can potentially 
> cause a problem?

It does not really matter if you use "serial" or an identity column on the 
logical standby or not.  The sequence on the standby won't be advanced.

The best way to prevent conflicts from happening is *not* to modify the data on 
the standby.

Yours,
Laurenz Albe
--
Cybertec | 
https://www.cybertec-postgresql.com/





Identity/Serial Column In Subscriber's Tables

2021-12-09 Thread Avi Weinberg
Hi Experts,

What is the best practice when dealing with Identity/Serial column in logical 
replication.
Should the subscriber table have this column also defined as Identity/Serial or 
have it defined as regular integer?
Is leaving it as Identity/Serial also on subscriber tables, can potentially 
cause a problem?

Thanks!


RE: Are Foreign Key Disabled During Logical Replication Initial Sync?

2021-12-06 Thread Avi Weinberg
Hi Peter,

Thanks for answer.

Just to clarify, are they disabled during initial sync only, or always 
disabled on the subscriber side?
Are all triggers disabled during initial sync, or just foreign keys?
How can I know that initial sync has completed for all tables?  Is it by 
checking whether pg_subscription_rel.srsubstate is still 'i' or 'd' for any 
table, or is there a better way?

Thanks!
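
For the completion check, a query along these lines (a sketch) lists tables
still syncing; initial sync is done when every table for the subscription has
reached srsubstate = 'r' (ready):

```sql
SELECT s.subname,
       sr.srrelid::regclass AS tbl,
       sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
WHERE sr.srsubstate <> 'r';   -- an empty result means initial sync is complete
```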





-Original Message-
From: Peter Eisentraut [mailto:peter.eisentr...@enterprisedb.com]
Sent: Tuesday, December 7, 2021 7:19 AM
To: Avi Weinberg ; pgsql-general@lists.postgresql.org
Subject: Re: Are Foreign Key Disabled During Logical Replication Initial Sync?

On 06.12.21 15:50, Avi Weinberg wrote:
> Does it mean that populating each table is done in a single
> transaction?  If so, when I have tables with foreign keys between
> them, is it guaranteed that logical replication will populates the
> tables in the proper order so the foreign key will be enforced?  Or
> maybe the foreign keys are disabled during initial sync and only after
> all tables are populated the FKs are enabled again.

Foreign keys are disabled on logical replication subscribers.



Are Foreign Key Disabled During Logical Replication Initial Sync?

2021-12-06 Thread Avi Weinberg
Hi all,

I understood that during logical replication initial sync, tables are copied in 
full using some type of "copy command".
Does it mean that populating each table is done in a single transaction?  If 
so, when I have tables with foreign keys between them, is it guaranteed that 
logical replication will populate the tables in the proper order so the 
foreign keys will be enforced?  Or maybe the foreign keys are disabled during 
initial sync and only after all tables are populated are the FKs enabled again.

Thanks!


Logical Replication - When to Enable Disabled Subscription and When to Create a New One

2021-12-02 Thread Avi Weinberg
Sorry, my previous mail had in the title Publication instead of Subscription
Hi All,

If I find a subscription in subenabled = false state.  Is it safe to run ALTER 
subscription and put it in ENABLE state or it is better to drop and create it 
from scratch?

I'm trying to figure out what can cause a subscription to be in the subenabled = 
false state.  If we know all the scenarios that can cause it to be disabled, 
maybe we can determine when to enable the subscription and when to recreate it.

Thanks!
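
A minimal monitoring query for this (a sketch) joins pg_subscription with
pg_stat_subscription; a disabled subscription shows subenabled = false and
typically has no running apply worker:

```sql
SELECT s.subname,
       s.subenabled,
       st.pid IS NOT NULL AS apply_worker_running
FROM pg_subscription s
LEFT JOIN pg_stat_subscription st
       ON st.subid = s.oid
      AND st.relid IS NULL;   -- the apply worker row (tablesync rows set relid)
```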


Logical Replication - When to Enable Disabled Publication and When to Create a New One

2021-12-02 Thread Avi Weinberg
Hi All,

If I find a subscription in subenabled = false state.  Is it safe to run ALTER 
subscription and put it in ENABLE state or it is better to drop and create it 
from scratch?

I'm trying to figure out what can cause a subscription to be in the subenabled = 
false state.  If we know all the scenarios that can cause it to be disabled, 
maybe we can determine when to enable the subscription and when to recreate it.

Thanks!


Logical Replication - Source and Destination Table Name Not The Same

2021-11-14 Thread Avi Weinberg
Hi,

Is it possible to define logical replication where the source and destination 
table names are not the same?  I need to sync a table from some source, but I 
already have a table with the same name.

Thanks
Avi


RE: Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Avi Weinberg
Thanks David for the reply.

I also thought that default values on the subscriber side are immaterial.  
However, in the case I showed, without a default value on the subscriber side 
the column gets null when the following occurs:

  1.  Table was created with two columns on publisher and subscriber side
  2.  Data inserted into this table
  3.  A third column is added to table with default value on publisher side, 
but without default value on subscriber side
  4.  The default value column has value for existing rows on publisher, but 
null on the subscriber side.
  5.  Doing refresh publication etc. does not help and the column on subscriber 
side remains with nulls


Your input is most welcome


From: David G. Johnston [mailto:david.g.johns...@gmail.com]
Sent: Tuesday, November 9, 2021 5:55 PM
To: Avi Weinberg 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Should Destination Table Columns Be Defined 
With Default Value

On Tue, Nov 9, 2021 at 7:50 AM Avi Weinberg 
mailto:a...@gilat.com>> wrote:
Was this done on purpose, that default value for new column is not copied for 
existing data?  Does this mean that on destination side we must also define the 
table with default value?

Logical replication has two modes, initial synchronization and ongoing change 
push.  The ongoing change push sends entire rows, when they change, from the 
publisher to subscriber.

The initial sync happens once, when the subscriber initially subscribes to the 
publication.

As entire rows are getting sent, defaults on the subscriber are immaterial so 
far as the published rows are concerned.

If you run a command on the publisher that causes every row to change then of 
course every row will be published with those new values to the subscriber.

David J.



Logical Replication - Should Destination Table Columns Be Defined With Default Value

2021-11-09 Thread Avi Weinberg
Hi Experts,

I'm using logical replication and have the following open issue:
If the table on the publisher side has column with default value, should the 
table on the destination (subscriber) be defined with default value as well?

My intuition was not to define it with a default value, since it gets its values 
from the publisher, but then I face the following issue: the default value given 
to existing rows when a column is added is not copied to the destination side.

on source/publisher side 

create table table1(id int primary key, a int);
insert into table1 select 1,1;
alter table table1 add column b int not null default 1;
select * from table1
output id,a, b:
1  1  1



 on destination/subscriber side ***
create table table1(id int primary key, a int);
select * from table1
alter table table1 add column b int;  -- I purposely defined it without default 
value
ALTER SUBSCRIPTION aaa REFRESH PUBLICATION
select * from table1
output id,a, b:
1  1  null
Why is the (default) value given to column b for the existing row not synced to 
the subscriber?  Of course, for new rows the default value is copied to the 
subscriber table.

Was this done on purpose, that the default value for a new column is not copied 
for existing data?  Does this mean that on the destination side we must also 
define the table with a default value?


If instead of the default on the publisher side I do the following, it works and 
the value is copied to the subscriber.

do $$
begin
alter table table1 add column b int;
update table1 set b = 1;
ALTER TABLE table1 ALTER COLUMN b SET NOT NULL;
end $$;




Subscriber to Get Only Some of The Tables From Publisher

2021-09-08 Thread Avi Weinberg
Hi,

I have a publisher with around 30 tables.  I have two types of subscribers.  
Both types need 25 "common" tables from the publisher, plus 2-3 specific tables 
for each type of subscriber.

For maintenance and monitoring reasons it is better for me to have both 
subscribers point to the same publisher but "skip/ignore" 2-3 tables that are 
not needed for them.  If this is not possible, I will be forced to create 2 
publishers.

Any idea how to ignore tables from the publisher?
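
One way to get this, assuming the publications are defined with explicit table
lists rather than FOR ALL TABLES, is a shared "common" publication plus a small
per-type one; each subscriber then subscribes to the pair it needs (all names
are illustrative):

```sql
-- On the publisher:
CREATE PUBLICATION pub_common FOR TABLE t1, t2, t3;   -- the 25 shared tables
CREATE PUBLICATION pub_type_a FOR TABLE a1, a2;       -- type-A extras
CREATE PUBLICATION pub_type_b FOR TABLE b1, b2, b3;   -- type-B extras

-- On a type-A subscriber:
CREATE SUBSCRIPTION sub_a CONNECTION 'host=pubhost dbname=db'
  PUBLICATION pub_common, pub_type_a;
```

This keeps one logical "source" to maintain while letting each subscriber skip
the tables it does not need.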


get current worker processes count

2021-08-30 Thread Avi Weinberg
Hi

I set max_worker_processes to 20.  How can I check how many are currently 
running?
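
One way to see what is currently counting against max_worker_processes is the
backend_type column of pg_stat_activity (a sketch; the exact backend_type
labels vary by version and by which extensions are loaded):

```sql
SELECT backend_type, count(*)
FROM pg_stat_activity
WHERE backend_type <> 'client backend'   -- keep only background processes
GROUP BY backend_type
ORDER BY count(*) DESC;
```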


max_worker_processes - Check How Many Worker Processes are Currently Active

2021-08-08 Thread Avi Weinberg
Hi,

Is it possible to check how many worker processes and logical replication 
workers are currently running?  I would like to find out how close I am to the 
max limits:

max_logical_replication_workers
max_worker_processes

Thanks
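
To compare against max_logical_replication_workers specifically, one approach
(a sketch) is to count the rows in pg_stat_subscription, since each row with a
pid corresponds to a running apply or tablesync worker:

```sql
SELECT count(*) AS logical_rep_workers_in_use,
       current_setting('max_logical_replication_workers') AS configured_max
FROM pg_stat_subscription
WHERE pid IS NOT NULL;
```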


Logical Replication - Different Primary Key on Source Table and Destination Table

2021-08-05 Thread Avi Weinberg
Hi

I use logical replication to copy data from many source tables to one 
destination table.  On the destination table I would like to have a primary key 
composed of the source site ID and the row ID.  On the source tables I only need 
the primary key to be the row ID, since the site ID is always the same per site.  
Is it permitted to have different primary keys on the source and destination 
tables?

Thanks
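
Assuming the replica identity on each publisher stays the row ID, the
subscriber side can carry a composite key as long as the table still contains
all replicated columns.  A sketch (all names illustrative):

```sql
-- On each source site (site ID is constant per site, so id alone is the PK):
CREATE TABLE readings (
    id      int PRIMARY KEY,
    site_id int NOT NULL,
    val     text
);

-- On the central subscriber: same columns, composite PK:
CREATE TABLE readings (
    id      int NOT NULL,
    site_id int NOT NULL,
    val     text,
    PRIMARY KEY (site_id, id)
);
```

Publishing site_id from the source (rather than filling it in locally) keeps
row matching unambiguous when rows arriving from different sites share an id.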


Lazy View's Column Computing

2021-08-02 Thread Avi Weinberg
Hi,

Is there a way to compute a column in a view only if it is referenced in the 
query?  I have a view column whose value is computed by a function.  If that 
column is not used at all in the query, can Postgres "skip" computing it?

Thanks!
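
To my understanding, when the view can be flattened into the outer query, the
planner drops unreferenced output expressions unless they are volatile, so
marking the function STABLE (or IMMUTABLE) is usually what enables the skip.
A sketch, with expensive() standing in for the real computation:

```sql
CREATE FUNCTION expensive(x int) RETURNS int
    LANGUAGE sql STABLE          -- not VOLATILE: lets the planner discard it
    AS $$ SELECT x * 2 $$;       -- placeholder for the costly logic

CREATE VIEW v AS
SELECT id, expensive(id) AS costly
FROM base_table;

-- Referencing only "id" should avoid evaluating expensive() per row;
-- EXPLAIN (VERBOSE) can confirm whether the expression was pruned.
SELECT id FROM v;
```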


Recursive Parent-Child Function Bottom Up

2021-07-26 Thread Avi Weinberg
Hi,

I would like to populate the children_ids column with all the ids of the 
children recursively (+ grandchildren etc.).
If I do it top-down I will end up doing extra work, since there is no need to 
go all the levels down if I can just compute my IMMEDIATE children's 
"children_ids" and concatenate all their lists.

I looked at 
https://stackoverflow.com/questions/41376655/how-can-i-traverse-a-tree-bottom-up-to-calculate-a-weighted-average-of-node-va
But was not able to get it to work on my case.

Your assistance is most welcome!


create table tree(id int primary key, parent int, children_ids text);
insert into tree (id, parent) values
(273,   0),
(274,  273),
(275,  273),
(277,  273),
(278,  277),
(280,  275),
(281,  280),
(282,  281),
(283,  282),
(284,  282),
(285,  282),
(286,  282),
(287,  282),
(288,  282),
(289,  282),
(290,  281),
(291,  290),
(292,  290),
(293,  290),
(294,  290),
(295,  290);
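
Using the sample table above, one set-based alternative to an explicit
bottom-up walk is a recursive CTE that emits every (ancestor, descendant) pair
exactly once and then aggregates per ancestor (a sketch; children_ids here
collects all descendants, not only immediate children, and leaves keep NULL):

```sql
WITH RECURSIVE pairs AS (
    -- base case: each node paired with its immediate parent
    SELECT parent AS ancestor, id AS descendant FROM tree
    UNION ALL
    -- walk upward: the descendant also belongs to the parent's parent
    SELECT t.parent, p.descendant
    FROM pairs p
    JOIN tree t ON t.id = p.ancestor
    WHERE t.parent <> 0          -- 0 marks the root's placeholder parent
)
UPDATE tree
SET children_ids = agg.ids
FROM (SELECT ancestor,
             string_agg(descendant::text, ',' ORDER BY descendant) AS ids
      FROM pairs
      GROUP BY ancestor) agg
WHERE tree.id = agg.ancestor;
```

Each node's list is built from shared pairs, so no subtree is traversed more
than once per ancestor level, which matches the goal of avoiding repeated
top-down descents.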


View of Union Multiple Tables - Only If table EXISTS

2021-07-25 Thread Avi Weinberg
Hi,

I would like to create a view which is a UNION of multiple tables with the same 
name from different schemas.

Select col1, col2 from schemaA.table
UNION
Select col1, col2 from schemaB.table

However, it is possible that in some of the schemas the table has not been 
created yet.

I know I can check in which schemas the table exists and then create the view, 
but at any given time the table can be added to other schemas.  In that case, I 
will always need to check whether the view is up to date before accessing it, 
and this makes the view useless.

Is it possible to write the view syntax once and have it do some type of SELECT 
IF EXISTS?

Thanks
Avi
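
A view's definition is fixed at creation time, so one workaround (a sketch;
table and view names are illustrative) is a helper that rebuilds the view
dynamically over whichever schemas currently contain the table.  It has to be
re-run when the table appears in a new schema, e.g. from the deployment script
or an event trigger:

```sql
CREATE OR REPLACE FUNCTION rebuild_union_view() RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    body text;
BEGIN
    -- Build one SELECT per schema that currently has the table.
    SELECT string_agg(format('SELECT col1, col2 FROM %I.my_table',
                             table_schema),
                      ' UNION ALL ')
    INTO body
    FROM information_schema.tables
    WHERE table_name = 'my_table';

    IF body IS NOT NULL THEN
        EXECUTE 'CREATE OR REPLACE VIEW all_my_table AS ' || body;
    END IF;
END $$;
```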





Logical Replication Configuration for 11 sites Bi-directional logical replication

2021-07-08 Thread Avi Weinberg
I have a setup with the following:

One (1) "main" site
Ten (10) "regional" sites

"main" site is a publisher to each of the regional sites (for 20 tables).
"main" site is also a subscriber from each of the 10 regional sites (5 tables).

In short: the main site has 1 publication and 10 subscriptions.  Each regional 
site has 1 publication and 1 subscription.  This is in addition to the 
streaming replication that each main and regional site has for high 
availability.

I currently have the following Postgres configuration for "main" site

max_wal_senders (25)
max_replication_slots (25)
max_logical_replication_workers (25)
max_worker_processes (25)

logical_decoding_work_mem (128MB)
max_slot_wal_keep_size 750MB

max_parallel_workers (8) default
max_parallel_maintenance_workers (2) default
max_parallel_workers_per_gather (2) default
max_sync_workers_per_subscription (2) default


I currently have the following Postgres configuration for "regional" site

max_wal_senders (10)
max_replication_slots (10)
max_logical_replication_workers (10)
max_worker_processes (10)

logical_decoding_work_mem (128MB)
max_slot_wal_keep_size 750MB

max_parallel_workers (8) default
max_parallel_maintenance_workers (2) default
max_parallel_workers_per_gather (2) default
max_sync_workers_per_subscription (2) default

Is this configuration sufficient and complete for the setup described above?
If I set higher numbers but do not use them (not all 10 regional sites defined), 
will it still consume resources?

I prefer to set higher numbers now so I will not need to restart Postgres when 
more "regional" setups are installed, as long as this does not consume too many 
resources while unused.

Thanks!


RE: Logical Replication - Single Destination Table With Multiple Source Tables - How to Handle Structure Changes

2021-07-05 Thread Avi Weinberg
Thanks for the reply,

My question was: what will happen if I have one destination table which gets 
data from many source tables?  What is the best way to handle changes in the 
structure of SOME of the source tables, while other source tables remain in the 
old format?
Maybe in some cases where the type was changed it may be able to work with 
source tables of different types, but what if a column was renamed in one source 
table while the column kept the old name in another source table?  What column 
name should the destination table have?  Do I need to duplicate the column to 
have both old and new names?
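
Option 2 from the original mail can be sketched for the rename case as follows
(all names illustrative: old column "site" renamed to "site_code", and an old
int column widened to bigint); the view normalizes both generations of the
destination table to the new shape:

```sql
-- Old-format sources replicate into dest_v1; migrated ones into dest_v2.
CREATE VIEW dest_all AS
SELECT id,
       site        AS site_code,    -- old name mapped to the new one
       val::bigint AS val           -- old type cast up to the new type
FROM dest_v1
UNION ALL
SELECT id, site_code, val
FROM dest_v2;                       -- new structure passes through as-is
```

Consumers query only dest_all, so source tables can migrate one at a time
without the readers noticing.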

From: Vijaykumar Jain [mailto:vijaykumarjain.git...@gmail.com]
Sent: Sunday, July 4, 2021 6:53 PM
To: Avi Weinberg 
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Logical Replication - Single Destination Table With Multiple 
Source Tables - How to Handle Structure Changes

On Sun, 4 Jul 2021 at 15:53, Avi Weinberg 
mailto:a...@gilat.com>> wrote:
I'm using logical replication to copy data from multiple tables to a single 
destination table.  At times the structure of the source table needs to change. 
 However, not all source table will have their structure updated at the same 
time.  Let's assume, for example, a column type needs to be changed (but 
solution needs to work for column addition, deletion, rename etc.).  What is 
the preferable approach:


  1.  To add another column to the destination table where its type will be the 
new type.  Source tables, that have the new column type, will write to the new 
column.  While source tables with old column type will write to the old column. 
 A view will do the proper casting and will show a single column to user.
  2.  Add a new table (versioned: table_name_v2) where source tables that have 
a new structure will write to the new destination table, while old source 
tables will write to the old destination table.   A view with UNION and casting 
will combine all tables.
  3.  A better way?

Does the approach below not work for all ALTER TABLE changes on the publisher?
I have been testing on a demo setup with pg14beta, and the subscribers are able
to catch up fine.

on publisher (port 5001)
postgres=# alter table t alter COLUMN name type bigint using name::bigint;

on subscriber (port 5002)
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# alter subscription mysub refresh publication;

This should work for all DDL-change cases, right?

-- demo

-- create table on pub and sub
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# create table t(id int primary key, name text);
CREATE TABLE
postgres=# \q

postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# create table t(id int primary key, name text);
CREATE TABLE
postgres=# \q

-- insert dummy data to check sub rx changes
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# insert into t values (1, 1::text);
INSERT 0 1
postgres=# \q

postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# table t;
 id | name
----+------
(0 rows)
postgres=# alter subscription mysub refresh publication; -- needed because the table was dropped while the publication was enabled
ALTER SUBSCRIPTION
postgres=# table t;
 id | name
----+------
  1 | 1
(1 row)
postgres=# \q

-- alter table alter column change type on pub
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# \q

-- alter table alter column change type on sub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type bigint using name::bigint;
ALTER TABLE
postgres=# \q

-- insert new data based on new column type
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# insert into t values (2, 100);
INSERT 0 1
postgres=# \q

-- check new data on sub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.
postgres=# table t;
 id | name
----+------
  1 |    1
  2 |  100
(2 rows)
postgres=# \q


--alter table alter col type on pub and insert data
postgres@db:~/playground/logical_replication$ psql -p 5001
psql (14beta1)
Type "help" for help.
postgres=# alter table t alter COLUMN name type text using name::text;
ALTER TABLE
postgres=# insert into t values (3, 'three');
INSERT 0 1
postgres=# \q

--alter table alter col type on sub, changes will not come till refresh pub
postgres@db:~/playground/logical_replication$ psql -p 5002
psql (14beta1)
Type "help" for help.

Logical Replication - Single Destination Table With Multiple Source Tables - How to Handle Structure Changes

2021-07-04 Thread Avi Weinberg
I'm using logical replication to copy data from multiple tables into a single
destination table.  At times the structure of a source table needs to change;
however, not all source tables will have their structure updated at the same
time.  Assume, for example, that a column type needs to be changed (but the
solution needs to work for column addition, deletion, rename, etc.).  Which
approach is preferable?


  1.  Add another column of the new type to the destination table.  Source
tables that have the new column type will write to the new column, while source
tables with the old column type will write to the old column.  A view will do
the proper casting and present a single column to the user.
  2.  Add a new, versioned table (table_name_v2).  Source tables with the new
structure will write to the new destination table, while old source tables
will write to the old one.  A view with UNION and casting will combine all the
tables.
  3.  A better way?
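As a rough sketch of what option 2 could look like (the table and column names here are illustrative, not from the thread), assuming the old table stores the column as text and the v2 table as bigint:

```sql
-- Combine the versioned destination tables behind one view, casting the
-- old column type to the new one.
CREATE VIEW table_name_all AS
SELECT id, val::bigint AS val FROM table_name      -- old structure (text)
UNION ALL
SELECT id, val FROM table_name_v2;                 -- new structure (bigint)
```

Rows whose old text values cannot be cast would make the view error, so option 2 implicitly assumes the old data is castable to the new type.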





Thanks

IMPORTANT - This email and any attachments is intended for the above named 
addressee(s), and may contain information which is confidential or privileged. 
If you are not the intended recipient, please inform the sender immediately and 
delete this email: you should not copy or use this e-mail for any purpose nor 
disclose its contents to any person.


Must be superuser to create subscriptions - Any way I can avoid using superuser

2021-07-01 Thread Avi Weinberg
I would like to avoid using a superuser to create subscriptions (logical
replication).  If I do not use a superuser, I get the following error message:
"must be superuser to create subscriptions".  Is there a way to avoid using a
superuser to create a subscription?  What is the pg_subscription_users idea
that I have read being discussed?
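For reference: in the releases current at the time of this thread, subscription creation is superuser-only.  PostgreSQL 16 later added the predefined role pg_create_subscription; a sketch of what that looks like (the role and database names are placeholders):

```sql
-- PostgreSQL 16+ only: allow a non-superuser to create subscriptions.
GRANT pg_create_subscription TO repl_admin;   -- repl_admin: placeholder role
GRANT CREATE ON DATABASE mydb TO repl_admin;  -- also needed on the target database
```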

Thanks!


Is it Possible to Rename Replication Slot + Can Publisher be Paused

2021-06-21 Thread Avi Weinberg
Hi Experts,


  1.  Is it possible to rename a replication slot?
  2.  Is it possible to pause the publisher and resume it, after a few minutes,
from the point where it stopped?  I know it is possible to do this on the
subscriber side, but since I have many subscribers, I would like to apply the
pause on the publisher.
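For what it's worth, a sketch of the usual workarounds (slot and subscription names below are placeholders):

```sql
-- A replication slot cannot be renamed in place; the usual workaround is to
-- create a new slot and drop the old one.  Note the new slot starts from the
-- current position and does not retain the old slot's backlog.
SELECT pg_create_logical_replication_slot('new_slot_name', 'pgoutput');
SELECT pg_drop_replication_slot('old_slot_name');

-- There is no pause on the publisher side; pausing is done per subscription:
ALTER SUBSCRIPTION mysub DISABLE;
ALTER SUBSCRIPTION mysub ENABLE;  -- resumes from the position retained by the slot
```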



Notify When Streaming Replication Failover Occurred (slave promoted to master).

2021-06-13 Thread Avi Weinberg
I need to take action when a Postgres streaming replication failover occurs.
Is there a way to be notified when the Postgres slave becomes master?  If no
such notification is possible, what is the best way to actively monitor which
server is master?  My Postgres is running inside Kubernetes and the HA is
managed by Patroni.
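One common polling check, as a sketch (run against each server):

```sql
-- Returns false on the primary and true on a standby, so it distinguishes
-- which server currently holds the master role.
SELECT pg_is_in_recovery();
```

With Patroni, another option is its REST API health endpoints (e.g. /leader, or /primary in newer versions), which return HTTP 200 only on the current leader; the exact endpoint names depend on the Patroni version.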

 

Thanks!




Implicit table removal from logical replication publication

2021-06-10 Thread Avi Weinberg
Hi Experts

I had a case where a team member dropped and recreated an empty table that
participated in logical replication.  After that action, the table was no
longer part of the replication, and new inserts were not copied to the
subscribers.

How can I check an existing publication for the list of participating tables,
so I will know (from the publication side) that a table was removed?

It is possible for a table to be removed from a publication with the following
steps:

  1.  Create a publication with 2 tables.
  2.  Create a subscription to that publication.
  3.  Verify you get two rows on the subscriber side for "select * from
pg_subscription_rel".
  4.  Drop one of the tables on the publisher side.
  5.  Run "alter subscription the_name refresh publication".
  6.  Run "select * from pg_subscription_rel" and see that only one row
remains.
  7.  Even if you now recreate the dropped table on the publisher side, it
will no longer participate in the publication until you explicitly add it
again with "alter publication ... add table".

How can I know that a table was removed from the publication, so I will know
to add it back?
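For the check itself, the publisher-side view pg_publication_tables lists the tables currently included in each publication; for example (the publication name is a placeholder):

```sql
-- Run on the publisher: list the tables currently in a publication.
SELECT schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'my_publication';
```

A monitoring script could compare this result against the expected table list and alert when a table has silently dropped out.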

Thanks!


