On Fri, Jan 6, 2023 at 9:25 AM houzj.f...@fujitsu.com <houzj.f...@fujitsu.com> wrote: > > Hi, > > When developing another feature, I find an existing bug which was reported > from Dilip[1]. > > Currently, it's possible that we only send a streaming block without sending a > end of stream message(stream abort) when decoding and streaming a transaction > that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT > for such a crashed transaction. This will cause the subscriber(with > streaming=on) create a stream file but won't delete it until the apply > worker restart. > > BUG repro(borrowed from Dilip): > --- > 1. start 2 servers(config: logical_decoding_work_mem=64kB) > ./pg_ctl -D data/ -c -l pub_logs start > ./pg_ctl -D data1/ -c -l sub_logs start > > 2. Publisher: > create table t(a int PRIMARY KEY ,b text); > create publication test_pub for table t > with(PUBLISH='insert,delete,update,truncate'); > alter table t replica identity FULL ; > > 3. Subscription Server: > create table t(a int,b text); > create subscription test_sub CONNECTION 'host=localhost port=10000 > dbname=postgres' PUBLICATION test_pub WITH ( slot_name = > test_slot_sub1,streaming=on); > > 4. Publication Server: > begin ; > insert into t values (generate_series(1,50000), 'zzzzzzzzz'); -- (while > executing this restart publisher in 2-3 secs) > > Restart the publication server, while the transaction is still in an > uncommitted state. > ./pg_ctl -D data/ -c -l pub_logs restart -mi > --- > > After restarting the publisher, we can see the subscriber receive a streaming > block and create a stream file(/base/pgsql_tmp/xxx.fileset). > > To fix it, One idea is to send a stream abort message when we are cleaning up > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is > a tiny patch which changes the same. I have confirmed that the bug is fixed > and > all regression tests pass. I didn't add a testcase because we need to make > sure > the crash happens before all the WAL logged transactions data are decoded > which > doesn't seem easy to write a stable test for this. > > Thoughts ?
Fix looks good to me. Thanks for working on this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com