Re: Transaction timeout
Hi,

I read the V6 patch and found a few things that need to be improved.

Prepared transactions should also be documented:

         A value of zero (the default) disables the timeout.
    +    This timeout is not applied to prepared transactions. Only transactions
    +    with user connections are affected.

Missing 'time':

    -    gettext_noop("Sets the maximum allowed in a transaction."),
    +    gettext_noop("Sets the maximum allowed time in a transaction."),

16 is already released; it's 17 now:

    -    if (AH->remoteVersion >= 16)
    +    if (AH->remoteVersion >= 17)
             ExecuteSqlStatement(AH, "SET transaction_timeout = 0");

I also tested the V6 patch, and it works as expected.

--
Yuhang Qiu
Re: Transaction timeout
I tested the V4 patch and found that the backend doesn't process SIGINT while it's in secure_read. And it seems not a good choice to report ERROR during secure_read, since that turns into FATAL "terminating connection because protocol synchronization was lost". It might be much easier to terminate the backend rather than cancel it, just as idle_in_transaction_session_timeout and idle_session_timeout do. But then the name of the GUC should perhaps be transaction_session_timeout.

And what about 2PC transactions? A hanging 2PC transaction also hurts the server a lot: it's an active transaction but not an active backend. Can we cancel a 2PC transaction, and if so, how?

--
Yuhang Qiu
Re: Simplify xlogreader.c with XLogRec* macros
> @@ -2036,8 +2035,8 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
>  	char	   *ptr;
>  	PGAlignedBlock tmp;
>
> -	if (block_id > record->record->max_block_id ||
> -		!record->record->blocks[block_id].in_use)
> +	if (block_id > XLogRecMaxBlockId(record) ||
> +		!XLogRecGetBlock(record, block_id)->in_use)
>
> I thought these can also be rewritten as:
>
> 	if (!XLogRecHasBlockRef(record, block_id))

Oops, I missed that. New version is attached.

--
Yuhang Qiu

v2-0001-Simplify-xlogreader.c-with-XLogRec-macros.patch
Description: Binary data
Simplify xlogreader.c with XLogRec* macros
Hello hackers,

Commit 3f1ce97 refactored the XLog record access macros, but missed them in a few places. I fixed this; the patch is attached.

--
Yuhang Qiu

0001-Simplify-xlogreader.c-with-XLogRec-macros.patch
Description: Binary data
Re: Some performance degradation in REL_16 vs REL_15
I wrote a script and tested branches REL_[10-16]_STABLE, and I do see a performance drop starting in REL_13_STABLE, of about 1~2%.

scale round      10       11       12       13       14       15       16
1     1      7922.2   8018.3   8102.8   7838.3   7829.2   7870.0   7846.1
      2      7922.4   7923.5   8090.3   7887.7   7912.4   7815.2   7865.6
      3      7937.6   7964.9   8012.8   7918.5   7879.4   7786.4   7981.1
      4      8000.4   7959.5   8141.1   7886.3   7840.9   7863.5   8022.4
      5      7921.8   7945.5   8005.2   7993.7   7957.0   7803.8   7899.8
      6      7893.8   7895.1   8017.2   7879.8   7880.9   7911.4   7909.2
      7      7879.3   7853.5   8071.7   7956.2   7876.7   7863.3   7986.3
      8      7980.5   7964.1   8119.2   8015.2   7877.6   7784.9   7923.6
      9      8083.9   7946.4   7960.3   7913.9   7924.6   7867.7   7928.6
      10     7971.2   7991.8   7999.5   7812.4   7824.3   7831.0   7953.4
      AVG    7951.3   7946.3   8052.0   7910.2   7880.3   7839.7   7931.6
      MED    7930.0   7952.9   8044.5   7900.8   7878.5   7847.1   7926.1
10    1     41221.5  41394.8  40926.8  40566.6  41661.3  40511.9  40961.8
      2     40974.0  40697.9  40842.4  40269.2  41127.7  40795.5  40814.9
      3     41453.5  41426.4  41066.2  40890.9  41018.6  40897.3  40891.7
      4     41691.9  40294.9  41189.8  40873.8  41539.7  40943.2  40643.8
      5     40843.4  40855.5  41243.8  40351.3  40863.2  40839.6  40795.5
      6     40969.3  40897.9  41380.8  40734.7  41269.3  41301.0  41061.0
      7     40981.1  41119.5  41158.0  40834.6  40967.1  40790.6  41061.6
      8     41006.4  41205.9  40740.3  40978.7  40742.4  40951.6  41242.1
      9     41089.9  41129.7  40648.3  40622.1  40782.0  40460.5  40877.9
      10    41280.3  41462.7  41316.4  40728.0  40983.9  40747.0  40964.6
      AVG   41151.1  41048.5  41051.3  40685.0  41095.5  40823.8  40931.5
      MED   41048.2  41124.6  41112.1  40731.3  41001.3  40817.6  40926.7
100   1     43429.0  43190.2  44099.3  43941.5  43883.3  44215.0  44604.9
      2     43281.7  43795.2  44963.6  44331.5  43559.7  43571.5  43403.9
      3     43749.0  43614.1  44616.7  43759.5  43617.8  43530.3  43362.4
      4     43362.0  43197.3  44296.7  43692.4  42020.5  43607.3  43081.8
      5     43373.4  43288.0  44240.9  43795.0  43630.6  43576.7  43512.0
      6     43637.0  43385.2  45130.1  43792.5  43635.4  43905.2  43371.2
      7     43621.2  43474.2  43735.0  43592.2  43889.7  43947.7  43369.8
      8     43351.0  43937.5  44285.6  43877.2  43771.1  43879.1  43680.4
      9     43481.3  43700.5  44119.9  43786.9  43440.8  44083.1  43563.2
      10    43238.7  43559.5  44310.8  43406.0  44306.6  43376.3  43242.7
      AVG   43452.4  43514.2  44379.9  43797.5  43575.6  43769.2  43519.2
      MED   43401.2  43516.8  44291.2  43789.7  43633.0  43743.2  43387.5

The script looks like:

    initdb data >/dev/null 2>&1                       # initdb on every round
    pg_ctl -D data -l logfile start >/dev/null 2>&1   # start without changing any setting
    pgbench -i postgres $scale >/dev/null 2>&1
    sleep 1 >/dev/null 2>&1
    pgbench -c20 -T10 -j8

And here is the pg_config output:

    ...
    CONFIGURE = '--enable-debug' '--prefix=/home/postgres/base' '--enable-depend' 'PKG_CONFIG_PATH=/usr/local/lib64/pkgconfig::/usr/lib/pkgconfig'
    CC = gcc
    CPPFLAGS = -D_GNU_SOURCE
    CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2
    CFLAGS_SL = -fPIC
    LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/home/postgres/base/lib',--enable-new-dtags
    LDFLAGS_EX =
    LDFLAGS_SL =
    LIBS = -lpgcommon -lpgport -lz -lreadline -lpthread -lrt -ldl -lm
    VERSION = PostgreSQL 16.0

--
Yuhang Qiu
Re: Attach to shared memory after fork()
> Windows has CreateProcess, which isn't available elsewhere.

Yes, we still need fork() on *nix. So the solution is to reduce the overhead of fork(). Attaching to shared memory after fork() might count as "better shared memory management".

> This is one of the reasons for using a connection pooler like pgbouncer,
> which can vastly reduce the number of new process creations Postgres has to do.

Yes, that's another way, which I forgot to mention. But I think there should be a cleaner way that doesn't require an extra component.

> This proposal seems moderately insane. In the first place, it
> introduces failure modes we could do without, and in the second place,
> how is it not strictly *more* expensive than what happens now? You
> still have to end up with all those TLB entries mapped in the child.

Yes, the idea is radical, but it's practical.

1. I don't quite catch that. Can you explain it?
2. Yes, the overall cost is still the same, but the cost can be spread across multiple processes and thus CPUs, instead of falling 100% on the postmaster.

> (If your kernel is unable to pass down shared-memory TLBs effectively,
> ISTM that's a kernel shortcoming not a Postgres architectural problem.)

Indeed, it's a kernel/CPU-architecture shortcoming. But it is also a Postgres architectural problem: MySQL and Oracle have no such problem. IMHO Postgres should manage itself well (e.g. I/O, buffer pool, connections, ...) and not rely so much on the OS kernel. fork() used to be a genius hack, but now it's a burden, and it will get worse and worse. All I want to do is remove fork() or reduce its overhead. Maybe *nix will have CreateProcess someday (and I think it will). Should we wait for it?
Attach to shared memory after fork()
fork() is an expensive operation[1]. The major cost is copying the mm state (VMAs, PTEs, ...). ARM is especially weak at fork(): it invalidates TLB entries one by one, which is an expensive operation[2]. We can easily hit 100% CPU on an ARM machine. We also see the fork() problem on x86, but not as serious as on ARM.

We can mitigate this by enabling huge pages (2MB pages don't help us on ARM; our shared buffers are huge), but we still think it is a problem.

So I propose to remove shared buffers from the postmaster and shmat() them after fork(). Not all of them: we would still keep the necessary shared memory attached in the postmaster. Or maybe we just need to give up fork(), as we do on Windows?

Any good ideas about this?

[1] https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/
[2] https://developer.arm.com/documentation/ddi0487/latest/
    D5.10 TLB maintenance requirements and the TLB maintenance instructions:
    A break-before-make sequence on changing from an old translation table entry to a new translation table entry requires the following steps:
    1. Replace the old translation table entry with an invalid entry, and execute a DSB instruction.
    2. Invalidate the translation table entry with a broadcast TLB invalidation instruction, and execute a DSB instruction to ensure the completion of that invalidation.
    3. Write the new translation table entry, and execute a DSB instruction to ensure that the new entry is visible.

Regards,
Yuhang Qiu
Re: Optimization for hot standby XLOG_STANDBY_LOCK redo
And one more question: what is LogAccessExclusiveLocks in LogStandbySnapshot used for? Can we remove it?

> On May 6, 2020, at 10:36, 邱宇航 wrote:
>
> I mean that all resources protected by XLOG_STANDBY_LOCK would be redone later.
> The semantics of XLOG_STANDBY_LOCK are still kept.
>
Re: Optimization for hot standby XLOG_STANDBY_LOCK redo
I mean that all resources protected by XLOG_STANDBY_LOCK would be redone later. The semantics of XLOG_STANDBY_LOCK are still kept.

> On April 30, 2020, at 19:12, Amit Kapila wrote:
>
> On Thu, Apr 30, 2020 at 4:07 PM 邱宇航 wrote:
>>
>> I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by
>> another query, and all the rest of redo is blocked by this lock-getting
>> operation, which is not good and often happens in my database, so the hot
>> standby will be left behind and the master will store a lot of WAL which can't
>> be purged.
>>
>> So here is the idea:
>> We can do XLOG_STANDBY_LOCK redo asynchronously, and the rest of redo will
>> continue.
>>
>
> Hmm, I don't think we can do this. The XLOG_STANDBY_LOCK WAL is used
> for AccessExclusiveLock on a Relation which means it is a lock for a
> DDL operation. If you skip processing the WAL for this lock, the
> behavior of queries running on standby will be unpredictable.
> Consider a case where on the master, the user has dropped the table
> and when it will replay such an operation on standby the
> concurrent queries on t1 will be blocked due to replay of
> XLOG_STANDBY_LOCK WAL and if you skip that WAL, the drop of table and
> query on the same table can happen simultaneously leading to
> unpredictable behavior.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
Optimization for hot standby XLOG_STANDBY_LOCK redo
I noticed that in hot standby, XLOG_STANDBY_LOCK redo is sometimes blocked by another query, and all the rest of redo is blocked behind this lock acquisition, which is not good and often happens in my database: the hot standby falls behind, and the master stores a lot of WAL which can't be purged.

So here is the idea: we can do XLOG_STANDBY_LOCK redo asynchronously, and the rest of redo will continue.

I also wonder whether LogStandbySnapshot would influence consistency on the hot standby, since redo would no longer be in order, and how to avoid that.

Pseudocode:

    /* -- startup process -- */
    StartupXLOG()
    {
        while (readRecord())
        {
            check_lock_get_state();
            if (record.tx is in pending tbl)
                append this record to the pending lock for further redo;
            redo_record();
        }
    }

    check_lock_get_state()
    {
        for (tx in pending_tx)
            if (tx.all_locks are acquired)
            {
                redo the remaining records for this tx;
                free this tx;
            }
    }

    standby_redo()
    {
        if (XLOG_STANDBY_LOCK redo failed)
            add_lock_to_pending_tx_tbl();
    }

    /* -- worker process -- */
    main()
    {
        while (true)
            for (lock in pending locks, ordered by lsn)
                try_to_get_lock_from_pending_tbl();
    }

Regards,
Yuhang