Hello.

At Sun, 2 Dec 2018 16:41:06 -0800, Noah Misch <n...@leadboat.com> wrote in <20181203004106.ga2860...@rfd.leadboat.com>
> On Thu, Nov 29, 2018 at 10:51:40PM -0800, Noah Misch wrote:
> > On Thu, Nov 29, 2018 at 01:10:57PM +0100, Dmitry Dolgov wrote:
> > > As a side note, with this patch recovery tests are failing now on
> > > 016_shm.pl
> > >
> > > # Failed test 'detected live backend via shared memory'
> > > # at t/016_shm.pl line 87.
> > > # '2018-11-28 13:08:08.504 UTC [21924] LOG: listening on Unix socket "/tmp/yV2oDNcG8e/gnat/.s.PGSQL.512"
> > > # 2018-11-28 13:08:08.512 UTC [21925] LOG: database system was interrupted; last known up at 2018-11-28 13:08:08 UTC
> > > # 2018-11-28 13:08:08.512 UTC [21925] LOG: database system was not properly shut down; automatic recovery in progress
> > > # 2018-11-28 13:08:08.512 UTC [21925] LOG: invalid record length at 0/165FEF8: wanted 24, got 0
> > > # 2018-11-28 13:08:08.512 UTC [21925] LOG: redo is not required
> > > # 2018-11-28 13:08:08.516 UTC [21924] LOG: database system is ready to accept connections
> > > # '
> > > # doesn't match '(?^:pre-existing shared memory block)'
> >
> > Thanks for the report. Since commit cfdf4dc made pg_sleep() react to
> > postmaster death, the test will need a different way to stall a backend.
> > This doesn't affect non-test code, and the test still passes against
> > cfdf4dc^ and against REL_11_STABLE. I've queued a task to update the test
> > code, but review can proceed in parallel.

I found that I have 65(h) orphaned segments left in my environment :p

At Sat, 11 Aug 2018 23:48:15 -0700, Noah Misch <n...@leadboat.com> wrote in <20180812064815.gb2301...@rfd.leadboat.com>
> still doesn't detect. I could pursue a fix via the aforementioned
> sysv_shmem_key file, modulo the possibility of a DBA removing it. I could
> also, when postmaster.pid is missing, make sysv_shmem.c check the first N
> (N=100?) keys applicable to the selected port. My gut feeling is that neither
> thing is worthwhile, but I'm interested to hear other opinions.

# Though I don't get the meaning of the "modulo" there..

I think the only thing we must avoid here is sharing the same shmem
segment with a living-dead server. If we can do that without the pid
file, that would be better than relying on it. We could remove
orphaned segments automatically, but I don't think we should go so
far as to rely on a dedicated file for that.

Also, I don't think it's worth stopping the shmem id scan at a
certain number, since I can't come up with an appropriate number for
it. But considering the "port * 1000" key scheme, we could expect a
usable segment id to be found while scanning that number of ids
(1000).

> Here's a version fixing that test for post-cfdf4dc backends.

This moves what was in PGSharedMemoryIsInUse() into a new function,
IpcMemoryAnalyze(), so that it can be reused, then adds a distinction
between EEXISTS and FOREIGN in the not-in-use cases, and between
ATTACHED and ANALYSIS_FAILURE in the in-use cases. UNATTACHED is
changed to a not-in-use case by this patch. As a result,
PGSharedMemoryIsInUse() now returns "not-in-use" for the UNATTACHED
case. It looks fine.

PGSharedMemoryCreate() is changed to choose a usable shmem id using
IpcMemoryAnalyze(). But some of the statuses returned by
IpcMemoryAnalyze() are masked by a failure of PGSharedMemoryAttach()
and silently ignored, contrary to what the code intends to do. (By
the way, SHMSTATE_EEXISTS seems to suggest the opposite of EEXIST,
which would be confusing.)

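To make the classification above concrete, here is a rough sketch in C of how I read it. This is not code from the patch: only SHMSTATE_EEXISTS is a name I actually saw there, the other enum members are my guesses at the naming, and shm_state_is_in_use() is a hypothetical helper made up for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical state set, following the cases named in this review.
 * Only SHMSTATE_EEXISTS is a name taken from the patch; the rest are
 * assumed for illustration.
 */
typedef enum
{
    SHMSTATE_EEXISTS,           /* as I read it: no such segment exists */
    SHMSTATE_FOREIGN,           /* segment exists but is not ours */
    SHMSTATE_UNATTACHED,        /* ours, but no process is attached */
    SHMSTATE_ATTACHED,          /* ours, and some process is attached */
    SHMSTATE_ANALYSIS_FAILURE   /* state could not be determined */
} ShmState;

/*
 * Hypothetical helper: collapse the analysis result into the yes/no
 * answer that a PGSharedMemoryIsInUse()-like caller needs.
 */
static bool
shm_state_is_in_use(ShmState state)
{
    switch (state)
    {
        case SHMSTATE_EEXISTS:
        case SHMSTATE_FOREIGN:
        case SHMSTATE_UNATTACHED:
            return false;       /* the not-in-use cases */
        case SHMSTATE_ATTACHED:
        case SHMSTATE_ANALYSIS_FAILURE:
            return true;        /* the in-use cases */
    }
    return true;                /* unknown value: assume in use, to be safe */
}
```

Treating ANALYSIS_FAILURE (and any unknown value) as in-use is the conservative choice here, since the thing to avoid is sharing a segment with a still-living backend.
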
PGSharedMemoryCreate() repeats shmat/shmdt twice in every iteration.
It won't do much harm, but it would be better if we could get rid of
that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center