Re: backends stuck in "startup"
On Sat, Nov 25, 2017 at 05:45:59PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > We never had any issue during the ~2 years running PG96 on this VM, until
> > upgrading Monday to PG10.1, and we've now hit it 5+ times.
>
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> > BIOS 6.00 06/22/2012
>
> > Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013
> > x86_64 x86_64 x86_64 GNU/Linux
>
> Actually ... I was focusing on the wrong part of that.  It's not
> your hypervisor, it's your kernel.  Running four-year-old kernels
> is seldom a great idea, and in this case, the one you're using
> contains the well-reported missed-futex-wakeups bug:
>
> https://bugs.centos.org/view.php?id=8371
>
> While rebuilding PG so it doesn't use POSIX semaphores will dodge
> that bug, I think a kernel update would be a far better idea.
> There are lots of other known bugs in that version.
>
> Relevant to our discussion, the fix involves inserting a memory
> barrier into the kernel's futex call handling:

Ouch!  Thanks for the heads up, and sorry for the noise.  I'm still trying to
coax 3 customers off centos5.x, so the 2 customers left running centos6.5
weren't on any of my mental lists..

Justin
Re: backends stuck in "startup"
Justin Pryzby writes:
> We never had any issue during the ~2 years running PG96 on this VM, until
> upgrading Monday to PG10.1, and we've now hit it 5+ times.

> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> BIOS 6.00 06/22/2012

> Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64
> x86_64 x86_64 x86_64 GNU/Linux

Actually ... I was focusing on the wrong part of that.  It's not
your hypervisor, it's your kernel.  Running four-year-old kernels
is seldom a great idea, and in this case, the one you're using
contains the well-reported missed-futex-wakeups bug:

https://bugs.centos.org/view.php?id=8371

While rebuilding PG so it doesn't use POSIX semaphores will dodge
that bug, I think a kernel update would be a far better idea.
There are lots of other known bugs in that version.

Relevant to our discussion, the fix involves inserting a memory
barrier into the kernel's futex call handling:

https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0

			regards, tom lane
Re: backends stuck in "startup"
Justin Pryzby writes:
> On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
>> My hypothesis about a missed memory barrier would imply that there's (at
>> least) one process that's waiting but is not in the lock's wait queue and

> Do I have to also check the wait queue to verify?  Give a hint/pointer please?

Andres probably knows more about this data structure than I do, but I
believe that the values in the LWLock's proclist_head field are indexes
into the PGPROC array, and that the PGPROC.lwWaitLink proclist_node fields
contain the fore and aft pointers in a doubly-linked list of waiting
processes.  But chasing through that by hand is going to be darn tedious
if there are a bunch of processes queued for the same lock.

In any case, if the process is blocked right there and its lwWaiting field
is not set, that is sufficient proof of a bug IMO.  What is not quite
proven yet is why it failed to detect that it'd been woken.

I think really the most useful thing at this point is just to wait and see
if your SYSV-semaphore build exhibits the same problem or not.  If it does
not, we can be pretty confident that *something* is wrong with the
POSIX-semaphore code, even if my current theory isn't it.

>> My theory suggests that any contended use of an LWLock is at risk,
>> in which case just running pgbench with about as many sessions as
>> you have in the live server ought to be able to trigger it.  However,
>> that doesn't really account for your having observed the problem
>> only during session startup,

> Remember, this issue breaks existing sessions, too.

Well, once one session is hung up, anything else that came along wanting
access to that same LWLock would also get stuck.  Since the lock in
question is a buffer partition lock controlling access to something like
1/128'th of the shared buffer pool, it would not take too long for every
active session to get stuck there, whether it were doing anything related
or not.

In any case, if you feel like trying the pgbench approach, I'd suggest
setting up a script to run a lot of relatively short runs rather than one
long one.  If there is something magic about the first blockage in a
session, that would help catch it.

> Am I right this won't help for lwlocks?  ALTER SYSTEM SET log_lock_waits=yes

Nope, that's just for heavyweight locks.  LWLocks are lightweight
precisely because they don't have stuff like logging, timeouts, or
deadlock detection ...

			regards, tom lane
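[For anyone attempting the traversal described above, a rough backend-style C
sketch of what it amounts to follows.  It assumes the PostgreSQL 10 layout as
best I recall it: waiters.head and lwWaitLink.next hold pgprocno indexes into
ProcGlobal->allProcs, and the list ends at INVALID_PGPROCNO.  Verify the field
names against storage/proclist_types.h and storage/proc.h in your tree before
relying on it, or translate it into the equivalent gdb "print" commands.]

    #include "postgres.h"

    #include "storage/lwlock.h"
    #include "storage/proc.h"

    /*
     * Sketch only: walk an LWLock's wait queue and report each waiter.
     * Assumes head/next hold pgprocno indexes and that the list ends at
     * INVALID_PGPROCNO -- check proclist.h in your tree if in doubt.
     */
    static void
    dump_lwlock_waiters(LWLock *lock)
    {
        int         procno = lock->waiters.head;

        while (procno != INVALID_PGPROCNO)
        {
            PGPROC     *waiter = &ProcGlobal->allProcs[procno];

            elog(LOG, "pgprocno %d pid %d lwWaiting %d lwWaitMode %d",
                 procno, waiter->pid,
                 (int) waiter->lwWaiting, (int) waiter->lwWaitMode);

            procno = waiter->lwWaitLink.next;
        }
    }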
Re: backends stuck in "startup"
On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > For starters, I found that PID 27427 has:
>
> > (gdb) p proc->lwWaiting
> > $1 = 0 '\000'
> > (gdb) p proc->lwWaitMode
> > $2 = 1 '\001'
>
> To confirm, this is LWLockAcquire's "proc", equal to MyProc?
> If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
> that sure seems like a smoking gun.

Right:

(gdb) bt
#0  0x003f6ee0d930 in ?? ()
#1  0x7f19f3400d38 in ?? ()
#2  0x006a4440 in PGSemaphoreLock (sema=0x7f19f3400d38) at pg_sema.c:310
#3  0x00714495 in LWLockAcquire (lock=0x7f19f3404f80, mode=LW_SHARED) at lwlock.c:1233
#4  0x006f75e5 in BufferAlloc (smgr=0x1f3c368, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, hit=0x7fff6ef1c5af "") at bufmgr.c:1012
#5  ReadBuffer_common (smgr=0x1f3c368, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, hit=0x7fff6ef1c5af "") at bufmgr.c:745
[...]

(gdb) up
#1  0x7f19f3400d38 in ?? ()
(gdb) up
#2  0x006a4440 in PGSemaphoreLock (sema=0x7f19f3400d38) at pg_sema.c:310
310             errStatus = sem_wait(PG_SEM_REF(sema));
(gdb) up
#3  0x00714495 in LWLockAcquire (lock=0x7f19f3404f80, mode=LW_SHARED) at lwlock.c:1233
1233                    PGSemaphoreLock(proc->sem);

(gdb) p proc->lwWaiting
$1 = 0 '\000'
(gdb) p proc->lwWaitMode
$2 = 1 '\001'
(gdb) p proc
$3 = (PGPROC *) 0x7f1a77dba500
(gdb) p MyProc
$4 = (PGPROC *) 0x7f1a77dba500
(gdb) p MyProc==proc
$5 = 1

I suppose one needs to show that this backend was stuck and not "just
happened to start 1ms before gdb connected", which I actually have:

postgres 27427 17665  0 16:57 ?        00:00:00 postgres: indigo indigo 207.241.169.248(37226) startup
...
postgres 30298 17665  0 17:01 ?        00:00:00 postgres: main main 207.241.169.100(43088) startup

.. so that process was in "startup" 3-4 minutes after being forked.

> My hypothesis about a missed memory barrier would imply that there's (at
> least) one process that's waiting but is not in the lock's wait queue and

Do I have to also check the wait queue to verify?  Give a hint/pointer please?

> > Would you suggest how I can maximize the likelihood/speed of triggering
> > that?
> > Five years ago, with a report of similar symptoms, you said "You need to hack
> > pgbench to suppress the single initialization connection it normally likes to
> > make, else the test degenerates to the one-incoming-connection case"
> > https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us
>
> I don't think that case was related at all.
>
> My theory suggests that any contended use of an LWLock is at risk,
> in which case just running pgbench with about as many sessions as
> you have in the live server ought to be able to trigger it.  However,
> that doesn't really account for your having observed the problem
> only during session startup,

Remember, this issue breaks existing sessions, too.  After it'd happened a
couple times, I made sure to leave a session opened to allow collecting
diagnostics (and try to un-stick it), but when it recurred, I was unable to
even SELECT * FROM pg_locks.

Am I right this won't help for lwlocks?  ALTER SYSTEM SET log_lock_waits=yes

Justin
Re: backends stuck in "startup"
Justin Pryzby writes:
> For starters, I found that PID 27427 has:
> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'

To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a smoking gun.

> Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
> service up (and to the degree that serves to verify that avoids the issue,
> great).

Good idea, I was going to suggest that.  It will be very interesting
to see if that makes the problem go away.

> Would you suggest how I can maximize the likelihood/speed of triggering that?
> Five years ago, with a report of similar symptoms, you said "You need to hack
> pgbench to suppress the single initialization connection it normally likes to
> make, else the test degenerates to the one-incoming-connection case"
> https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I don't think that case was related at all.

My theory suggests that any contended use of an LWLock is at risk,
in which case just running pgbench with about as many sessions as
you have in the live server ought to be able to trigger it.  However,
that doesn't really account for your having observed the problem
only during session startup, so there may be some other factor
involved.  I wonder if it only happens during the first wait for an
LWLock ... and if so, how could that be?

			regards, tom lane
Re: backends stuck in "startup"
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> >> Could you try stracing next time?
>
> > I straced all the "startup" PIDs, which were all in futex, without
> > exception:
>
> If you've got debug symbols installed, could you investigate the states
> of the LWLocks the processes are stuck on?
>
> My hypothesis about a missed memory barrier would imply that there's (at
> least) one process that's waiting but is not in the lock's wait queue and
> has MyProc->lwWaiting == false, while the rest are in the wait queue and
> have MyProc->lwWaiting == true.  Actually chasing through the list
> pointers would be slightly tedious, but checking MyProc->lwWaiting,
> and maybe MyProc->lwWaitMode, in each process shouldn't be too hard.
> Also verify that they're all waiting for the same LWLock (by address).

I believe my ~40 cores are actually for backends from two separate instances
of this issue on the VM, as evidenced by different argv pointers.  And for
each instance, I have cores for only a fraction of the backends
(max_connections=400).

For starters, I found that PID 27427 has:

(gdb) p proc->lwWaiting
$1 = 0 '\000'
(gdb) p proc->lwWaitMode
$2 = 1 '\001'

.. where all the others have lwWaiting=1.  For #27427:

(gdb) p *lock
$27 = {tranche = 59, state = {value = 1627389952}, waiters = {head = 147, tail = 308}}
(gdb) info locals
mustwait = 1 '\001'
proc = 0x7f1a77dba500
result = 1 '\001'
extraWaits = 0
__func__ = "LWLockAcquire"

And at this point I have to ask for help how to finish traversing these
structures.  I could upload cores for someone (I don't think there's anything
too private), but so far I have 16GB of gz-compressed cores.

Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
service up (and to the degree that serves to verify that avoids the issue,
great).  But I could start an instance running pgbench to try to trigger it on
this VM, with smaller shared_buffers and fewer backends/clients to allow full
cores of every backend (I don't think I'll be able to dump all 400 cores, each
up to 2GB, from the production instance).

Would you suggest how I can maximize the likelihood/speed of triggering that?
Five years ago, with a report of similar symptoms, you said "You need to hack
pgbench to suppress the single initialization connection it normally likes to
make, else the test degenerates to the one-incoming-connection case"
https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

.. but pgbench --connect seems to do what's needed(?)  (I see that dates back
to 2001, having been added at ba708ea3.)

(I don't know that there's any suggestion or reason to believe the bug is
specific to the connection/startup phase, or that the startup phase is
necessary or sufficient to hit the bug, but it's at least known to be impacted
and all I have to go on for now.)

Justin
Re: backends stuck in "startup"
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
> [ in an earlier post: ]
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> > BIOS 6.00 06/22/2012
>
> Hmm.  Can't avoid the suspicion that that's relevant somehow.

In case it helps or is relevant, I'm told this is: vSphere version 6.0 u3.

Justin
Re: backends stuck in "startup"
Justin Pryzby writes:
> On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
>> Could you try stracing next time?

> I straced all the "startup" PIDs, which were all in futex, without exception:

If you've got debug symbols installed, could you investigate the states
of the LWLocks the processes are stuck on?

My hypothesis about a missed memory barrier would imply that there's (at
least) one process that's waiting but is not in the lock's wait queue and
has MyProc->lwWaiting == false, while the rest are in the wait queue and
have MyProc->lwWaiting == true.  Actually chasing through the list
pointers would be slightly tedious, but checking MyProc->lwWaiting,
and maybe MyProc->lwWaitMode, in each process shouldn't be too hard.
Also verify that they're all waiting for the same LWLock (by address).

I recognize Andres' point that on x86 lock-prefixed instructions should be
full memory barriers, and at least on my Linux machines, there do seem to
be lock-prefixed instructions in the fast paths through sem_wait and
sem_post.  But the theory fits the reported evidence awfully well, and we
have no other theory that fits at all.

[ in an earlier post: ]
> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform,
> BIOS 6.00 06/22/2012

Hmm.  Can't avoid the suspicion that that's relevant somehow.

			regards, tom lane
Re: backends stuck in "startup"
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> >
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
>
> > - I'm planning to disable sync seqscan for the night (although as I understand
> >   that may not matter (?))
>
> I doubt it matters here.

The system still got wedged/stuck after a few hours even with
synchronize_seqscans=off, use of pg_buffercache disabled (but the extension
not actually dropped), and our maintenance (vacuum and reindex) scripts
disabled.  (These normally only run at night, so at least the REINDEX script
wasn't involved with several earlier incidents of this issue; but I was still
running the vacuum/ANALYZE script post-upgrade.)

Justin
Re: backends stuck in "startup"
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> >
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
>
> Could you try stracing next time?

I straced all the "startup" PIDs, which were all in futex, without exception:

[pryzbyj@TS-DB ~]$ sudo strace `ps h --sort start_time -o pid,args -u postgres |awk '/startup$/{print "-p"$1}'` 2>&1 |tee -a /tmp/strace-pg-backends
Process 3611 attached
Process 6081 attached
[pid 6081] futex(0x7ff4314000f8, FUTEX_WAIT, 0, NULL
[pid 3611] futex(0x7ff431400138, FUTEX_WAIT, 0, NULL
[pid 3354] futex(0x7ff431400258, FUTEX_WAIT, 0, NULL
[pid 3117] futex(0x7ff4314002b8, FUTEX_WAIT, 0, NULL
[pid 2948] futex(0x7ff431400378, FUTEX_WAIT, 0, NULL
[pid 2925] futex(0x7ff4314003d8, FUTEX_WAIT, 0, NULL
[pid 2789] futex(0x7ff431400438, FUTEX_WAIT, 0, NULL
[pid 2752] futex(0x7ff4314004b8, FUTEX_WAIT, 0, NULL
[pid 2663] futex(0x7ff431400618, FUTEX_WAIT, 0, NULL
[pid 2661] futex(0x7ff431400638, FUTEX_WAIT, 0, NULL
[pid 2624] futex(0x7ff431400678, FUTEX_WAIT, 0, NULL
[pid 2576] futex(0x7ff4314006f8, FUTEX_WAIT, 0, NULL
[pid 365] futex(0x7ff431400718, FUTEX_WAIT, 0, NULL
[pid 326] futex(0x7ff431400798, FUTEX_WAIT, 0, NULL
[pid 32704] futex(0x7ff431400898, FUTEX_WAIT, 0, NULL
[pid 32309] futex(0x7ff431400938, FUTEX_WAIT, 0, NULL
[pid 32292] futex(0x7ff431400a18, FUTEX_WAIT, 0, NULL
[pid 32170] futex(0x7ff431400ad8, FUTEX_WAIT, 0, NULL
[pid 32134] futex(0x7ff431400b58, FUTEX_WAIT, 0, NULL
[pid 31987] futex(0x7ff431400bf8, FUTEX_WAIT, 0, NULL
[pid 31962] futex(0x7ff431400c58, FUTEX_WAIT, 0, NULL
[pid 31817] futex(0x7ff431400d38, FUTEX_WAIT, 0, NULL
[pid 31807] futex(0x7ff431400d98, FUTEX_WAIT, 0, NULL
[pid 29664] futex(0x7ff431400e98, FUTEX_WAIT, 0, NULL
[pid 29462] futex(0x7ff431400ef8, FUTEX_WAIT, 0, NULL
[pid 29426] futex(0x7ff431400f38, FUTEX_WAIT, 0, NULL
[pid 29258] futex(0x7ff431401058, FUTEX_WAIT, 0, NULL
[pid 29227] futex(0x7ff431402f58, FUTEX_WAIT, 0, NULL
[pid 29107] futex(0x7ff431401138, FUTEX_WAIT, 0, NULL
[pid 29071] futex(0x7ff431401198, FUTEX_WAIT, 0, NULL
[pid 28942] futex(0x7ff431401258, FUTEX_WAIT, 0, NULL
[pid 28924] futex(0x7ff431401298, FUTEX_WAIT, 0, NULL
[pid 26942] futex(0x7ff431401418, FUTEX_WAIT, 0, NULL
[pid 25278] futex(0x7ff431401438, FUTEX_WAIT, 0, NULL
[pid 24909] futex(0x7ff431401598, FUTEX_WAIT, 0, NULL
[pid 24908] futex(0x7ff4314015b8, FUTEX_WAIT, 0, NULL
[pid 24852] futex(0x7ff4314015f8, FUTEX_WAIT, 0, NULL
[pid 24851] futex(0x7ff431401618, FUTEX_WAIT, 0, NULL
[pid 24585] futex(0x7ff4314016f8, FUTEX_WAIT, 0, NULL
[pid 24584] futex(0x7ff431401718, FUTEX_WAIT, 0, NULL
[pid 24478] futex(0x7ff431401918, FUTEX_WAIT, 0, NULL
[pid 24476] futex(0x7ff431401958, FUTEX_WAIT, 0, NULL
[pid 24470] futex(0x7ff4314019d8, FUTEX_WAIT, 0, NULL
[pid 24464] futex(0x7ff431401ad8, FUTEX_WAIT, 0, NULL
[pid 24458] futex(0x7ff431401bb8, FUTEX_WAIT, 0, NULL
[pid 24456] futex(0x7ff431401b98, FUTEX_WAIT, 0, NULL
[pid 24450] futex(0x7ff431401c58, FUTEX_WAIT, 0, NULL
[pid 24448] futex(0x7ff431401c98, FUTEX_WAIT, 0, NULL
[pid 24439] futex(0x7ff431401cf8, FUTEX_WAIT, 0, NULL
[pid 24323] futex(0x7ff431401e38, FUTEX_WAIT, 0, NULL
[pid 24322] futex(0x7ff431401e58, FUTEX_WAIT, 0, NULL
[pid 23057] futex(0x7ff431401fd8, FUTEX_WAIT, 0, NULL
[pid 23055] futex(0x7ff431402038, FUTEX_WAIT, 0, NULL
[pid 21826] futex(0x7ff4314021f8, FUTEX_WAIT, 0, NULL
[pid 21824] futex(0x7ff431402238, FUTEX_WAIT, 0, NULL
[pid 21794] futex(0x7ff431402298, FUTEX_WAIT, 0, NULL
[pid 21792] futex(0x7ff431402638, FUTEX_WAIT, 0, NULL
[pid 21791] futex(0x7ff431402af8, FUTEX_WAIT, 0, NULL
[pid 21790] futex(0x7ff431402c98, FUTEX_WAIT, 0, NULL
[pid 21786] futex(0x7ff431402478, FUTEX_WAIT, 0, NULL
[pid 21785] futex(0x7ff431402758, FUTEX_WAIT, 0, NULL
[pid 21784] futex(0x7ff431403098, FUTEX_WAIT, 0, NULL
[pid 21768] futex(0x7ff431402718, FUTEX_WAIT, 0, NULL
[pid 21662] futex(0x7ff431402338, FUTEX_WAIT, 0, NULL
[pid 21648] futex(0x7ff431402c38, FUTEX_WAIT, 0, NULL
[pid 21630] futex(0x7ff4314024d8, FUTEX_WAIT, 0, NULL
[pid 21591] futex(0x7ff431402278, FUTEX_WAIT, 0, NULL
[pid 21590] futex(0x7ff431402bf8, FUTEX_WAIT, 0, NULL

Also:

[pryzbyj@TS-DB ~]$ date;ps -O lstart,wchan=99 -u postgres --sort start_time |grep startup$
Wed Nov 22 00:47:41 EST 2017
 1990 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53934) startup
 2007 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53945) startup
 2014 Wed Nov 22 00:36:44 2017 futex_wait_queue_me S ?        00:00:00 postgres: xx xx xx.247(53946) startup
 2042 Wed Nov 22 0
Re: backends stuck in "startup"
On Tue, Nov 21, 2017 at 03:45:58PM -0800, Andres Freund wrote:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> > Justin Pryzby writes:
> > > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > > strace this time, but I believe last time I saw one was waiting in a futex.
>
> A futex? Hm, that was stock postgres?

Yes, from PGDG:

[pryzbyj@TS-DB ~]$ rpm -q postgresql10
postgresql10-10.1-1PGDG.rhel6.x86_64

> How many different databases and roles are you having on that system?

postgres=# SELECT COUNT(1) FROM pg_roles ;
count | 25

postgres=# SELECT COUNT(1) FROM pg_database;
count | 15

> I guess iii is a sanitized name?

Yes

> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things.

We never had any issue during the ~2 years running PG96 on this VM, until
upgrading Monday to PG10.1, and we've now hit it 5+ times.

BTW this is a VM run on a hypervisor managed by our customer:
DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012

Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Justin
Re: backends stuck in "startup"
Rakesh Kumar writes:
> why is that I did not receive the first 4 emails on this topic?

Perhaps you need to adjust your mail filters.

> I see that only the old email address "pgsql-gene...@postgresql.org" is
> mentioned. Could that be the reason ?

> ps: I am adding the new lists address.

Please do not do that; it just results in the mail getting delivered to
the list twice, like this did.  (I believe the list admins are working on
putting in de-duplication logic, because you won't be the last one to make
this mistake.  But it is a mistake, and right now we don't have a defense
against it.)

			regards, tom lane
Re: backends stuck in "startup"
why is that I did not receive the first 4 emails on this topic?

I see that only the old email address "pgsql-gene...@postgresql.org" is
mentioned. Could that be the reason ?

ps: I am adding the new lists address.
Re: backends stuck in "startup"
I wrote:
> ...  Maybe we need
> to take a closer look at where LWLocks devolve to blocking on the process
> semaphore and see if there's any implicit assumptions about barriers there.

Like, say, here:

		for (;;)
		{
			PGSemaphoreLock(proc->sem);
			if (!proc->lwWaiting)
				break;
			extraWaits++;
		}

which is exactly where Justin's backend is stuck.  Without a read barrier
after the PGSemaphoreLock, it seems possible that we might not see our
lwWaiting as cleared yet, causing the loop to go around an extra time and
wait for a sema wakeup that will never come.  Then, we are holding the
lock but are blocked anyway, and everyone else who wants that same lock
will queue up behind us.

			regards, tom lane
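[To make the hypothesized failure concrete: if a read barrier really were
missing here, the loop above would need something along these lines.  This is
purely an illustration of the theory being floated, not a committed PostgreSQL
change; pg_read_barrier() is the existing barrier macro from port/atomics.h.]

		for (;;)
		{
			PGSemaphoreLock(proc->sem);

			/*
			 * Hypothetical addition: ensure the waker's clearing of
			 * lwWaiting (done before its PGSemaphoreUnlock) is visible
			 * before we test it, so we don't loop around again and wait
			 * for a wakeup that will never come.
			 */
			pg_read_barrier();

			if (!proc->lwWaiting)
				break;
			extraWaits++;
		}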
Re: backends stuck in "startup"
On 2017-11-21 19:02:01 -0500, Tom Lane wrote:
> and...@anarazel.de (Andres Freund) writes:
> > On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> >> (If Justin saw that while still on 9.6, then it'd be worth looking
> >> closer.)
>
> > Right. I took this to be referring to something before the current
> > migration, but I might have overinterpreted things. There've been
> > various forks/ports of pg around that had hand-coded replacements with
> > futex usage, and there were definitely buggy versions going around a few
> > years back.
>
> Poking around in the archives reminded me of this thread:
> https://www.postgresql.org/message-id/flat/14947.1475690...@sss.pgh.pa.us
> which describes symptoms uncomfortably close to what Justin is showing.
>
> I remember speculating that the SysV-sema implementation, because it'd
> always enter the kernel, would provide some memory barrier behavior
> that POSIX-sema code based on futexes might miss when taking the no-wait
> path.

I think I was speculating that, but with the benefit of just having had my
fourth espresso: I've a hard time believing that - the fast path in a futex
pretty much has to be either a test-and-set or a compare-and-exchange type
operation.  See e.g. the demo of futex usage in the futex(2) manpage:

   while (1) {
       /* Is the futex available? */
       if (__sync_bool_compare_and_swap(futexp, 1, 0))
           break;      /* Yes */

       /* Futex is not available; wait */
       s = futex(futexp, FUTEX_WAIT, 0, NULL, NULL, 0);
       if (s == -1 && errno != EAGAIN)
           errExit("futex-FUTEX_WAIT");
   }

I can't see how you could make use of futexes without some kind of barrier
semantics, at least on x86.

Greetings,

Andres Freund
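[A standalone illustration of that point, assuming GCC on x86-64 (this is not
PostgreSQL code): the userspace fast path above compiles to a LOCK CMPXCHG
instruction, and lock-prefixed instructions are full memory barriers on x86,
so even the no-syscall path of a futex-based lock carries barrier semantics
there.]

    #include <stdio.h>

    int
    main(void)
    {
        int     futex_word = 1;

        /* Same fast path as in the futex(2) demo quoted above. */
        if (__sync_bool_compare_and_swap(&futex_word, 1, 0))
            puts("fast path taken; the CAS itself acted as a full barrier on x86");

        return 0;
    }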
Re: backends stuck in "startup"
and...@anarazel.de (Andres Freund) writes:
> On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
>> (If Justin saw that while still on 9.6, then it'd be worth looking
>> closer.)

> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things. There've been
> various forks/ports of pg around that had hand-coded replacements with
> futex usage, and there were definitely buggy versions going around a few
> years back.

Poking around in the archives reminded me of this thread:
https://www.postgresql.org/message-id/flat/14947.1475690...@sss.pgh.pa.us
which describes symptoms uncomfortably close to what Justin is showing.

I remember speculating that the SysV-sema implementation, because it'd
always enter the kernel, would provide some memory barrier behavior
that POSIX-sema code based on futexes might miss when taking the no-wait
path.  I'd figured that any real problems of that sort would show up
pretty quickly, but that could've been over-optimistic.  Maybe we need
to take a closer look at where LWLocks devolve to blocking on the process
semaphore and see if there's any implicit assumptions about barriers there.

			regards, tom lane
Re: backends stuck in "startup"
On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> >> Justin Pryzby writes:
> >>> As $subject: backends are stuck in startup for minutes at a time.  I didn't
> >>> strace this time, but I believe last time I saw one was waiting in a futex.
>
> > A futex? Hm, that was stock postgres?
>
> Sure, since we switched to unnamed-POSIX semaphores in v10.  On Linux
> those are futexes.
> (If Justin saw that while still on 9.6, then it'd be worth looking
> closer.)

Right. I took this to be referring to something before the current
migration, but I might have overinterpreted things. There've been various
forks/ports of pg around that had hand-coded replacements with futex
usage, and there were definitely buggy versions going around a few years
back.

Greetings,

Andres Freund
Re: backends stuck in "startup"
Andres Freund writes:
> On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
>> Justin Pryzby writes:
>>> As $subject: backends are stuck in startup for minutes at a time.  I didn't
>>> strace this time, but I believe last time I saw one was waiting in a futex.

> A futex? Hm, that was stock postgres?

Sure, since we switched to unnamed-POSIX semaphores in v10.  On Linux
those are futexes.

(If Justin saw that while still on 9.6, then it'd be worth looking
closer.)

			regards, tom lane
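[For concreteness, a minimal standalone sketch (not PostgreSQL code) of what
an unnamed POSIX semaphore in shared memory looks like, and where the futex
comes in: on Linux/glibc the uncontended sem_post/sem_wait paths are plain
atomic operations, and futex(FUTEX_WAIT/FUTEX_WAKE) is only issued when a
process actually has to block or wake a blocked waiter.]

    #include <semaphore.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    int
    main(void)
    {
        /* Place the semaphore where forked children could see it too. */
        sem_t  *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        if (sem == MAP_FAILED || sem_init(sem, /* pshared */ 1, /* value */ 0) != 0)
        {
            perror("setup");
            return EXIT_FAILURE;
        }

        sem_post(sem);      /* uncontended: atomic increment, no syscall */
        sem_wait(sem);      /* uncontended: atomic decrement, no syscall */

        /*
         * A second sem_wait() here would block, and that is the point at
         * which glibc issues futex(..., FUTEX_WAIT, ...) -- the call the
         * stuck backends were sitting in.
         */
        sem_destroy(sem);
        return EXIT_SUCCESS;
    }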
Re: backends stuck in "startup"
On 2017-11-21 18:21:16 -0500, Tom Lane wrote:
> Justin Pryzby writes:
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
>
> Hm...

A futex? Hm, that was stock postgres?

> > I saved ~40 cores from backends from the most recent incident, which are all
> > essentially identical:
>
> This one seems to be waiting for a shared LWLock (specifically, one of
> the locks for the shared-buffer lookup hashtable), which could only block
> if someone else is holding that lock exclusively, which implies that
> there's at least one backtrace that's different from this.

Hm. Wonder if that implies that there's a lot of cache pressure, or
whether it's the compounded amount of

How many different databases and roles are you having on that system?
I guess iii is a sanitized name?

Greetings,

Andres Freund
Re: backends stuck in "startup"
Hi,

On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> I'm sorry to report this previously reported problem is happening again,
> starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
>
> As $subject: backends are stuck in startup for minutes at a time.  I didn't
> strace this time, but I believe last time I saw one was waiting in a futex.

Could you try stracing next time?  Don't think it's conclusive that they're
all continually waiting rather than contending on a few locks.  As soon as
you attach to one backend, a contended lock situation can quickly turn into
standstill if you end up interrupting a process that holds a lock.

> - we DO use temp tables; it looks like we created up to 90 temp tables each
>   hour;

Doesn't quite seem enough to cause problems.

> - I'm planning to disable sync seqscan for the night (although as I understand
>   that may not matter (?))

I doubt it matters here.

Greetings,

Andres Freund
Re: backends stuck in "startup"
Justin Pryzby writes:
> As $subject: backends are stuck in startup for minutes at a time.  I didn't
> strace this time, but I believe last time I saw one was waiting in a futex.

Hm...

> I saved ~40 cores from backends from the most recent incident, which are all
> essentially identical:

This one seems to be waiting for a shared LWLock (specifically, one of
the locks for the shared-buffer lookup hashtable), which could only block
if someone else is holding that lock exclusively, which implies that
there's at least one backtrace that's different from this.  Unless you
can find what was holding that lock, there's not much we can learn from
this report ...

			regards, tom lane
backends stuck in "startup"
I'm sorry to report this previously reported problem is happening again,
starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.

As $subject: backends are stuck in startup for minutes at a time.  I didn't
strace this time, but I believe last time I saw one was waiting in a futex.

postgres 25271 17665  0 16:54 ?        00:00:00 postgres: a.b.c.246(59980) startup
postgres 27123 17665  0 16:55 ?        00:00:00 postgres: a.b.c.247(59919) startup
postgres 27142 17665  0 16:55 ?        00:00:00 postgres: main main a.b.c.100(42892) startup
postgres 27224 17665  0 16:56 ?        00:00:00 postgres: main main a.b.c.100(42898) startup
postgres 27270 17665  0 16:56 ?        00:00:00 postgres: ww ww a.b.c.252(39760) startup
postgres 27271 17665  0 16:56 ?        00:00:00 postgres: iii iii a.b.c.248(37224) startup
postgres 27873 17665  0 16:58 ?        00:00:00 postgres: a.b.c.253(59667) startup
postgres 30038 17665  0 17:00 ?        00:00:00 postgres: ww ww a.b.c.252(39780) startup
postgres 30039 17665  0 17:00 ?        00:00:00 postgres: a.b.c.246(60014) startup
postgres 30040 17665  0 17:00 ?        00:00:00 postgres: a.b.c.247(59951) startup
postgres 30041 17665  0 17:00 ?        00:00:00 postgres: a.b.c.247(59952) startup
postgres 30042 17665  0 17:00 ?        00:00:00 postgres: ww ww a.b.c.252(39788) startup
postgres 30043 17665  0 17:00 ?        00:00:00 postgres: a a a.b.c.245(44819) startup
postgres 30044 17665  0 17:00 ?        00:00:00 postgres: nnn nnn a.b.c.250(41317) startup
postgres 30045 17665  0 17:00 ?        00:00:00 postgres: mm mm a.b.c.249(47876) startup
postgres 30046 17665  0 17:00 ?        00:00:00 postgres: nnn nnn a.b.c.250(41320) startup
postgres 30048 17665  0 17:00 ?        00:00:00 postgres: a a a.b.c.245(44834) startup
postgres 30051 17665  0 17:00 ?        00:00:00 postgres: iii iii a.b.c.248(37244) startup
postgres 30052 17665  0 17:00 ?        00:00:00 postgres: iii iii a.b.c.248(37247) startup
postgres 30054 17665  0 17:00 ?        00:00:00 postgres: nnn nnn a.b.c.250(41327) startup
postgres 30079 17665  0 17:00 ?        00:00:00 postgres: a a a.b.c.245(44843) startup
postgres 30081 17665  0 17:00 ?        00:00:00 postgres: ww ww a.b.c.252(39790) startup
postgres 30083 17665  0 17:00 ?        00:00:00 postgres: a.b.c.251(50397) startup
postgres 30084 17665  0 17:00 ?        00:00:00 postgres: main main a.b.c.100(43074) startup
postgres 30604 17665  0 17:02 ?        00:00:00 postgres: a.b.c.246(60018) startup
postgres 30614 17665  0 17:02 ?        00:00:00 postgres: nnn nnn a.b.c.250(41331) startup
postgres 30624 17665  0 17:02 ?        00:00:00 postgres: a a a.b.c.245(44847) startup
postgres 30841 17665  0 17:03 ?        00:00:00 postgres: a.b.c.247(59961) startup
postgres 31023 17665  0 17:03 ?        00:00:00 postgres: postgres main [local] startup
postgres 31024 17665  0 17:03 ?        00:00:00 postgres: ww ww a.b.c.252(39796) startup

I found these older, similar reports:
https://www.postgresql.org/message-id/flat/4F9AC452.3050102%40pgexperts.com#4f9ac452.3050...@pgexperts.com
https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I saved ~40 cores from backends from the most recent incident, which are all
essentially identical:

(gdb) bt
#0  0x003f6ee0d930 in ?? ()
#1  0x7f19f3401258 in ?? ()
#2  0x006a4440 in PGSemaphoreLock (sema=0x7f19f3401258) at pg_sema.c:310
#3  0x00714495 in LWLockAcquire (lock=0x7f19f3404f80, mode=LW_SHARED) at lwlock.c:1233
#4  0x006f75e5 in BufferAlloc (smgr=0x1f3bc48, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, hit=0x7fff6ef1c5af "") at bufmgr.c:1012
#5  ReadBuffer_common (smgr=0x1f3bc48, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, hit=0x7fff6ef1c5af "") at bufmgr.c:745
#6  0x006f800e in ReadBufferExtended (reln=0x1ee6a08, forkNum=MAIN_FORKNUM, blockNum=1, mode=, strategy=) at bufmgr.c:664
#7  0x004c090a in _bt_relandgetbuf (rel=0x1ee6a08, obuf=145, blkno=, access=1) at nbtpage.c:708
#8  0x004c1284 in _bt_getroot (rel=0x1ee6a08, access=) at nbtpage.c:298
#9  0x004c46eb in _bt_search (rel=0x1ee6a08, keysz=1, scankey=0x7fff6ef1d040, nextkey=0 '\000', bufP=0x7fff6ef1da4c, access=1, snapshot=0x1f1a648) at nbtsearch.c:103
#10 0x004c5d1d in _bt_first (scan=0x1f1a4a8, dir=ForwardScanDirection) at nbtsearch.c:1028
#11 0x004c39dd in btgettuple (scan=0x1f1a4a8, dir=ForwardScanDirection) at nbtree.c:372
#12 0x004bcf1a in index_getnext_tid (scan=0x1f1a4a8, direction=) at indexam.c:541
#13 0x00