Re: [HACKERS] stuck spin lock with many concurrent users
> > > Hm.  It doesn't seem that DeadLockCheck is taking very much of the
> > > time.
> >
> > Isn't the real time big ?
>
> Yes, it sure is, but remember that the guy getting useful work done
> (DeadLockCheck) is having to share the CPU with 999 other processes
> that are waking up on every clock tick for just long enough to fail
> to get the spinlock.  I think it's those useless process wakeups that
> are causing the problem.  If you estimate that a process dispatch
> cycle is ~ 10 microseconds, then waking 999 useless processes every
> 10 msec is just about enough to consume 100% of the CPU doing nothing
> useful... so what should be a few-millisecond check takes a long
> time, which makes things worse because the 999 wannabees are spinning
> for that much more time.

If so, what about increasing the deadlock timer in proportion to the
length of the waiting holder queue?  Here is a patch against current
to increase the deadlock timer by queue_length seconds.

*** proc.c.orig	Thu Jul  5 16:12:32 2001
--- proc.c	Thu Jul  5 16:20:22 2001
***************
*** 506,511 ****
--- 506,512 ----
  	int			myHeldLocks = MyProc->heldLocks;
  	PROC	   *proc;
  	int			i;
+ 	int			msec;
  
  #ifndef __BEOS__
  	struct itimerval timeval,
***************
*** 625,638 ****
  	 * Need to zero out struct to set the interval and the microseconds
  	 * fields to 0.
  	 */
  #ifndef __BEOS__
  	MemSet(&timeval, 0, sizeof(struct itimerval));
! 	timeval.it_value.tv_sec = DeadlockTimeout / 1000;
! 	timeval.it_value.tv_usec = (DeadlockTimeout % 1000) * 1000;
  	if (setitimer(ITIMER_REAL, &timeval, &dummy))
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
! 	time_interval = DeadlockTimeout * 1000;	/* usecs */
  	if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif
--- 626,642 ----
  	 * Need to zero out struct to set the interval and the microseconds
  	 * fields to 0.
  	 */
+ 
+ 	msec = DeadlockTimeout + waitQueue->size * 1000;
+ 
  #ifndef __BEOS__
  	MemSet(&timeval, 0, sizeof(struct itimerval));
! 	timeval.it_value.tv_sec = msec / 1000;
! 	timeval.it_value.tv_usec = (msec % 1000) * 1000;
  	if (setitimer(ITIMER_REAL, &timeval, &dummy))
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
! 	time_interval = msec * 1000;	/* usecs */
  	if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> If so, what about increasing the deadlock timer in proportion to the
> length of the waiting holder queue?

I don't think that's a good idea; it's not solving the problem, only
reducing performance, and in a fairly arbitrary way at that.  (The
length of the particular wait queue you happen to be on is no measure
of the total number of processes waiting for locks.)

The real problem is in the spinlock implementation --- deadlock
checking is only one place where lots of processes might gang up on
the same spinlock.  The bufmgr lock is another one.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> I added some code to HandleDeadLock to measure how long the
> LockLockTable and DeadLockCheck calls take.  The following are the
> results from running pgbench -c 1000 (it failed with a stuck spin
> lock error).  "real time" shows how long they actually run (using
> gettimeofday); user time and system time are measured by calling
> getrusage.  The time unit is milliseconds.
>
> LockLockTable: real time
>  min |  max   |        avg
> -----+--------+-------------------
>    0 | 867873 | 152874.9015151515
>
> LockLockTable: user time
>  min | max |     avg
> -----+-----+--------------
>    0 |  30 | 1.2121212121
>
> LockLockTable: system time
>  min | max  |      avg
> -----+------+----------------
>    0 | 2140 | 366.5909090909
>
> DeadLockCheck: real time
>  min |  max  |       avg
> -----+-------+-----------------
>    0 | 87671 | 3463.6996197719
>
> DeadLockCheck: user time
>  min | max |      avg
> -----+-----+---------------
>    0 | 330 | 14.2205323194
>
> DeadLockCheck: system time
>  min | max |     avg
> -----+-----+--------------
>    0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the
time.  I have to suppose that the problem is (once again) our
inefficient spinlock code.  If you think about it, on a typical
platform where processes waiting for a time delay are released at a
clock tick, what's going to be happening is that a whole lot of
spinblocked processes will all be awoken by the same clock tick
interrupt.  The first one of these that gets to run will acquire the
spinlock, if it's free, and the rest will go back to sleep and try
again at the next tick.

This could be highly unfair depending on just how the kernel's
scheduler works --- for example, one could easily believe that the
waiters might be awoken in process-number order, in which case
backends with high process numbers might never get to acquire the
spinlock, or at least would have such low probability of winning that
they are prone to stuck-spinlock timeout.

We really need to look at replacing the spinlock mechanism with
something more efficient.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Hiroshi Inoue <[EMAIL PROTECTED]> writes:
> > > DeadLockCheck: real time
> > >  min |  max  |       avg
> > > -----+-------+-----------------
> > >    0 | 87671 | 3463.6996197719
> > >
> > > DeadLockCheck: user time
> > >  min | max |      avg
> > > -----+-----+---------------
> > >    0 | 330 | 14.2205323194
> > >
> > > DeadLockCheck: system time
> > >  min | max |     avg
> > > -----+-----+--------------
> > >    0 | 100 | 2.5095057034
> >
> > Hm.  It doesn't seem that DeadLockCheck is taking very much of the
> > time.
>
> Isn't the real time big ?

Yes, it sure is, but remember that the guy getting useful work done
(DeadLockCheck) is having to share the CPU with 999 other processes
that are waking up on every clock tick for just long enough to fail
to get the spinlock.  I think it's those useless process wakeups that
are causing the problem.  If you estimate that a process dispatch
cycle is ~ 10 microseconds, then waking 999 useless processes every
10 msec is just about enough to consume 100% of the CPU doing nothing
useful... so what should be a few-millisecond check takes a long
time, which makes things worse because the 999 wannabees are spinning
for that much more time.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Bruce Momjian <[EMAIL PROTECTED]> writes:
> > If you estimate that a process dispatch cycle is ~ 10 microseconds,
> > then waking 999 useless processes every 10 msec is just about
> > enough to consume 100% of the CPU doing nothing useful...
>
> Don't we back off the sleeps, or was that code removed?

Not enough to affect this calculation.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > In my understanding the deadlock check is performed every time the
> > backend acquires a lock.  Once it acquires the lock, it kills the
> > timer.  However, under heavy transactions such as pgbench generates,
> > chances are that the check fires, and it tries to acquire a spin
> > lock.  That seems to be the situation.
>
> It could be that with ~1000 backends all waiting for the same lock,
> the deadlock-checking code just plain takes too long to run.  It
> might have an O(N^2) or worse behavior in the length of the queue; I
> don't think the code was ever analyzed for such problems.  Do you
> want to try adding some instrumentation to HandleDeadLock to see how
> long it runs on each call?

I added some code to HandleDeadLock to measure how long the
LockLockTable and DeadLockCheck calls take.  The following are the
results from running pgbench -c 1000 (it failed with a stuck spin
lock error).  "real time" shows how long they actually run (using
gettimeofday); user time and system time are measured by calling
getrusage.  The time unit is milliseconds.

LockLockTable: real time
 min |  max   |        avg
-----+--------+-------------------
   0 | 867873 | 152874.9015151515

LockLockTable: user time
 min | max |     avg
-----+-----+--------------
   0 |  30 | 1.2121212121

LockLockTable: system time
 min | max  |      avg
-----+------+----------------
   0 | 2140 | 366.5909090909

DeadLockCheck: real time
 min |  max  |       avg
-----+-------+-----------------
   0 | 87671 | 3463.6996197719

DeadLockCheck: user time
 min | max |      avg
-----+-----+---------------
   0 | 330 | 14.2205323194

DeadLockCheck: system time
 min | max |     avg
-----+-----+--------------
   0 | 100 | 2.5095057034
Re: [HACKERS] stuck spin lock with many concurrent users
> > Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > > How can I check it?
> >
> > The 'stuck' message should at least give you a code location...
>
> > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
>
> Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> (and not, say, a buffer spinlock).  You could try adding some debug
> logging here, although the output would be voluminous.  But what
> would really be useful is a stack trace for the stuck process.
> Consider changing the s_lock code to abort() when it gets a stuck
> spinlock --- then you could gdb the coredump.

Nice idea.  I tried that.

It appeared that the deadlock checking timer is the source of the
problem.  With the default settings, it checks for deadlock every 1
second PER backend.  So if there are 1000 backends, there is on
average a signal and a shared memory locking every 1 msec.  That
would be too much.  If I increase deadlock_timeout to, say, 10, the
problem seems to be gone.  Also the performance increased
SIGNIFICANTLY: before I got only 1-2 TPS, but now I get ~20 TPS using
pgbench -c 1000.

Here is the backtrace:

#0  0x2ab56d21 in __kill () from /lib/libc.so.6
#1  0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2  0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88
#3  0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001",
    file=0x816e7bc "spin.c", line=158) at s_lock.c:70
#4  0x80ecf3e in s_lock_sleep (spins=20001, timeout=1, microsec=5000,
    lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158)
    at s_lock.c:109
#5  0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001",
    file=0x816e7bc "spin.c", line=158) at s_lock.c:136
#6  0x80efb4d in SpinAcquire (lockid=6) at spin.c:158
#7  0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819
#8  <signal handler called>
#9  0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1)
    at ../sysdeps/unix/sysv/linux/semop.c:34
#10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13,
    interruptOK=1 '\001') at ipc.c:426
#11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6,
    lock=0x2ce0ab18, holder=0x2ce339b0) at proc.c:666
#12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6,
    lock=0x2ce0ab18, holder=0x2ce339b0) at lock.c:955
#13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc,
    xid=130139, lockmode=6) at lock.c:739
#14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6)
#15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0,
    len=132) at hio.c:97
#16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7114,
    newtup=0x82388c8, ctid=0x70b0) at heapam.c:1737
#17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7114,
    estate=0x8238a58) at execMain.c:1450
#18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00,
    operation=CMD_UPDATE, numberTuples=0,
    direction=ForwardScanDirection, destfunc=0x823b680)
    at execMain.c:1125
#19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58,
    feature=3, count=0) at execMain.c:233
#20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00,
    dest=Remote) at pquery.c:295
#21 0x80f599b in pg_exec_query_string (query_string=0x822b8c0
    "update accounts set abalance = abalance + 277 where aid = 41148\n",
    dest=Remote, parse_context=0x81fc850) at postgres.c:810
#22 0x80f68c6 in PostgresMain (argc=4, argv=0x7380, real_argc=3,
    real_argv=0x7c94, username=0x81cd981 "t-ishii") at postgres.c:1908
#23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120
#24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903
#25 0x80e0e26 in ServerLoop () at postmaster.c:995
#26 0x80e0853 in PostmasterMain (argc=3, argv=0x7c94)
    at postmaster.c:685
#27 0x80c4865 in main (argc=3, argv=0x7c94) at main.c:175
#28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3,
    argv=0x7c94, init=0x80656c4 <_init>, fini=0x81395ac <_fini>,
    rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7c8c)
    at ../sysdeps/generic/libc-start.c:92
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii wrote:
> > [...]
> > Nice idea.  I tried that.
> >
> > It appeared that the deadlock checking timer is the source of the
> > problem.  With the default settings, it checks for deadlock every
> > 1 second PER backend.
>
> IIRC the deadlock check is called only once per backend.

In my understanding the deadlock check is performed every time the
backend acquires a lock.  Once it acquires the lock, it kills the
timer.  However, under heavy transactions such as pgbench generates,
chances are that the check fires, and it tries to acquire a spin
lock.  That seems to be the situation.
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> It appeared that the deadlock checking timer is the source of the
> problem.  With the default settings, it checks for deadlock every 1
> second PER backend.

I don't believe it.  setitimer with it_interval = 0 should produce
one interrupt, no more.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> In my understanding the deadlock check is performed every time the
> backend acquires a lock.  Once it acquires the lock, it kills the
> timer.  However, under heavy transactions such as pgbench generates,
> chances are that the check fires, and it tries to acquire a spin
> lock.  That seems to be the situation.

It could be that with ~1000 backends all waiting for the same lock,
the deadlock-checking code just plain takes too long to run.  It
might have an O(N^2) or worse behavior in the length of the queue; I
don't think the code was ever analyzed for such problems.  Do you
want to try adding some instrumentation to HandleDeadLock to see how
long it runs on each call?

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
> > > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
> >
> > Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> > (and not, say, a buffer spinlock).  You could try adding some debug
> > logging here, although the output would be voluminous.  But what
> > would really be useful is a stack trace for the stuck process.
> > Consider changing the s_lock code to abort() when it gets a stuck
> > spinlock --- then you could gdb the coredump.
>
> Nice idea.  I tried that.

I got an interesting result.  If I compile the backend with -g (and
without -O2), I get no stuck spin lock errors.  However, if s_lock.c
is compiled with -O2 enabled, I get the error again.  It seems only
s_lock.c is related to this phenomenon.
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> I got an interesting result.  If I compile the backend with -g (and
> without -O2), I get no stuck spin lock errors.  However, if s_lock.c
> is compiled with -O2 enabled, I get the error again.  It seems only
> s_lock.c is related to this phenomenon.

That's very interesting.  Could optimization be breaking the TAS
sequence on your platform?  What is your platform, anyway?  Might
need to burrow into the assembly code to see just what's happening.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > I got an interesting result.  If I compile the backend with -g
> > (and without -O2), I get no stuck spin lock errors.  However, if
> > s_lock.c is compiled with -O2 enabled, I get the error again.  It
> > seems only s_lock.c is related to this phenomenon.
>
> That's very interesting.  Could optimization be breaking the TAS
> sequence on your platform?  What is your platform, anyway?  Might
> need to burrow into the assembly code to see just what's happening.

As I said, it's x86 Linux (more precisely, kernel 2.2.18 with 2
processors, egcs 2.91).  I suspect that the inlined TAS code might be
incompatible with the caller, s_lock.
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > I have seen problems with extremely many concurrent users.  I run
> > pgbench:
> >
> >     pgbench -c 1000 -t 1 test
> >
> > And I get stuck spin lock errors.  This is 100% reproducible (i.e.
> > I have never succeeded with pgbench -c 1000).
>
> Is it actually stuck, or just timing out due to huge contention?
> You could try increasing the timeout intervals in s_lock.c to make
> sure.

I believe it's actually stuck.  From s_lock.c:

#define DEFAULT_TIMEOUT (100*100)	/* default timeout: 100 sec */

So even if there are 1000 contentions, 100 sec should be enough
(100 msec for each backend).

> If it is stuck, on which lock(s)?

How can I check it?  In that situation, it's very hard to attach a
debugger to the backend process; 1000 backends consume all the CPU
time.
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > If it is stuck, on which lock(s)?
>
> How can I check it?

The 'stuck' message should at least give you a code location...

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > How can I check it?
>
> The 'stuck' message should at least give you a code location...

Here is the actual message:

FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

The last several queries before the stuck spinlock were:

DEBUG:  query: update branches set bbalance = bbalance + 436 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 230 where tid = 17
DEBUG:  query: update tellers set tbalance = tbalance + 740 where tid = 7
DEBUG:  query: update tellers set tbalance = tbalance + 243 where tid = 13
DEBUG:  query: select abalance from accounts where aid = 177962
DEBUG:  query: update tellers set tbalance = tbalance + 595 where tid = 18
DEBUG:  query: update branches set bbalance = bbalance + 595 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 252 where tid = 15

What I'm trying now is increasing the timeout to 10 times longer.
Will report in the next email...
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > How can I check it?
>
> The 'stuck' message should at least give you a code location...

> FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
(and not, say, a buffer spinlock).  You could try adding some debug
logging here, although the output would be voluminous.  But what
would really be useful is a stack trace for the stuck process.
Consider changing the s_lock code to abort() when it gets a stuck
spinlock --- then you could gdb the coredump.

			regards, tom lane
Re: [HACKERS] stuck spin lock with many concurrent users
> Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> > > How can I check it?
> >
> > The 'stuck' message should at least give you a code location...
>
> > FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
>
> Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
> (and not, say, a buffer spinlock).  You could try adding some debug
> logging here, although the output would be voluminous.  But what
> would really be useful is a stack trace for the stuck process.
> Consider changing the s_lock code to abort() when it gets a stuck
> spinlock --- then you could gdb the coredump.

Nice idea.  I will try that.
--
Tatsuo Ishii
Re: [HACKERS] stuck spin lock with many concurrent users
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> I have seen problems with extremely many concurrent users.  I run
> pgbench:
>
>     pgbench -c 1000 -t 1 test
>
> And I get stuck spin lock errors.  This is 100% reproducible (i.e. I
> have never succeeded with pgbench -c 1000).

Is it actually stuck, or just timing out due to huge contention?
You could try increasing the timeout intervals in s_lock.c to make
sure.

If it is stuck, on which lock(s)?

			regards, tom lane