Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-05 Thread Tatsuo Ishii

  Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
 
  Isn't the real time big?
 
 Yes, it sure is, but remember that the guy getting useful work done
 (DeadLockCheck) is having to share the CPU with 999 other processes
 that are waking up on every clock tick for just long enough to fail
 to get the spinlock.  I think it's those useless process wakeups that
 are causing the problem.
 
 If you estimate that a process dispatch cycle is ~ 10 microseconds,
 then waking 999 useless processes every 10 msec is just about enough
 to consume 100% of the CPU doing nothing useful... so what should be
 a few-millisecond check takes a long time, which makes things worse
 because the 999 wannabees are spinning for that much more time.

If so, what about increasing the deadlock timer in proportion to the
length of the waiting holder queue?

Here is a patch against current that increases the deadlock timer by
queue_length seconds.

*** proc.c.orig	Thu Jul  5 16:12:32 2001
--- proc.c	Thu Jul  5 16:20:22 2001
***************
*** 506,511 ****
--- 506,512 ----
  	int			myHeldLocks = MyProc->heldLocks;
  	PROC	   *proc;
  	int			i;
+ 	int			msec;
  
  #ifndef __BEOS__
  	struct itimerval timeval,
***************
*** 625,638 ****
  	 * Need to zero out struct to set the interval and the microseconds
  	 * fields to 0.
  	 */
  #ifndef __BEOS__
  	MemSet(&timeval, 0, sizeof(struct itimerval));
! 	timeval.it_value.tv_sec = DeadlockTimeout / 1000;
! 	timeval.it_value.tv_usec = (DeadlockTimeout % 1000) * 1000;
  	if (setitimer(ITIMER_REAL, &timeval, &dummy))
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
! 	time_interval = DeadlockTimeout * 100;	/* usecs */
  	if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif
--- 626,642 ----
  	 * Need to zero out struct to set the interval and the microseconds
  	 * fields to 0.
  	 */
+ 
+ 	msec = DeadlockTimeout + waitQueue->size * 1000;
+ 
  #ifndef __BEOS__
  	MemSet(&timeval, 0, sizeof(struct itimerval));
! 	timeval.it_value.tv_sec = msec / 1000;
! 	timeval.it_value.tv_usec = (msec % 1000) * 1000;
  	if (setitimer(ITIMER_REAL, &timeval, &dummy))
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #else
! 	time_interval = msec * 100;	/* usecs */
  	if (set_alarm(time_interval, B_ONE_SHOT_RELATIVE_ALARM) < 0)
  		elog(FATAL, "ProcSleep: Unable to set timer for process wakeup");
  #endif
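
To illustrate the effect: with the default 1-second DeadlockTimeout, a
backend joining a wait queue that already holds 999 entries would arm its
deadlock timer for 1000 + 999 * 1000 msec, i.e. roughly 1000 seconds.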




Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-05 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 If so, what about increase the dead lock timer proportional to the
 length of the waiting holder queue?

I don't think that's a good idea; it's not solving the problem, only
reducing performance, and in a fairly arbitrary way at that.  (The
length of the particular wait queue you happen to be on is no measure
of the total number of processes waiting for locks.)

The real problem is in the spinlock implementation --- deadlock checking
is only one place where lots of processes might gang up on the same
spinlock.  The bufmgr lock is another one.

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 I added some code into HandleDeadLock to measure how long the
 LockLockTable and DeadLockCheck calls take. Following are the results
 from running pgbench -c 1000 (it failed with a stuck spin lock
 error). "real time" shows how long they actually ran (using
 gettimeofday); "user time" and "system time" were measured by calling
 getrusage. The time unit is milliseconds.

  LockLockTable: real time

  min |  max   |avg
 -++---
0 | 867873 | 152874.9015151515

  LockLockTable: user time

  min | max | avg  
 -+-+--
0 |  30 | 1.2121212121

  LockLockTable: system time

  min | max  |  avg   
 -+--+
0 | 2140 | 366.5909090909


  DeadLockCheck: real time

  min |  max  |   avg   
 -+---+-
0 | 87671 | 3463.6996197719

  DeadLockCheck: user time

  min | max |  avg  
 -+-+---
0 | 330 | 14.2205323194

  DeadLockCheck: system time

  min | max | avg  
 -+-+--
0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
I have to suppose that the problem is (once again) our inefficient
spinlock code.

If you think about it, on a typical platform where processes waiting for
a time delay are released at a clock tick, what's going to be happening
is that a whole lot of spinblocked processes will all be awoken in the
same clock tick interrupt.  The first one of these that gets to run will
acquire the spinlock, if it's free, and the rest will go back to sleep
and try again at the next tick.  This could be highly unfair depending
on just how the kernel's scheduler works --- for example, one could
easily believe that the waiters might be awoken in process-number order,
in which case backends with high process numbers might never get to
acquire the spinlock, or at least would have such low probability of
winning that they are prone to stuck spinlock timeout.

We really need to look at replacing the spinlock mechanism with
something more efficient.
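
To make the shape of the problem concrete, the wait loop being discussed
is essentially a test-and-sleep loop of the following form (a simplified
sketch for illustration only, not the actual s_lock.c code; tas() is
assumed to return nonzero while the lock is held):

/*
 * Simplified sketch of a test-and-sleep spinlock loop (illustration only,
 * not the real s_lock.c).  Every waiter sleeps via select() and is woken
 * around a clock tick just long enough to retry the test-and-set once.
 */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

extern int tas(volatile char *lock);   /* assumed: nonzero while lock is held */

static void
spin_acquire_sketch(volatile char *lock)
{
    int     spins = 0;

    while (tas(lock))
    {
        struct timeval delay;

        delay.tv_sec = 0;
        delay.tv_usec = 10000;          /* ~ one clock tick */
        (void) select(0, NULL, NULL, NULL, &delay);

        if (++spins > 10000)            /* roughly the 100-sec limit */
            abort();                    /* "stuck spinlock" */
    }
}

With ~1000 backends parked in such a loop on the same lock, every clock
tick wakes them all just long enough to fail the tas() and sleep again,
which is the behavior described above.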

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:

 DeadLockCheck: real time
 
 min |  max  |   avg
 -+---+-
 0 | 87671 | 3463.6996197719
 
 DeadLockCheck: user time
 
 min | max |  avg
 -+-+---
 0 | 330 | 14.2205323194
 
 DeadLockCheck: system time
 
 min | max | avg
 -+-+--
 0 | 100 | 2.5095057034
 
 Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.

 Isn't the real time big?

Yes, it sure is, but remember that the guy getting useful work done
(DeadLockCheck) is having to share the CPU with 999 other processes
that are waking up on every clock tick for just long enough to fail
to get the spinlock.  I think it's those useless process wakeups that
are causing the problem.

If you estimate that a process dispatch cycle is ~ 10 microseconds,
then waking 999 useless processes every 10 msec is just about enough
to consume 100% of the CPU doing nothing useful... so what should be
a few-millisecond check takes a long time, which makes things worse
because the 999 wannabees are spinning for that much more time.
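
(Spelled out: 999 wakeups times ~10 microseconds of dispatch overhead each
is ~10 msec of scheduler work per 10 msec clock tick, i.e. the whole tick.)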

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-07-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 If you estimate that a process dispatch cycle is ~ 10 microseconds,
 then waking 999 useless processes every 10 msec is just about enough
 to consume 100% of the CPU doing nothing useful...

 Don't we back off the sleeps or was that code removed?

Not enough to affect this calculation.

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-27 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  In my understanding the deadlock check is performed every time the
  backend acquires a lock. Once it acquires the lock, it kills the timer.
  However, under heavy transaction loads such as pgbench generates,
  chances are that the check fires, and it tries to acquire a spin lock.
  That seems to be the situation.
 
 It could be that with ~1000 backends all waiting for the same lock, the
 deadlock-checking code just plain takes too long to run.  It might have
 an O(N^2) or worse behavior in the length of the queue; I don't think
 the code was ever analyzed for such problems.
 
 Do you want to try adding some instrumentation to HandleDeadlock to see
 how long it runs on each call?

I added some code into HandleDeadLock to measure how long the
LockLockTable and DeadLockCheck calls take. Following are the results
from running pgbench -c 1000 (it failed with a stuck spin lock
error). "real time" shows how long they actually ran (using
gettimeofday); "user time" and "system time" were measured by calling
getrusage. The time unit is milliseconds.

 LockLockTable: real time

 min |  max   |avg
-++---
   0 | 867873 | 152874.9015151515

 LockLockTable: user time

 min | max | avg  
-+-+--
   0 |  30 | 1.2121212121

 LockLockTable: system time

 min | max  |  avg   
-+--+
   0 | 2140 | 366.5909090909


 DeadLockCheck: real time

 min |  max  |   avg   
-+---+-
   0 | 87671 | 3463.6996197719

 DeadLockCheck: user time

 min | max |  avg  
-+-+---
   0 | 330 | 14.2205323194

 DeadLockCheck: system time

 min | max | avg  
-+-+--
   0 | 100 | 2.5095057034
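
The instrumentation can be done along these lines (a sketch only; the
actual patch is not shown here): wrap each call with gettimeofday() for
real time and getrusage() for user/system time, and log the differences
in msec.

/*
 * Sketch of the timing instrumentation described above (not the actual
 * patch).  Wraps a call with gettimeofday() for real time and
 * getrusage() for user/system time, reporting msec.
 */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

static long
msec_between(const struct timeval *start, const struct timeval *stop)
{
    return (stop->tv_sec - start->tv_sec) * 1000 +
           (stop->tv_usec - start->tv_usec) / 1000;
}

/* usage (hypothetical): TIME_CALL("DeadLockCheck", DeadLockCheck(...)); */
#define TIME_CALL(label, call) \
    do { \
        struct timeval rt0, rt1; \
        struct rusage  ru0, ru1; \
        gettimeofday(&rt0, NULL); \
        getrusage(RUSAGE_SELF, &ru0); \
        call; \
        getrusage(RUSAGE_SELF, &ru1); \
        gettimeofday(&rt1, NULL); \
        fprintf(stderr, "%s: real %ld ms, user %ld ms, sys %ld ms\n", \
                (label), \
                msec_between(&rt0, &rt1), \
                msec_between(&ru0.ru_utime, &ru1.ru_utime), \
                msec_between(&ru0.ru_stime, &ru1.ru_stime)); \
    } while (0)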




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-26 Thread Tatsuo Ishii

  Tatsuo Ishii [EMAIL PROTECTED] writes
   How can I check it?
   
   The 'stuck' message should at least give you a code location...
  
   FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
  
  Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
  (and not, say, a buffer spinlock).  You could try adding some
  debug logging here, although the output would be voluminous.
  But what would really be useful is a stack trace for the stuck
  process.  Consider changing the s_lock code to abort() when it
  gets a stuck spinlock --- then you could gdb the coredump.
 
 Nice idea. I will try that.

It appears that the deadlock checking timer is the source of the
problem. With the default settings, it checks for deadlocks every 1
second PER backend. So if there are 1000 backends, there's a signal and
a shared memory lock acquisition every 1 msec on average. That would be
too much. If I increase deadlock_timeout to, say, 10, the problem seems
to go away. Also, the performance increased SIGNIFICANTLY: before I got
only 1-2 TPS, but now I get ~20 TPS using pgbench -c 1000.

Here is the backtrace:

#0  0x2ab56d21 in __kill () from /lib/libc.so.6
#1  0x2ab56996 in raise (sig=6) at ../sysdeps/posix/raise.c:27
#2  0x2ab580b8 in abort () at ../sysdeps/generic/abort.c:88
#3  0x80ece1a in s_lock_stuck (lock=0x2ac2d016 "\001", 
    file=0x816e7bc "spin.c", line=158) at s_lock.c:70
#4  0x80ecf3e in s_lock_sleep (spins=20001, timeout=1, microsec=5000, 
    lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", line=158) at s_lock.c:109
#5  0x80ecfa3 in s_lock (lock=0x2ac2d016 "\001", file=0x816e7bc "spin.c", 
    line=158) at s_lock.c:136
#6  0x80efb4d in SpinAcquire (lockid=6) at spin.c:158
#7  0x80f2305 in HandleDeadLock (postgres_signal_arg=14) at proc.c:819
#8  signal handler called
#9  0x2abeb134 in semop (semid=32786, sops=0x7fffeebc, nsops=1)
at ../sysdeps/unix/sysv/linux/semop.c:34
#10 0x80ee460 in IpcSemaphoreLock (semId=32786, sem=13, interruptOK=1 '\001')
at ipc.c:426
#11 0x80f217f in ProcSleep (lockMethodTable=0x81c1708, lockmode=6, 
lock=0x2ce0ab18, holder=0x2ce339b0) at proc.c:666
#12 0x80f14ff in WaitOnLock (lockmethod=1, lockmode=6, lock=0x2ce0ab18, 
holder=0x2ce339b0) at lock.c:955
#13 0x80f1298 in LockAcquire (lockmethod=1, locktag=0x7fffeffc, xid=130139, 
lockmode=6) at lock.c:739
#14 0x80f0a23 in LockPage (relation=0x2dbeb9d0, blkno=0, lockmode=6)
#15 0x8071ceb in RelationGetBufferForTuple (relation=0x2dbeb9d0, len=132)
at hio.c:97
#16 0x8070293 in heap_update (relation=0x2dbeb9d0, otid=0x7114, 
newtup=0x82388c8, ctid=0x70b0) at heapam.c:1737
#17 0x80b6825 in ExecReplace (slot=0x823af60, tupleid=0x7114, 
estate=0x8238a58) at execMain.c:1450
#18 0x80b651e in ExecutePlan (estate=0x8238a58, plan=0x8238d00, 
operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, 
destfunc=0x823b680) at execMain.c:1125
#19 0x80b5af3 in ExecutorRun (queryDesc=0x8239080, estate=0x8238a58, 
feature=3, count=0) at execMain.c:233
#20 0x80f6d93 in ProcessQuery (parsetree=0x822bc18, plan=0x8238d00, 
dest=Remote) at pquery.c:295
#21 0x80f599b in pg_exec_query_string (
    query_string=0x822b8c0 "update accounts set abalance = abalance + 277 where aid = 41148\n", dest=Remote, parse_context=0x81fc850) at postgres.c:810
#22 0x80f68c6 in PostgresMain (argc=4, argv=0x7380, real_argc=3, 
    real_argv=0x7c94, username=0x81cd981 "t-ishii") at postgres.c:1908
#23 0x80e1ee3 in DoBackend (port=0x81cd718) at postmaster.c:2120
#24 0x80e1acc in BackendStartup (port=0x81cd718) at postmaster.c:1903
#25 0x80e0e26 in ServerLoop () at postmaster.c:995
#26 0x80e0853 in PostmasterMain (argc=3, argv=0x7c94) at postmaster.c:685
#27 0x80c4865 in main (argc=3, argv=0x7c94) at main.c:175
#28 0x2ab509cb in __libc_start_main (main=0x80c4750 <main>, argc=3, 
    argv=0x7c94, init=0x80656c4 <_init>, fini=0x81395ac <_fini>, 
    rtld_fini=0x2aab5ea0 <_dl_fini>, stack_end=0x7c8c)
at ../sysdeps/generic/libc-start.c:92




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-26 Thread Tatsuo Ishii

 Tatsuo Ishii wrote:
  
Tatsuo Ishii [EMAIL PROTECTED] writes
 How can I check it?

 The 'stuck' message should at least give you a code location...
   
 FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
   
Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
(and not, say, a buffer spinlock).  You could try adding some
debug logging here, although the output would be voluminous.
But what would really be useful is a stack trace for the stuck
process.  Consider changing the s_lock code to abort() when it
gets a stuck spinlock --- then you could gdb the coredump.
  
   Nice idea. I will try that.
  
  It appears that the deadlock checking timer is the source of the
  problem. With the default settings, it checks for deadlocks every 1
  second PER backend.
 
 IIRC the deadlock check is called only once per backend.

In my understanding the deadlock check is performed every time the
backend acquires a lock. Once it acquires the lock, it kills the timer.
However, under heavy transaction loads such as pgbench generates,
chances are that the check fires, and it tries to acquire a spin lock.
That seems to be the situation.
--
Tatsuo Ishii




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-26 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 It appears that the deadlock checking timer is the source of the
 problem. With the default settings, it checks for deadlocks every 1
 second PER backend.

I don't believe it.  setitimer with it_interval = 0 should produce one
interrupt, no more.
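
(That is also what the existing proc.c code shown earlier in the thread
does: MemSet zeroes the whole struct itimerval, so it_interval stays 0 and
only it_value fires, once.  A minimal standalone sketch, for reference:)

#include <string.h>
#include <sys/time.h>

/* Arm a one-shot SIGALRM: with it_interval left at zero, the signal is
 * delivered exactly once, it_value msec from now.  Sketch only. */
static void
arm_oneshot_msec(int msec)
{
    struct itimerval t;

    memset(&t, 0, sizeof(t));           /* it_interval = 0 => no repeat */
    t.it_value.tv_sec = msec / 1000;
    t.it_value.tv_usec = (msec % 1000) * 1000;
    setitimer(ITIMER_REAL, &t, NULL);
}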

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-26 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 In my understanding the deadlock check is performed every time the
 backend acquires a lock. Once it acquires the lock, it kills the timer.
 However, under heavy transaction loads such as pgbench generates,
 chances are that the check fires, and it tries to acquire a spin lock.
 That seems to be the situation.

It could be that with ~1000 backends all waiting for the same lock, the
deadlock-checking code just plain takes too long to run.  It might have
an O(N^2) or worse behavior in the length of the queue; I don't think
the code was ever analyzed for such problems.

Do you want to try adding some instrumentation to HandleDeadlock to see
how long it runs on each call?

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-24 Thread Tatsuo Ishii

   FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
  
  Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
  (and not, say, a buffer spinlock).  You could try adding some
  debug logging here, although the output would be voluminous.
  But what would really be useful is a stack trace for the stuck
  process.  Consider changing the s_lock code to abort() when it
  gets a stuck spinlock --- then you could gdb the coredump.
 
 Nice idea. I will try that.

I got an interesting result. If I compile the backend with -g (and
without -O2), I get no stuck spin lock errors. However, if s_lock.c is
compiled with -O2 enabled, I get the error again. It seems that only
s_lock.c is involved in this phenomenon.
--
Tatsuo Ishii




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-24 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 I got an interesting result. If I compile the backend with -g (and
 without -O2), I get no stuck spin lock errors. However, if s_lock.c is
 compiled with -O2 enabled, I get the error again. It seems that only
 s_lock.c is involved in this phenomenon.

That's very interesting.  Could optimization be breaking the TAS
sequence on your platform?  What is your platform, anyway?
Might need to burrow into the assembly code to see just what's
happening.

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-24 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  I got an interesting result. If I compile the backend with -g (and
  without -O2), I get no stuck spin lock errors. However, if s_lock.c is
  compiled with -O2 enabled, I get the error again. It seems that only
  s_lock.c is involved in this phenomenon.
 
 That's very interesting.  Could optimization be breaking the TAS
 sequence on your platform?  What is your platform, anyway?
 Might need to burrow into the assembly code to see just what's
 happening.

As I said, it's an x86 Linux box (more precisely, kernel 2.2.18 with 2
processors, egcs 2.91). I suspect that the inlined TAS code might be
incompatible with the caller, s_lock.
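
For reference, the x86 TAS in question is an inline-asm xchg of roughly
this shape (a from-memory sketch, not copied from s_lock.h, so treat the
details as approximate):

/*
 * Rough sketch of an x86 test-and-set as used for spinlocks (approximate,
 * not copied from s_lock.h).  xchgb atomically swaps the lock byte with 1
 * and returns the old value: 0 means we acquired the lock.
 */
typedef unsigned char slock_t;

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    __asm__ __volatile__(
        "lock; xchgb %0,%1"
        : "=q" (_res), "=m" (*lock)
        : "0" (_res));
    return (int) _res;
}

If the compiler were free to cache or reorder accesses around this
sequence (say, because a volatile qualifier or __volatile__ were missing
somewhere), -O2 could plausibly change the behavior where an unoptimized
build does not, which would fit the symptom above.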
--
Tatsuo Ishii




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-21 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  I have seen problems with a very large number of concurrent users.
  I run pgbench:
 
  pgbench -c 1000 -t 1 test
 
  And I get stuck spin lock errors. This is 100% reproducible (i.e. I
  have never succeeded with pgbench -c 1000).
 
 Is it actually stuck, or just timing out due to huge contention?
 You could try increasing the timeout intervals in s_lock.c to
 make sure.  

I believe it's actually stuck. From s_lock.c:

#define DEFAULT_TIMEOUT (100*100)   /* default timeout: 100 sec */

So even with 1000 backends contending, 100 sec should be enough (100
msec for each backend).

 If it is stuck, on which lock(s)?

How can I check it? In that situation, it's very hard to attach a
debugger to a backend process: 1000 backends consume all the CPU time.
--
Tatsuo Ishii





Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-21 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 If it is stuck, on which lock(s)?

 How can I check it?

The 'stuck' message should at least give you a code location...

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-21 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  If it is stuck, on which lock(s)?
 
  How can I check it?
 
 The 'stuck' message should at least give you a code location...

Here is the actual message:

FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

Last several queries before stuck spinlock are:

DEBUG:  query: update branches set bbalance = bbalance + 436 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 230 where tid = 17

DEBUG:  query: update tellers set tbalance = tbalance + 740 where tid = 7

DEBUG:  query: update tellers set tbalance = tbalance + 243 where tid = 13

DEBUG:  query: select abalance from accounts where aid = 177962
DEBUG:  query: update tellers set tbalance = tbalance + 595 where tid = 18

DEBUG:  query: update branches set bbalance = bbalance + 595 where bid = 1
DEBUG:  query: update tellers set tbalance = tbalance + 252 where tid = 15

What I'm trying now is increasing the timeout to 10 times its current
value. I will report in the next email...
--
Tatsuo Ishii




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-21 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 How can I check it?
 
 The 'stuck' message should at least give you a code location...

 FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.

Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
(and not, say, a buffer spinlock).  You could try adding some
debug logging here, although the output would be voluminous.
But what would really be useful is a stack trace for the stuck
process.  Consider changing the s_lock code to abort() when it
gets a stuck spinlock --- then you could gdb the coredump.
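
Concretely, the change amounts to something like this (a sketch only; the
argument list matches the s_lock_stuck frame in the backtrace Tatsuo
posted in his June 26 message):

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char slock_t;

/* Sketch of the suggested tweak (not the stock s_lock.c): report the
 * stuck spinlock and abort() so the process leaves a core file that
 * can be examined with gdb. */
static void
s_lock_stuck(volatile slock_t *lock, const char *file, const int line)
{
    fprintf(stderr,
            "FATAL: s_lock(%p) at %s:%d, stuck spinlock. Aborting.\n",
            (void *) lock, file, line);
    abort();                    /* dump core instead of elog(FATAL) */
}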

regards, tom lane




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-21 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  How can I check it?
  
  The 'stuck' message should at least give you a code location...
 
  FATAL: s_lock(0x2ac2d016) at spin.c:158, stuck spinlock. Aborting.
 
 Hmm, that's SpinAcquire, so it's one of the predefined spinlocks
 (and not, say, a buffer spinlock).  You could try adding some
 debug logging here, although the output would be voluminous.
 But what would really be useful is a stack trace for the stuck
 process.  Consider changing the s_lock code to abort() when it
 gets a stuck spinlock --- then you could gdb the coredump.

Nice idea. I will try that.
--
Tatsuo Ishii




Re: [HACKERS] stuck spin lock with many concurrent users

2001-06-20 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 I have seen problems with a very large number of concurrent users.
 I run pgbench:

 pgbench -c 1000 -t 1 test

 And I get stuck spin lock errors. This is 100% reproducible (i.e. I
 have never succeeded with pgbench -c 1000).

Is it actually stuck, or just timing out due to huge contention?
You could try increasing the timeout intervals in s_lock.c to
make sure.  If it is stuck, on which lock(s)?

regards, tom lane
